NlpTools API
Class

NlpTools\Clustering\KMeans

class KMeans extends Clusterer

This clusterer uses the k-means algorithm to cluster documents.

It takes as parameters the number of clusters, the distance metric, and
the strategy for computing new centroids. Because both the metric and
the centroid computation are pluggable, it can cluster documents in
spaces other than Euclidean vector space.
A description of the algorithm can be found at
http://en.wikipedia.org/wiki/K-means_clustering
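
To make the roles of the three parameters and the cutoff concrete, here is a minimal standalone sketch of a k-means loop. It is not the NlpTools implementation: `kmeansSketch`, `$euclidean` and `$mean` are hypothetical names, plain callables stand in for the distance and centroid factory interfaces, and the centroids are seeded with the first `$k` points only to keep the run deterministic.

```php
<?php
// Hypothetical helper, NOT part of NlpTools: a bare-bones k-means loop.
// $dist and $centroid are plain callables standing in for the distance
// metric and the centroid factory; $cutoff plays the role of the
// constructor's $cutoff parameter.
function kmeansSketch(array $points, int $k, callable $dist, callable $centroid, float $cutoff = 1.0E-5): array
{
    // Simplification (assumption): seed with the first $k points.
    $centroids = array_slice($points, 0, $k);

    do {
        // Assignment step: store each point's offset under its nearest centroid.
        $clusters = array_fill(0, $k, []);
        foreach ($points as $offset => $p) {
            $best = 0;
            $bestDist = INF;
            foreach ($centroids as $i => $c) {
                $d = $dist($p, $c);
                if ($d < $bestDist) {
                    $bestDist = $d;
                    $best = $i;
                }
            }
            $clusters[$best][] = $offset;
        }

        // Update step: recompute centroids and record the largest movement.
        $maxChange = 0.0;
        foreach ($clusters as $i => $offsets) {
            if (!$offsets) {
                continue; // an empty cluster keeps its previous centroid
            }
            $members = array_map(fn($o) => $points[$o], $offsets);
            $new = $centroid($members);
            $maxChange = max($maxChange, $dist($centroids[$i], $new));
            $centroids[$i] = $new;
        }
    } while ($maxChange >= $cutoff); // stop once the centroids barely move

    return $clusters;
}

// Euclidean distance between two equal-length numeric vectors.
$euclidean = function (array $a, array $b): float {
    $sum = 0.0;
    foreach ($a as $i => $v) {
        $sum += ($v - $b[$i]) ** 2;
    }
    return sqrt($sum);
};

// Component-wise mean of a non-empty list of vectors.
$mean = function (array $members): array {
    $n = count($members);
    $sum = array_fill(0, count($members[0]), 0.0);
    foreach ($members as $m) {
        foreach ($m as $i => $v) {
            $sum[$i] += $v;
        }
    }
    return array_map(fn($s) => $s / $n, $sum);
};

$points = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0], [1.0, 0.0]];
$clusters = kmeansSketch($points, 2, $euclidean, $mean);
```

Swapping `$euclidean` and `$mean` for another metric and a matching centroid rule is what lets the same loop work outside Euclidean space, for example Hamming distance paired with a per-position majority vote.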

Methods

array cluster(TrainingSet $documents, FeatureFactoryInterface $ff)

Apply the feature factory to the documents and then cluster the resulting array using the provided distance metric and centroid factory.

__construct(int $n, DistanceInterface $d, CentroidFactoryInterface $cf, float $cutoff = 1.0E-5)

Initialize the k-means clusterer.

Details

at line 46
public array cluster(TrainingSet $documents, FeatureFactoryInterface $ff)

Apply the feature factory to the documents and then cluster the resulting array using the provided distance metric and centroid factory.

Parameters

TrainingSet $documents The documents to be clustered
FeatureFactoryInterface $ff A feature factory used to transform the given documents

Return Value

array The clusters: an array of arrays, each holding the offsets of the documents in that cluster
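
As an illustration of how such a result might be consumed, the sketch below resolves offsets back to documents. It assumes the result has exactly the shape described above; `$docs` and `$clusters` are hypothetical hand-written values, not output produced by the library.

```php
<?php
// Hypothetical data: five documents and a cluster result shaped as
// described above (one inner array of document offsets per cluster).
$docs = ['apple pie', 'stock market', 'apple tart', 'bond market', 'apple crumble'];
$clusters = [[0, 2, 4], [1, 3]];

// Resolve every offset back to the document it points at.
$grouped = array_map(
    fn(array $offsets) => array_map(fn(int $o) => $docs[$o], $offsets),
    $clusters
);
```

Each entry of `$grouped` then holds the documents of one cluster, in the order their offsets were listed.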

at line 34
public __construct(int $n, DistanceInterface $d, CentroidFactoryInterface $cf, float $cutoff = 1.0E-5)

Initialize the k-means clusterer.

Parameters

int $n The number of clusters to compute
DistanceInterface $d The distance metric to be used (Euclidean, Hamming, ...)
CentroidFactoryInterface $cf The factory used to create each new centroid from a set of documents
float $cutoff Iteration stops when the maximum change of the centroids is smaller than this value