NlpTools\Clustering\KMeans

class KMeans extends Clusterer

This clusterer uses the K-Means algorithm to cluster documents.

It takes as parameters the number of clusters, the distance metric,
and the method used to compute the new centroids (a centroid factory).
Because both the metric and the centroid computation are pluggable, it
can cluster documents in spaces other than the Euclidean vector space.
A description of this algorithm can be found at
http://en.wikipedia.org/wiki/K-means_clustering
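The loop described on that page can be sketched in plain PHP (7.4+) as follows. This is a minimal, self-contained illustration, not the NlpTools implementation. The functions `euclidean`, `meanCentroid` and `kmeans` are hypothetical names. The distance and the centroid computation are passed in as callables to mirror the pluggable metric and centroid factory. The centroids are seeded with the first `$n` points, an assumption made only so the result is deterministic; the class may choose its initial centroids differently.

```php
<?php
// Minimal K-Means sketch (hypothetical helpers, NOT the NlpTools code).
// Points are numeric arrays of equal length.

// Euclidean distance between two equal-length vectors.
function euclidean(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $v) {
        $sum += ($v - $b[$i]) ** 2;
    }
    return sqrt($sum);
}

// Mean of the points whose offsets are listed in $offsets.
function meanCentroid(array $points, array $offsets): array
{
    $dim = count($points[$offsets[0]]);
    $c = array_fill(0, $dim, 0.0);
    foreach ($offsets as $o) {
        foreach ($points[$o] as $i => $v) {
            $c[$i] += $v;
        }
    }
    return array_map(fn($s) => $s / count($offsets), $c);
}

// Returns one array of point offsets per cluster.
function kmeans(array $points, int $n, callable $dist, callable $centroidOf, float $cutoff = 1e-5): array
{
    // Assumption: seed with the first $n points for a deterministic result.
    $centroids = array_slice($points, 0, $n);
    while (true) {
        // Assignment step: each point joins its nearest centroid.
        $clusters = array_fill(0, $n, []);
        foreach ($points as $offset => $p) {
            $best = 0;
            foreach ($centroids as $k => $c) {
                if ($dist($p, $c) < $dist($p, $centroids[$best])) {
                    $best = $k;
                }
            }
            $clusters[$best][] = $offset;
        }
        // Update step: recompute each centroid (keep the old one if the cluster is empty)
        // and track the largest change.
        $new = [];
        $maxChange = 0.0;
        foreach ($clusters as $k => $offsets) {
            $new[$k] = $offsets ? $centroidOf($points, $offsets) : $centroids[$k];
            $maxChange = max($maxChange, $dist($new[$k], $centroids[$k]));
        }
        $centroids = $new;
        // Stop once no centroid moved by $cutoff or more.
        if ($maxChange < $cutoff) {
            return $clusters;
        }
    }
}

$points = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]];
$clusters = kmeans($points, 2, 'euclidean', 'meanCentroid');
// $clusters is [[0, 2], [1, 3]]
```

Swapping the two callables is what lets the same loop run over a non-Euclidean space.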

Methods

array	cluster(TrainingSet $documents, FeatureFactoryInterface $ff) Apply the feature factory to the documents and then cluster the resulting array using the provided distance metric and centroid factory.
	__construct(int $n, DistanceInterface $d, CentroidFactoryInterface $cf, float $cutoff = 1.0E-5) Initialize the K-Means clusterer.

Details

at line 46
`public array cluster(TrainingSet $documents, FeatureFactoryInterface $ff)`

Apply the feature factory to the documents and then cluster the resulting array using the provided distance metric and centroid factory.

Parameters

TrainingSet	$documents	The documents to be clustered
FeatureFactoryInterface	$ff	The feature factory used to transform the given documents into features

Return Value

array

The clusters: an array of arrays, each containing the offsets of the documents assigned to that cluster
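Because the return value holds offsets rather than documents, a caller typically maps the offsets back to its own data. The sketch below assumes the offsets follow the order in which the documents were added to the `TrainingSet`; the `$docs` strings and the `$clusters` value are made-up sample data, not output from the library.

```php
<?php
// Hypothetical sample data: $docs stands in for the training set's documents,
// in insertion order; $clusters has the documented shape (offsets per cluster).
$docs = ['apple pie', 'stock market', 'apple tart', 'bond market'];
$clusters = [[0, 2], [1, 3]];

// Replace every offset with the document it refers to.
$grouped = array_map(
    fn(array $offsets) => array_map(fn(int $o) => $docs[$o], $offsets),
    $clusters
);
// $grouped is [['apple pie', 'apple tart'], ['stock market', 'bond market']]
```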

at line 34
`public __construct(int $n, DistanceInterface $d, CentroidFactoryInterface $cf, float $cutoff = 1.0E-5)`

Initialize the K-Means clusterer.

Parameters

int	$n	The number of clusters to compute
DistanceInterface	$d	The distance metric to be used (Euclidean, Hamming, ...)
CentroidFactoryInterface	$cf	The factory used to compute each new centroid from the set of documents assigned to a cluster
float	$cutoff	Iteration stops once the maximum change of the centroids between two iterations is smaller than this value
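To illustrate the roles of `$d`, `$cf` and `$cutoff` outside Euclidean space, here is a sketch using a Hamming distance and a per-position majority-vote centroid (the mode, as in k-modes) for binary vectors. The interfaces and classes prefixed with `Sketch`, and the `converged` function, are hypothetical stand-ins written for this example; they are not the NlpTools `DistanceInterface` or `CentroidFactoryInterface`. Measuring the centroid change with the same distance metric is likewise an assumption about how the cutoff is applied.

```php
<?php
// Hypothetical stand-ins for the roles of DistanceInterface and
// CentroidFactoryInterface; these are NOT the NlpTools interfaces.
interface SketchDistance
{
    public function dist(array $a, array $b): float;
}

interface SketchCentroidFactory
{
    // Build a centroid from the documents whose offsets are listed.
    public function getCentroid(array $docs, array $offsets): array;
}

// Hamming distance: number of positions where two equal-length vectors differ.
class SketchHamming implements SketchDistance
{
    public function dist(array $a, array $b): float
    {
        $d = 0;
        foreach ($a as $i => $v) {
            if ($v !== $b[$i]) {
                $d++;
            }
        }
        return (float) $d;
    }
}

// Per-position majority vote: keeps the centroid binary, which a mean would not.
// Ties go to 0.
class SketchMajorityCentroid implements SketchCentroidFactory
{
    public function getCentroid(array $docs, array $offsets): array
    {
        $dim = count($docs[$offsets[0]]);
        $centroid = [];
        for ($i = 0; $i < $dim; $i++) {
            $ones = 0;
            foreach ($offsets as $o) {
                $ones += $docs[$o][$i];
            }
            $centroid[] = (2 * $ones > count($offsets)) ? 1 : 0;
        }
        return $centroid;
    }
}

// Stopping rule: true once the largest centroid change is below $cutoff.
function converged(array $old, array $new, SketchDistance $d, float $cutoff): bool
{
    $maxChange = 0.0;
    foreach ($old as $k => $c) {
        $maxChange = max($maxChange, $d->dist($c, $new[$k]));
    }
    return $maxChange < $cutoff;
}

$docs = [[1, 0, 1, 1], [1, 0, 0, 1], [1, 1, 1, 1]];
$centroid = (new SketchMajorityCentroid())->getCentroid($docs, [0, 1, 2]);
// $centroid is [1, 0, 1, 1]
```

With an integer-valued metric such as Hamming, any cutoff below 1 means iteration stops only when no centroid changes at all.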
