NlpTools API
Class

NlpTools\Clustering\MergeStrategies\GroupAverage

class GroupAverage extends HeapLinkage

In group average clustering, the new distance between the merged cluster and cluster i is the average distance from all points in clusters x and y to the points in i.

The average distance is computed efficiently by assuming that every point in one
cluster is at the same distance (the average distance) from every point in another cluster.
The computation then reduces to a weighted average of the two average distances, weighted by cluster size.
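The weighted average can be illustrated with a small sketch. The function name `groupAverageDistance` and its parameters are hypothetical and are not part of the NlpTools API. The sketch assumes the weights are the sizes of clusters x and y.

```php
<?php
// Hypothetical helper for illustration only; not part of NlpTools.
// $dxi, $dyi: average distances from clusters x and y to cluster i.
// $sizeX, $sizeY: number of points in clusters x and y (assumed weights).
function groupAverageDistance(float $dxi, float $dyi, int $sizeX, int $sizeY): float
{
    return ($sizeX * $dxi + $sizeY * $dyi) / ($sizeX + $sizeY);
}

// x has 3 points at average distance 2.0 from i; y has 1 point at distance 6.0.
// The merged cluster is then at (3 * 2.0 + 1 * 6.0) / 4 = 3.0 from i.
$d = groupAverageDistance(2.0, 6.0, 3, 1);
```

Because only the two stored averages and the two cluster sizes are needed, the update does not have to revisit the individual points.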

Methods

initializeStrategy(DistanceInterface $d, array $docs)

Initialize the distance matrix and any other data structure needed to calculate the merges later.

array getNextMerge()

Return the pair of clusters x,y to be merged.

Details

at line 19
public initializeStrategy(DistanceInterface $d, array $docs)

Initialize the distance matrix and any other data structure needed to calculate the merges later.

Parameters

DistanceInterface $d The distance metric used to calculate the distance matrix
array $docs The docs to be clustered
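As a rough illustration of what initializing a distance matrix involves, the sketch below computes all pairwise distances between documents. It does not reproduce the library's internal data structures: `euclidean` is a hypothetical stand-in for a DistanceInterface implementation, `buildDistanceMatrix` is a hypothetical name, and the `"i,j"` key layout is an assumption made for readability.

```php
<?php
// Hypothetical stand-in for a distance metric: plain Euclidean distance
// between two numeric vectors with the same keys.
function euclidean(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $k => $v) {
        $sum += ($v - $b[$k]) ** 2;
    }
    return sqrt($sum);
}

// Build the upper triangle of the pairwise distance matrix.
// Entries are keyed by "i,j" with i < j (an assumed layout).
// $docs is assumed to be a list indexed 0..n-1.
function buildDistanceMatrix(callable $dist, array $docs): array
{
    $n = count($docs);
    $dm = [];
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            $dm["$i,$j"] = $dist($docs[$i], $docs[$j]);
        }
    }
    return $dm;
}

$docs = [[0, 0], [3, 4], [6, 8]];
$dm = buildDistanceMatrix('euclidean', $docs);
// $dm holds three entries: "0,1" => 5.0, "0,2" => 10.0, "1,2" => 5.0
```

Only n(n-1)/2 distances are stored, since the distance from i to j is assumed to equal the distance from j to i.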

at line 37
public array getNextMerge()

Return the pair of clusters x,y to be merged.

1. Extract the pair with the smallest distance
2. Recalculate the distance of the merged cluster with every other cluster
3. Merge the clusters (by labeling one as removed)
4. Reheap

Return Value

array The pair (x,y) to be merged
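The four steps listed for getNextMerge() can be sketched as follows. This is a simplified illustration, not the library's implementation: it replaces the heap with a linear scan over a full symmetric matrix, so step 4 becomes a no-op. The name `nextMergeSketch` and the `$dm` / `$size` layouts are hypothetical, and at least two active clusters are assumed.

```php
<?php
// Sketch only: a linear scan stands in for the heap; all names are hypothetical.
// $dm: symmetric matrix, $dm[$a][$b] is the distance between active clusters.
// $size: cluster id => number of points. Both are updated in place.
function nextMergeSketch(array &$dm, array &$size): array
{
    // 1. Find the pair with the smallest distance.
    $best = INF;
    $x = $y = null;
    foreach ($dm as $a => $row) {
        foreach ($row as $b => $d) {
            if ($a < $b && $d < $best) {
                $best = $d;
                $x = $a;
                $y = $b;
            }
        }
    }
    // 2. Recalculate the distance of the merged cluster (kept under id $x)
    //    to every other cluster, as a size-weighted average.
    foreach (array_keys($dm) as $i) {
        if ($i === $x || $i === $y) {
            continue;
        }
        $d = ($size[$x] * $dm[$x][$i] + $size[$y] * $dm[$y][$i])
            / ($size[$x] + $size[$y]);
        $dm[$x][$i] = $dm[$i][$x] = $d;
    }
    // 3. Merge: $x absorbs $y, and $y is removed.
    $size[$x] += $size[$y];
    unset($size[$y], $dm[$y]);
    foreach ($dm as $a => $row) {
        unset($dm[$a][$y]);
    }
    // 4. Nothing to reheap here: the next call rescans the matrix.
    return [$x, $y];
}

$dm = [
    0 => [1 => 1.0, 2 => 4.0],
    1 => [0 => 1.0, 2 => 6.0],
    2 => [0 => 4.0, 1 => 6.0],
];
$size = [0 => 1, 1 => 1, 2 => 1];
$pair = nextMergeSketch($dm, $size); // clusters 0 and 1 are closest
// Distance from the merged cluster to cluster 2 is (4.0 + 6.0) / 2 = 5.0
```

The linear scan costs O(n^2) per merge. A heap-based strategy avoids rescanning by keeping the candidate pairs ordered, which is why a reheap step is needed after the distances change.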