NlpTools API
Class

NlpTools\Clustering\MergeStrategies\HeapLinkage

abstract class HeapLinkage implements MergeStrategyInterface

HeapLinkage is an abstract merge strategy.

It creates a pairwise distance matrix. It then uses a heap to compute
efficiently the minimum in the distance matrix. Then the two clusters
with the minimum distance are merged and the distance of the new cluster
with every other cluster i is recomputed. This recomputation is delegated
to the children classes through the abstract function newDistance().

The class uses an SplFixedArray that is filled with the lower triangle of
the distance matrix. This is done to save memory. The index of a pair x,y
is computed as follows:
1. if x>y swap x,y
2. index = y*(y-1)/2 + x

Methods

initializeStrategy(DistanceInterface $d, array $docs)

Initialize the distance matrix and any other data structure needed to calculate the merges later.

array getNextMerge()

Return the pair of clusters x,y to be merged.

Details

at line 44
public initializeStrategy(DistanceInterface $d, array $docs)

Initialize the distance matrix and any other data structure needed to calculate the merges later.

Parameters

DistanceInterface $d The distance metric used to calculate the distance matrix
array $docs The docs to be clustered

at line 78
public array getNextMerge()

Return the pair of clusters x,y to be merged.

1. Extract the pair with the smallest distance
2. Recalculate the distance of the merged cluster with every other cluster
3. Merge the clusters (by labeling one as removed)
4. Reheap

Return Value

array The pair (x,y) to be merged