NlpTools API
Class

NlpTools\Clustering\MergeStrategies\SingleLink

class SingleLink extends HeapLinkage

In single-linkage clustering, when clusters x and y are merged, the distance from the merged cluster to any other cluster i is the smaller of the distances from x to i and from y to i.

Example:

Suppose we have the following four clusters

a = [ (0,0) ]
b = [ (5,2) ]
c = [ (0,5) ]
d = [ (0,2) ]

with the following pairwise distance matrix

     a     b     c     d
  +-----+-----+-----+-----+
a |  0  | 5.3 |  5  |  2  |
  +-----+-----+-----+-----+
b | 5.3 |  0  | 5.8 |  5  |
  +-----+-----+-----+-----+
c |  5  | 5.8 |  0  |  3  |
  +-----+-----+-----+-----+
d |  2  |  5  |  3  |  0  |
  +-----+-----+-----+-----+

If we merge clusters a and d (the closest pair), we need to update the
matrix to hold the new distances. For every other cluster (b and c), the
new distance to the merged cluster is the minimum of that cluster's
distances to a and to d.

          a,d            b             c
    +-------------+-------------+-------------+
a,d |      0      | min(5.3, 5) |  min(5, 3)  |
    +-------------+-------------+-------------+
b   | min(5.3, 5) |      0      |     5.8     |
    +-------------+-------------+-------------+
c   |  min(5, 3)  |     5.8     |      0      |
    +-------------+-------------+-------------+
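
The update can be reproduced in a few lines of plain PHP. This is a minimal sketch that does not use the NlpTools classes; the helper names `euclidean` and `singleLinkMerge` are made up for illustration, and the distances are assumed to be Euclidean, which matches the values in the matrix.

```php
<?php
// Hypothetical helper: Euclidean distance between two 2-D points.
function euclidean(array $p, array $q): float
{
    return sqrt(($p[0] - $q[0]) ** 2 + ($p[1] - $q[1]) ** 2);
}

// Hypothetical helper: single-link distances from the merged cluster (x,y)
// to every remaining cluster i, computed as min(d(x,i), d(y,i)).
function singleLinkMerge(array $dist, string $x, string $y): array
{
    $merged = [];
    foreach (array_keys($dist) as $i) {
        if ($i === $x || $i === $y) {
            continue; // the merged clusters themselves are skipped
        }
        $merged[$i] = min($dist[$x][$i], $dist[$y][$i]);
    }
    return $merged;
}

$points = ['a' => [0, 0], 'b' => [5, 2], 'c' => [0, 5], 'd' => [0, 2]];

// Build the full pairwise distance matrix.
$dist = [];
foreach ($points as $i => $p) {
    foreach ($points as $j => $q) {
        $dist[$i][$j] = euclidean($p, $q);
    }
}

// Distances from the merged cluster (a,d) to b and c.
$row = singleLinkMerge($dist, 'a', 'd'); // ['b' => 5.0, 'c' => 3.0]
```

The merged cluster ends up at distance 5 from b (through d) and 3 from c (also through d), which agrees with the updated matrix.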

Methods

initializeStrategy(DistanceInterface $d, array $docs)

Initialize the distance matrix and any other data structure needed to calculate the merges later.

from HeapLinkage

array getNextMerge()

Return the pair of clusters x,y to be merged.

from HeapLinkage

Details

in HeapLinkage at line 44
public initializeStrategy(DistanceInterface $d, array $docs)

Initialize the distance matrix and any other data structure needed to calculate the merges later.

Parameters

DistanceInterface  $d     The distance metric used to calculate the distance matrix
array              $docs  The documents to be clustered
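
As a rough illustration of what initializing a distance matrix involves, the sketch below fills a symmetric matrix from a list of points. It is not the HeapLinkage source: `buildDistanceMatrix` is a hypothetical function, a plain callable stands in for a DistanceInterface implementation, and the metric is assumed to be symmetric.

```php
<?php
// Hypothetical sketch: $distance is a plain callable standing in for a
// DistanceInterface implementation; $docs is a list of 2-D points.
function buildDistanceMatrix(callable $distance, array $docs): array
{
    $n = count($docs);
    $matrix = array_fill(0, $n, array_fill(0, $n, 0.0));
    for ($x = 0; $x < $n; $x++) {
        for ($y = $x + 1; $y < $n; $y++) {
            // Assumes a symmetric metric, so each pair is computed once.
            $matrix[$x][$y] = $matrix[$y][$x] = $distance($docs[$x], $docs[$y]);
        }
    }
    return $matrix;
}

// Manhattan distance, used here only as an example metric.
$manhattan = fn(array $p, array $q): float =>
    (float) (abs($p[0] - $q[0]) + abs($p[1] - $q[1]));

$matrix = buildDistanceMatrix($manhattan, [[0, 0], [5, 2], [0, 5], [0, 2]]);
```

Any other callable with the same two-argument shape could be passed in place of `$manhattan`.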

in HeapLinkage at line 78
public array getNextMerge()

Return the pair of clusters x,y to be merged.

1. Extract the pair with the smallest distance
2. Recalculate the distance of the merged cluster with every other cluster
3. Merge the clusters (by labeling one as removed)
4. Reheap
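
The four steps above can be sketched in plain PHP as follows. This is an illustrative stand-in, not the HeapLinkage implementation: the class name `SingleLinkSketch` is hypothetical, the single-link rule is hard-coded in step 2, and the queue is rebuilt from scratch in step 4 for simplicity, which a real implementation may well avoid.

```php
<?php
// Hypothetical stand-in for a heap-based single-link strategy.
final class SingleLinkSketch
{
    private array $dist;     // symmetric distance matrix
    private array $removed;  // removed[i] is true once cluster i is merged away
    private SplPriorityQueue $heap;

    public function __construct(array $dist)
    {
        $this->dist = $dist;
        $this->removed = array_fill(0, count($dist), false);
        $this->reheap();
    }

    // Step 4: rebuild the queue from the pairs of clusters still alive.
    private function reheap(): void
    {
        $this->heap = new SplPriorityQueue();
        $n = count($this->dist);
        for ($x = 0; $x < $n; $x++) {
            for ($y = $x + 1; $y < $n; $y++) {
                if (!$this->removed[$x] && !$this->removed[$y]) {
                    // SplPriorityQueue is a max-heap, so negate the distance.
                    $this->heap->insert([$x, $y], -$this->dist[$x][$y]);
                }
            }
        }
    }

    public function getNextMerge(): array
    {
        // 1. Extract the pair with the smallest distance.
        [$x, $y] = $this->heap->extract();

        // 2. Recalculate the distance of the merged cluster (kept at
        //    index x) to every other live cluster, single-link style.
        $n = count($this->dist);
        for ($i = 0; $i < $n; $i++) {
            if ($i === $x || $i === $y || $this->removed[$i]) {
                continue;
            }
            $d = min($this->dist[$x][$i], $this->dist[$y][$i]);
            $this->dist[$x][$i] = $d;
            $this->dist[$i][$x] = $d;
        }

        // 3. Merge the clusters by labeling y as removed.
        $this->removed[$y] = true;

        // 4. Reheap.
        $this->reheap();

        return [$x, $y];
    }
}

// Clusters a, b, c, d from the class example, indexed 0..3.
$strategy = new SingleLinkSketch([
    [0.0, 5.3, 5.0, 2.0],
    [5.3, 0.0, 5.8, 5.0],
    [5.0, 5.8, 0.0, 3.0],
    [2.0, 5.0, 3.0, 0.0],
]);
$first  = $strategy->getNextMerge(); // [0, 3]: a and d
$second = $strategy->getNextMerge(); // [0, 2]: (a,d) and c
$third  = $strategy->getNextMerge(); // [0, 1]: (a,d,c) and b
```

Calling `getNextMerge()` again after the last merge would fail in this sketch, because `SplPriorityQueue::extract()` throws on an empty queue.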

Return Value

array The pair (x,y) to be merged