NlpTools API
Namespace

NlpTools\Similarity

CosineSimilarity Given two vectors, computes cos(theta), where theta is the angle between the two vectors in an N-dimensional vector space.
Euclidean Computes the Euclidean distance between two vectors: sqrt(sum((a_i - b_i)^2)).
HammingDistance Implements the Hamming distance between two strings or sets.
JaccardIndex http://en.wikipedia.org/wiki/Jaccard_index
Simhash An implementation of the locality-sensitive hash function families proposed by Moses Charikar using the Earth Mover's Distance. See http://www.cs.princeton.edu/courses/archive/spring04/cos598B/bib/CharikarEstim.pdf
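
The measures listed above can be sketched in a few lines each. The following is a minimal, language-neutral illustration in Python, not the NlpTools PHP API: every function name and signature here is hypothetical, vectors are assumed to be equal-length lists of numbers, and sets are assumed to be Python sets. The simhash function shows the common bit-voting construction with MD5 as an arbitrary feature hash; it is an assumption about the general scheme, not a transcription of the Simhash class.

```python
import hashlib
import math


def cosine_similarity(a, b):
    """cos(theta) between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """sqrt(sum((a_i - b_i)^2)) over two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming_distance(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs inputs of equal length")
    return sum(c1 != c2 for c1, c2 in zip(s, t))


def jaccard_index(a, b):
    """|A intersect B| / |A union B| for two sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(union)


def simhash(features, bits=64):
    """Bit-voting simhash over string features (bits must be <= 128 here)."""
    counts = [0] * bits
    for feature in features:
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Each feature votes +1 for a set bit and -1 for a clear bit.
            counts[i] += 1 if (h >> i) & 1 else -1
    # Keep the bits where the positive votes won.
    return sum(1 << i for i, c in enumerate(counts) if c > 0)
```

Two fingerprints produced by simhash can then be compared by counting the bits in which they differ, which is a Hamming distance on the binary representations.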

Interfaces

DistanceInterface Distance should return a number proportional to how dissimilar the two instances are (with any metric).
SimilarityInterface Similarity should return a number proportional to how similar the two instances are (with any metric).
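
The two contracts mirror each other, so a single measure can often satisfy both. The sketch below illustrates this in Python under stated assumptions: the class names and the method names dist and similarity are hypothetical stand-ins, not the PHP signatures of the interfaces, and the Jaccard-based measure is only one possible example.

```python
from abc import ABC, abstractmethod


class Distance(ABC):
    """Hypothetical counterpart of a distance contract: larger means more dissimilar."""

    @abstractmethod
    def dist(self, a, b):
        ...


class Similarity(ABC):
    """Hypothetical counterpart of a similarity contract: larger means more similar."""

    @abstractmethod
    def similarity(self, a, b):
        ...


class SetOverlap(Distance, Similarity):
    """Jaccard-style measure on sets that satisfies both contracts."""

    def similarity(self, a, b):
        a, b = set(a), set(b)
        union = a | b
        if not union:
            return 1.0  # convention chosen here: two empty sets are identical
        return len(a & b) / len(union)

    def dist(self, a, b):
        # For a similarity bounded in [0, 1], 1 - similarity is a natural distance.
        return 1.0 - self.similarity(a, b)
```

Code written against only one of the two contracts (for example, a clustering routine that needs a distance) can then accept any measure that implements it.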