NlpTools API

Classes

NlpTools\Analysis\FreqDist Extract the frequency distribution of keywords.
NlpTools\Analysis\Idf Idf implements the inverse document frequency measure.
NlpTools\Classifiers\ClassifierInterface
NlpTools\Classifiers\FeatureBasedLinearClassifier Classify using a linear model.
NlpTools\Classifiers\MultinomialNBClassifier Use a multinomial NB model to classify a document.
NlpTools\Clustering\CentroidFactories\CentroidFactoryInterface
NlpTools\Clustering\CentroidFactories\Euclidean Computes the Euclidean centroid of the provided sparse vectors.
NlpTools\Clustering\CentroidFactories\Hamming This class computes the centroid under the Hamming distance between two strings that are the binary representations of two integers (the strings are supposed to contain only the characters 1 and 0).
NlpTools\Clustering\CentroidFactories\MeanAngle MeanAngle computes the unit vector whose angle is the average of the angles of all the given vectors.
NlpTools\Clustering\Clusterer
NlpTools\Clustering\Hierarchical This class implements hierarchical agglomerative clustering.
NlpTools\Clustering\KMeans This clusterer uses the KMeans algorithm for clustering documents.
NlpTools\Clustering\MergeStrategies\CompleteLink In complete linkage clustering the new distance of the merged cluster with cluster i is the maximum distance of either cluster x to i or y to i.
NlpTools\Clustering\MergeStrategies\GroupAverage In group average clustering the new distance of the merged cluster with cluster i is the average distance of all points in clusters x and y to the points of cluster i.
NlpTools\Clustering\MergeStrategies\HeapLinkage HeapLinkage is an abstract merge strategy.
NlpTools\Clustering\MergeStrategies\MergeStrategyInterface In hierarchical agglomerative clustering each document starts in its own cluster and is then successively merged with the "closest" cluster.
NlpTools\Clustering\MergeStrategies\SingleLink In single linkage clustering the new distance of the merged cluster with cluster i is the smallest distance of either cluster x to i or y to i.
NlpTools\Documents\DocumentInterface A Document is a representation of a document to be classified.
NlpTools\Documents\RawDocument RawDocument simply encapsulates a PHP variable.
NlpTools\Documents\TokensDocument Represents a bag of words (tokens) document.
NlpTools\Documents\TrainingDocument A TrainingDocument is a document that "decorates" any other document to add the real class of the document.
NlpTools\Documents\TrainingSet A collection of TrainingDocument objects.
NlpTools\Documents\WordDocument A Document that represents a single word but with a context of a larger document.
NlpTools\Exceptions\InvalidExpression Used primarily by the tokenizers.
NlpTools\FeatureFactories\DataAsFeatures
NlpTools\FeatureFactories\FeatureFactoryInterface
NlpTools\FeatureFactories\FunctionFeatures An implementation of FeatureFactoryInterface that takes any number of callables (function names, closures, array($object,'func_name'), etc.) and calls them consecutively using the return value as a feature's unique string.
NlpTools\Models\FeatureBasedNB Implement a MultinomialNBModel by training on a TrainingSet with a FeatureFactoryInterface and additive smoothing.
NlpTools\Models\Lda Topic discovery with latent Dirichlet allocation using Gibbs sampling.
NlpTools\Models\LinearModel This class represents a linear model of the following form f(x_vec) = l1*x1 + l2*x2 + l3*x3 ...
NlpTools\Models\Maxent Maxent is a model that assigns a weight for each feature such that all the weights maximize the Conditional Log Likelihood of the training data.
NlpTools\Models\MultinomialNBModelInterface Interface that describes a NB model.
NlpTools\Optimizers\ExternalMaxentOptimizer This class enables the use of a program written in a different language to optimize our model and return the weights for use in php.
NlpTools\Optimizers\FeatureBasedLinearOptimizerInterface
NlpTools\Optimizers\GradientDescentOptimizer Implements gradient descent with fixed step.
NlpTools\Optimizers\MaxentGradientDescent Implement a gradient descent algorithm that maximizes the conditional log likelihood of the training data.
NlpTools\Optimizers\MaxentOptimizerInterface Marker interface to use with the Maxent model for type checking
NlpTools\Random\Distributions\AbstractDistribution
NlpTools\Random\Distributions\Dirichlet Implement a k-dimensional Dirichlet distribution using draws from k gamma distributions and then normalizing.
NlpTools\Random\Distributions\Gamma Implement the gamma distribution.
NlpTools\Random\Distributions\Normal
NlpTools\Random\Generators\FromFile Return floats from a file.
NlpTools\Random\Generators\GeneratorInterface An interface for pseudo-random number generators.
NlpTools\Random\Generators\MersenneTwister A simple wrapper over the built-in mt_rand() function.
NlpTools\Similarity\CosineSimilarity Given two vectors compute cos(theta) where theta is the angle between the two vectors in a N-dimensional vector space.
NlpTools\Similarity\DistanceInterface Distance should return a number proportional to how dissimilar the two instances are (with any metric).
NlpTools\Similarity\Euclidean This class computes the simple Euclidean distance between two vectors: sqrt(sum((a_i-b_i)^2)).
NlpTools\Similarity\HammingDistance This class implements the Hamming distance of two strings or sets.
NlpTools\Similarity\JaccardIndex http://en.wikipedia.org/wiki/Jaccard_index
NlpTools\Similarity\Simhash Simhash is an implementation of the locality sensitive hash function families proposed by Moses Charikar using the Earth Mover's Distance http://www.cs.princeton.edu/courses/archive/spring04/cos598B/bib/CharikarEstim.pdf
NlpTools\Similarity\SimilarityInterface Similarity should return a number that is proportional to how similar those two instances are (with any metric).
NlpTools\Stemmers\GreekStemmer This stemmer is an implementation of the stemmer described by G.
NlpTools\Stemmers\LancasterStemmer A word stemmer based on the Lancaster stemming algorithm.
NlpTools\Stemmers\PorterStemmer Copyright 2013 Katharopoulos Angelos <katharas@gmail.com>
NlpTools\Stemmers\RegexStemmer This stemmer removes affixes according to a regular expression.
NlpTools\Stemmers\Stemmer http://en.wikipedia.org/wiki/Stemming
NlpTools\Tokenizers\ClassifierBasedTokenizer A tokenizer that uses a classifier (of any type) to determine if there is an "end of word" (EOW).
NlpTools\Tokenizers\PennTreeBankTokenizer PennTreeBank tokenizer based on http://www.cis.upenn.edu/~treebank/tokenizer.sed
NlpTools\Tokenizers\RegexTokenizer Regex tokenizer tokenizes text based on a set of regexes
NlpTools\Tokenizers\TokenizerInterface
NlpTools\Tokenizers\WhitespaceAndPunctuationTokenizer Simple tokenizer that breaks on white space and punctuation.
NlpTools\Tokenizers\WhitespaceTokenizer Simple white space tokenizer.
NlpTools\Utils\ClassifierBasedTransformation Classify whatever is passed to transform and apply a different set of transformations to it based on the predicted class.
NlpTools\Utils\EnglishVowels Helper vowel class; determines whether the character at a given index is a vowel.
NlpTools\Utils\Normalizers\English For English we simply transform to lower case using mb_strtolower.
NlpTools\Utils\Normalizers\Greek To normalize Greek text we use mb_strtolower to transform it to lower case, then replace every accented character with its non-accented counterpart and the final ς with σ.
NlpTools\Utils\Normalizers\Normalizer The Normalizer's purpose is to transform any word from any one of the possible writings to a single writing consistently.
NlpTools\Utils\StopWords Stop Words are words which are filtered out because they carry little to no information.
NlpTools\Utils\TransformationInterface TransformationInterface represents any type of transformation to be applied upon documents.
NlpTools\Utils\VowelsAbstractFactory Factory wrapper for Vowels
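FreqDist and Idf compute standard corpus statistics. The sketch below shows the arithmetic in Python rather than through NlpTools' PHP API; the function names are hypothetical, and it assumes the common unsmoothed definition idf(t) = log(N / df(t)), which may differ from the exact formula the Idf class uses.

```python
import math
from collections import Counter

def freq_dist(tokens):
    """Count how often each token occurs (a keyword frequency distribution)."""
    return Counter(tokens)

def idf(documents):
    """Map each term to log(N / df), where df is the number of documents containing it."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term at most once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
print(freq_dist(docs[0] + docs[1]).most_common(2))
print(idf(docs)["cat"])  # log(3/2), about 0.405
```

A term that appears in every document ("the" above) gets an idf of 0, so it contributes nothing to tf-idf style weights.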
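The three centroid factories (Euclidean, Hamming, MeanAngle) each pick a representative point for a cluster. The Python sketch below is not NlpTools' code: the function names are hypothetical, and the MeanAngle and Hamming behaviour are assumptions. For MeanAngle we assume the inputs are unit-normalized, averaged and re-normalized; for Hamming we assume a bitwise majority vote, which minimizes the total Hamming distance to the inputs.

```python
import math

def euclidean_centroid(vectors):
    """Component-wise mean of sparse {dimension: value} vectors."""
    total = {}
    for vec in vectors:
        for k, v in vec.items():
            total[k] = total.get(k, 0.0) + v
    n = len(vectors)
    return {k: v / n for k, v in total.items()}  # missing dimensions count as 0

def _normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else dict(vec)

def mean_angle_centroid(vectors):
    """Unit vector along the mean of the unit-normalized inputs (assumed approach)."""
    return _normalize(euclidean_centroid([_normalize(v) for v in vectors]))

def hamming_centroid(bit_strings):
    """Bitwise majority vote over equal-length '0'/'1' strings (ties go to '0')."""
    n = len(bit_strings)
    return "".join(
        "1" if sum(s[i] == "1" for s in bit_strings) * 2 > n else "0"
        for i in range(len(bit_strings[0]))
    )

print(euclidean_centroid([{"a": 2.0}, {"b": 4.0}]))  # {'a': 1.0, 'b': 2.0}
print(mean_angle_centroid([{"a": 10.0}, {"b": 1.0}]))
print(hamming_centroid(["110", "100", "111"]))  # 110
```

Normalizing before averaging means a long vector cannot dominate the direction of the MeanAngle centroid, which suits cosine-based clustering.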
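The SingleLink, CompleteLink and GroupAverage merge strategies differ only in how the distance from a freshly merged cluster x∪y to another cluster i is updated. The Python functions below are hypothetical stand-ins, not the NlpTools classes. The group-average rule weights each old distance by its cluster's size, which equals the average over all point pairs when d(x, i) and d(y, i) are themselves pairwise averages.

```python
def single_link(d_xi, d_yi, size_x, size_y):
    """New distance is the smaller of the two old distances."""
    return min(d_xi, d_yi)

def complete_link(d_xi, d_yi, size_x, size_y):
    """New distance is the larger of the two old distances."""
    return max(d_xi, d_yi)

def group_average(d_xi, d_yi, size_x, size_y):
    """Average over all point pairs: weight each old distance by its cluster's size."""
    return (size_x * d_xi + size_y * d_yi) / (size_x + size_y)

print(single_link(2.0, 6.0, 1, 1))    # 2.0
print(complete_link(2.0, 6.0, 1, 1))  # 6.0
print(group_average(2.0, 6.0, 3, 1))  # 3.0
```

With clusters of size 3 and 1, the group average is (3 * 2.0 + 1 * 6.0) / 4 = 3.0, not the unweighted mean 4.0.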
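FeatureBasedNB trains a multinomial NB model with additive smoothing, and MultinomialNBClassifier uses such a model to classify. The Python sketch below shows the textbook version of that technique, not NlpTools' implementation: the (label, tokens) input format, the function names and the `alpha` parameter are assumptions, and raw tokens stand in for the features a FeatureFactoryInterface would produce.

```python
import math
from collections import Counter, defaultdict

def train_nb(training_set, alpha=1.0):
    """training_set: list of (label, tokens). Returns log priors and smoothed log-likelihoods."""
    class_counts = Counter(label for label, _ in training_set)
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for label, tokens in training_set:
        token_counts[label].update(tokens)
        vocabulary.update(tokens)
    n_docs = len(training_set)
    priors = {c: math.log(n / n_docs) for c, n in class_counts.items()}
    likelihoods = {}
    for c in class_counts:
        # Additive smoothing: add alpha to every count so unseen tokens get nonzero mass.
        denom = sum(token_counts[c].values()) + alpha * len(vocabulary)
        likelihoods[c] = {t: math.log((token_counts[c][t] + alpha) / denom) for t in vocabulary}
    return priors, likelihoods

def classify_nb(model, tokens):
    """Return the class with the highest log posterior (up to a constant)."""
    priors, likelihoods = model

    def score(c):
        # Tokens never seen during training are skipped.
        return priors[c] + sum(likelihoods[c][t] for t in tokens if t in likelihoods[c])

    return max(priors, key=score)

model = train_nb([
    ("sports", ["ball", "goal", "team"]),
    ("politics", ["vote", "election", "party"]),
    ("sports", ["team", "win"]),
])
print(classify_nb(model, ["goal", "team"]))  # sports
```

Working in log space avoids underflow when a document contains many tokens.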
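LinearModel evaluates f(x_vec) = l1*x1 + l2*x2 + ..., and GradientDescentOptimizer is described as gradient descent with a fixed step. A minimal Python sketch of both ideas follows; the names and the `step`/`iterations` parameters are hypothetical, and the optimizer here minimizes a toy quadratic rather than fitting a Maxent model (maximizing a conditional log likelihood amounts to stepping along +gradient instead).

```python
def linear_model(weights, features):
    """f(x) = l1*x1 + l2*x2 + ... over sparse {feature: value} dicts."""
    return sum(weights.get(f, 0.0) * x for f, x in features.items())

def gradient_descent(grad, x0, step=0.1, iterations=100):
    """Fixed-step gradient descent: x <- x - step * grad(x)."""
    x = list(x0)
    for _ in range(iterations):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

print(linear_model({"a": 2.0, "b": -1.0}, {"a": 3.0, "c": 5.0}))  # 6.0

# Minimize (x - 3)^2 + (y + 1)^2, whose gradient is (2(x - 3), 2(y + 1)).
print(gradient_descent(lambda v: [2 * (v[0] - 3), 2 * (v[1] + 1)], [0.0, 0.0]))
```

With step 0.1 each iteration shrinks the error by a factor of 0.8, so 100 iterations land very close to (3, -1); too large a fixed step would make the iterates diverge instead.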
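The Dirichlet entry describes sampling by drawing from k gamma distributions and normalizing. The Python sketch below applies that construction using the standard library's `random.gammavariate`; the function name is hypothetical and it is not NlpTools' Dirichlet class. Every alpha must be positive.

```python
import random

def dirichlet(alphas, rng=random):
    """One Dirichlet(alphas) sample: k Gamma(alpha_i, 1) draws, then normalize to sum to 1."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(42)  # seeded generator for reproducible draws
sample = dirichlet([1.0, 2.0, 3.0], rng)
print(sample, sum(sample))
```

Passing a seeded `random.Random` plays a role similar to swapping generators behind GeneratorInterface: the sampling logic stays the same while the source of randomness changes.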
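The CosineSimilarity, Euclidean, HammingDistance and JaccardIndex classes compute standard formulas. The Python functions below are hypothetical stand-ins, not NlpTools' API. Vectors are sparse {dimension: value} dicts, the Hamming version covers only equal-length strings (not sets), and the handling of zero vectors and empty sets is a choice made here.

```python
import math

def cosine_similarity(a, b):
    """cos(theta) between two sparse vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def euclidean(a, b):
    """sqrt(sum((a_i - b_i)^2)) over the union of dimensions."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(c1 != c2 for c1, c2 in zip(s, t))

def jaccard_index(a, b):
    """|A intersect B| / |A union B| for two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)

print(cosine_similarity({"a": 1.0, "b": 1.0}, {"a": 1.0}))  # about 0.7071
print(euclidean({"a": 3.0}, {"b": 4.0}))                     # 5.0
print(hamming("1011", "1001"))                               # 1
print(jaccard_index({"a", "b", "c"}, {"b", "c", "d"}))       # 0.5
```

Cosine and Jaccard are similarities (larger means closer), while Euclidean and Hamming are distances (larger means further apart), matching the split between SimilarityInterface and DistanceInterface.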
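Simhash maps a set of features to a short fingerprint so that similar sets get fingerprints with a small Hamming distance. The Python sketch below shows the usual Charikar-style construction (each feature's hash votes +1 or -1 on every bit), not NlpTools' code. The use of MD5, the 64-bit width and the function names are assumptions.

```python
import hashlib

def simhash(features, bits=64):
    """Fingerprint where each feature votes +1/-1 on every bit of its hash."""
    if not 0 < bits <= 128:
        raise ValueError("MD5 provides at most 128 bits")
    counts = [0] * bits
    for feature in features:
        # Stable 128-bit hash of the feature; only the low `bits` bits are used.
        h = int.from_bytes(hashlib.md5(feature.encode("utf-8")).digest(), "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    # Bit i of the fingerprint is set when the votes for it are positive.
    return sum(1 << i for i, c in enumerate(counts) if c > 0)

def simhash_similarity(a, b, bits=64):
    """1 - (Hamming distance between the fingerprints) / bits."""
    return 1 - bin(a ^ b).count("1") / bits

fp1 = simhash("the quick brown fox jumps".split())
fp2 = simhash("the quick brown fox leaps".split())
print(simhash_similarity(fp1, fp2))
```

Because the fingerprint depends only on the multiset of features, word order does not matter, and a document that repeats a word gives that word's hash more votes.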
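WhitespaceTokenizer and WhitespaceAndPunctuationTokenizer differ in whether punctuation stays attached to words. The Python sketch below illustrates the difference; the function names are hypothetical, and the regular expression is an assumption, so NlpTools may, for example, split a run such as "?!" into separate tokens where this sketch keeps it together.

```python
import re

def whitespace_tokenize(text):
    """Split on runs of white space; punctuation stays attached to words."""
    return text.split()

def whitespace_and_punctuation_tokenize(text):
    """Split on white space and emit each run of punctuation as its own token."""
    return re.findall(r"\w+|[^\w\s]+", text)

text = "Hello, world!  Bye."
print(whitespace_tokenize(text))                  # ['Hello,', 'world!', 'Bye.']
print(whitespace_and_punctuation_tokenize(text))  # ['Hello', ',', 'world', '!', 'Bye', '.']
```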
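RegexStemmer removes affixes according to a regular expression, and StopWords filters out low-information words. The Python sketch below combines both steps. The stop-word list, the suffix pattern and the `min_length` guard are hypothetical choices for illustration, not NlpTools' defaults.

```python
import re

STOP_WORDS = {"the", "a", "of", "and"}  # hypothetical, deliberately tiny list

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stop_words]

def regex_stem(word, affix_pattern=r"(ing|ed|es|s)$", min_length=4):
    """Strip whatever the affix pattern matches; leave very short words alone."""
    if len(word) < min_length:
        return word
    return re.sub(affix_pattern, "", word)

tokens = ["the", "walking", "of", "dogs", "is"]
print([regex_stem(t) for t in remove_stop_words(tokens)])  # ['walk', 'dog', 'is']
```

The length guard keeps short words such as "is" from being reduced to "i", one of the over-stemming pitfalls of a purely regex-based approach.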
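The Greek normalizer lower-cases text, removes accents and maps the final ς to σ. The Python sketch below follows those three steps with the standard `unicodedata` module instead of PHP's mb_strtolower; the function name is hypothetical, and stripping every combining mark (including the diaeresis) is an assumption about which characters count as "accented".

```python
import unicodedata

def normalize_greek(word):
    """Lower-case, strip accents, and map final sigma to sigma."""
    lowered = word.lower()
    # Decompose so accents become separate combining marks (category Mn), then drop them.
    decomposed = unicodedata.normalize("NFD", lowered)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped).replace("ς", "σ")

print(normalize_greek("Άνθρωπος"))  # ανθρωποσ
print(normalize_greek("ΟΔΟΣ"))      # οδοσ
```

Mapping ς to σ makes word-final and word-internal forms of the same stem compare equal, which matters before stemming.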