NlpTools API
Index

A

DocumentInterface::applyTransformation() — Method in class DocumentInterface
Apply the transformation to the data of this document.
RawDocument::applyTransformation() — Method in class RawDocument
Apply the transformation to the data of this document.
TokensDocument::applyTransformation() — Method in class TokensDocument
Apply the transformation to the data of this document.
TrainingDocument::applyTransformation() — Method in class TrainingDocument
Pass the transformation to the decorated document
TrainingSet::addDocument() — Method in class TrainingSet
Add a document to the set.
TrainingSet::applyTransformations() — Method in class TrainingSet
Apply an array of transformations to all documents in this container.
WordDocument::applyTransformation() — Method in class WordDocument
Apply the transformation to the data of this document.
FunctionFeatures::add() — Method in class FunctionFeatures
Add a function as a feature
AbstractDistribution — Class in namespace NlpTools\Random\Distributions

C

ClassifierInterface — Class in namespace NlpTools\Classifiers
ClassifierInterface::classify() — Method in class ClassifierInterface
Decide which class C from $classes the document $d fits best.
FeatureBasedLinearClassifier::classify() — Method in class FeatureBasedLinearClassifier
Compute the vote for every class.
MultinomialNBClassifier::classify() — Method in class MultinomialNBClassifier
Compute the probability of $d belonging to each class in turn and return the class with the maximum probability.
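A minimal Python sketch of the decision rule described above. It does not use the NlpTools PHP API. It assumes the priors P(c) and the smoothed conditional probabilities P(token | c) are already available as plain dicts, which is a hypothetical layout. It works with log-probabilities so that long documents do not underflow. Skipping tokens that are missing from a class's table is also an assumption of this sketch.

```python
import math


def nb_classify(doc_tokens, priors, cond_probs):
    """Return the class with the highest log-probability for doc_tokens.

    priors:     {class: P(c)}
    cond_probs: {class: {token: P(token | c)}}, assumed already smoothed
    """
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        # Summing logs replaces multiplying many small probabilities.
        score = math.log(prior)
        for tok in doc_tokens:
            if tok in cond_probs[c]:  # unseen tokens are skipped (assumption)
                score += math.log(cond_probs[c][tok])
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```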
CentroidFactoryInterface — Class in namespace NlpTools\Clustering\CentroidFactories
Clusterer — Class in namespace NlpTools\Clustering
Clusterer::cluster() — Method in class Clusterer
Group the documents together
Hierarchical::cluster() — Method in class Hierarchical
Group the documents together
KMeans::cluster() — Method in class KMeans
Apply the feature factory to the documents and then cluster the resulting array using the provided distance metric and centroid factory.
CompleteLink — Class in namespace NlpTools\Clustering\MergeStrategies
In complete linkage clustering the new distance of the merged cluster with cluster i is the maximum of the distances from cluster x to i and from cluster y to i.
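The update rule can be sketched in Python as below. `merged_distance` is a hypothetical helper, not part of NlpTools. For comparison, it also covers single linkage (the SingleLink entry), which takes the minimum instead of the maximum.

```python
def merged_distance(d_xi, d_yi, linkage="complete"):
    """Distance from the merged cluster (x U y) to another cluster i.

    d_xi, d_yi: current distances from clusters x and y to cluster i.
    """
    if linkage == "complete":
        return max(d_xi, d_yi)  # the farther of the two clusters counts
    if linkage == "single":
        return min(d_xi, d_yi)  # the nearer of the two clusters counts
    raise ValueError(f"unknown linkage: {linkage!r}")
```

After merging x and y, this value would replace the distance row for every remaining cluster i.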
TrainingSet::current() — Method in class TrainingSet
TrainingSet::count() — Method in class TrainingSet
Maxent::CLogLik() — Method in class Maxent
Not implemented yet.
CosineSimilarity — Class in namespace NlpTools\Similarity
Given two vectors, compute cos(theta), where theta is the angle between the two vectors in an N-dimensional vector space.
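A Python sketch of cos(theta) = (a . b) / (|a| |b|). It represents sparse vectors as {feature: value} dicts; that representation and the helper name are assumptions, not how NlpTools stores its PHP arrays.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) between two sparse vectors given as {feature: value} dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```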
ClassifierBasedTokenizer — Class in namespace NlpTools\Tokenizers
A tokenizer that uses a classifier (of any type) to determine if there is an "end of word" (EOW).
ClassifierBasedTransformation — Class in namespace NlpTools\Utils
Classify whatever is passed to transform() and apply a different set of transformations depending on the class.

D

Hierarchical::dendrogramToClusters() — Method in class Hierarchical
Flatten a dendrogram to approximately $NC clusters (the closest power of 2 larger than $NC)
DocumentInterface — Class in namespace NlpTools\Documents
A Document is a representation of an item to be classified.
DataAsFeatures — Class in namespace NlpTools\FeatureFactories
Maxent::dumpWeights() — Method in class Maxent
Print the weights with print_r.
Dirichlet — Class in namespace NlpTools\Random\Distributions
Implement a k-dimensional Dirichlet distribution using draws from k gamma distributions and then normalizing.
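The construction described above can be sketched with Python's standard `random.gammavariate`. Draw g_k ~ Gamma(alpha_k, 1) for each of the k components, then divide each draw by their sum. This is a generic sketch of the technique, not the NlpTools `Dirichlet::sample()` implementation, and `sample_dirichlet` is a hypothetical name.

```python
import random


def sample_dirichlet(alphas, rng=random):
    """Draw one sample from Dirichlet(alphas) via normalized gamma draws."""
    # One Gamma(alpha_k, scale=1) draw per component.
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    # Normalizing puts the point on the probability simplex.
    return [g / total for g in draws]
```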
CosineSimilarity::dist() — Method in class CosineSimilarity
DistanceInterface — Class in namespace NlpTools\Similarity
Distance should return a number proportional to how dissimilar the two instances are (with any metric).
DistanceInterface::dist() — Method in class DistanceInterface
Euclidean::dist() — Method in class Euclidean
HammingDistance::dist() — Method in class HammingDistance
JaccardIndex::dist() — Method in class JaccardIndex
Simhash::dist() — Method in class Simhash

E

Euclidean — Class in namespace NlpTools\Clustering\CentroidFactories
Computes the Euclidean centroid (the element-wise mean) of the provided sparse vectors
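A Python sketch of the element-wise mean over sparse vectors stored as {feature: value} dicts, where a missing feature counts as 0. The dict representation and the helper name are assumptions for illustration.

```python
from collections import defaultdict


def euclidean_centroid(docs):
    """Mean of sparse vectors ({feature: value}); missing features count as 0."""
    if not docs:
        raise ValueError("need at least one vector")
    sums = defaultdict(float)
    for vec in docs:
        for k, v in vec.items():
            sums[k] += v
    n = len(docs)
    return {k: s / n for k, s in sums.items()}
```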
ExternalMaxentOptimizer — Class in namespace NlpTools\Optimizers
This class enables the use of a program written in a different language to optimize the model and return the weights for use in PHP.
Euclidean — Class in namespace NlpTools\Similarity
This class computes the simple Euclidean distance between two vectors: sqrt(sum_i (a_i - b_i)^2).
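The formula above, sketched in Python over the union of features of two sparse {feature: value} dicts. The representation and function name are assumptions, not the NlpTools API.

```python
import math


def euclidean_distance(a, b):
    """sqrt(sum_i (a_i - b_i)^2) over the union of features of a and b."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))
```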
EnglishVowels — Class in namespace NlpTools\Utils
Helper vowel class; determines whether the character at a given index is a vowel.
English — Class in namespace NlpTools\Utils\Normalizers
For English we simply transform to lower case using mb_strtolower.

F

FreqDist — Class in namespace NlpTools\Analysis
Extract the frequency distribution of keywords.
FeatureBasedLinearClassifier — Class in namespace NlpTools\Classifiers
Classify using a linear model.
FeatureFactoryInterface — Class in namespace NlpTools\FeatureFactories
FunctionFeatures — Class in namespace NlpTools\FeatureFactories
An implementation of FeatureFactoryInterface that takes any number of callables (function names, closures, array($object,'func_name'), etc.) and calls them consecutively, using each return value as a feature's unique string.
FeatureBasedNB — Class in namespace NlpTools\Models
Implement a MultinomialNBModel by training on a TrainingSet with a FeatureFactoryInterface and additive smoothing.
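A Python sketch of multinomial Naive Bayes training with additive smoothing: P(t | c) = (count(t, c) + alpha) / (sum over t' of count(t', c) + alpha * |V|). The features here are simply each document's tokens, standing in for a FeatureFactoryInterface. The data layout and `train_multinomial_nb` are hypothetical, and NlpTools' exact smoothing details may differ.

```python
from collections import Counter, defaultdict


def train_multinomial_nb(training_set, alpha=1.0):
    """training_set: list of (class_label, tokens). Returns (priors, cond_probs)."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for label, tokens in training_set:
        class_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs = sum(class_counts.values())
    # Prior P(c): fraction of training documents labelled c.
    priors = {c: n / n_docs for c, n in class_counts.items()}
    cond_probs = {}
    for c in class_counts:
        total = sum(token_counts[c].values())
        denom = total + alpha * len(vocab)
        # Additive smoothing gives every vocabulary token non-zero mass.
        cond_probs[c] = {t: (token_counts[c][t] + alpha) / denom for t in vocab}
    return priors, cond_probs
```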
FeatureBasedLinearOptimizerInterface — Class in namespace NlpTools\Optimizers
FromFile — Class in namespace NlpTools\Random\Generators
Return floats from a file.
Normalizer::factory() — Method in class Normalizer
Instantiate a normalizer using a factory method.
VowelsAbstractFactory::factory() — Method in class VowelsAbstractFactory
Return the vowel checker for the given language

G

FreqDist::getTotalTokens() — Method in class FreqDist
Get the total number of tokens in this tokensDocument
FreqDist::getWeightPerToken() — Method in class FreqDist
Return the weight of a single token
FreqDist::getTotalUniqueTokens() — Method in class FreqDist
Return the total number of unique tokens
FreqDist::getKeys() — Method in class FreqDist
Return the keys sorted by frequency, descending
FreqDist::getValues() — Method in class FreqDist
Return the values sorted by frequency, descending
FreqDist::getKeyValues() — Method in class FreqDist
Return the full key value store
FreqDist::getHapaxes() — Method in class FreqDist
Returns an array of tokens that occurred once
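The getters above can be sketched in Python with `collections.Counter`. `FreqDistSketch` is a hypothetical stand-in, not the NlpTools class. Its tie-breaking (first-seen order for equal counts) comes from `Counter.most_common` and may differ from the PHP implementation.

```python
from collections import Counter


class FreqDistSketch:
    """Hypothetical stand-in for a frequency distribution over a token list."""

    def __init__(self, tokens):
        # most_common() orders by count, descending; equal counts keep
        # the order in which tokens were first encountered.
        self.key_values = dict(Counter(tokens).most_common())
        self.total_tokens = len(tokens)

    def keys(self):
        return list(self.key_values)

    def values(self):
        return list(self.key_values.values())

    def total_unique_tokens(self):
        return len(self.key_values)

    def hapaxes(self):
        # Tokens that occurred exactly once.
        return [tok for tok, n in self.key_values.items() if n == 1]
```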
FeatureBasedLinearClassifier::getVote() — Method in class FeatureBasedLinearClassifier
Compute the features that fire for the Document $d.
MultinomialNBClassifier::getScore() — Method in class MultinomialNBClassifier
Compute the log of the probability of the Document $d belonging to class $class.
CentroidFactoryInterface::getCentroid() — Method in class CentroidFactoryInterface
Parse the provided docs and create a doc that, under the given distance metric, is the centroid of the provided docs.
Euclidean::getCentroid() — Method in class Euclidean
Parse the provided docs and create a doc that, under the given distance metric, is the centroid of the provided docs.
Hamming::getCentroid() — Method in class Hamming
Parse the provided docs and create a doc that, under the given distance metric, is the centroid of the provided docs.
MeanAngle::getCentroid() — Method in class MeanAngle
Parse the provided docs and create a doc that, under the given distance metric, is the centroid of the provided docs.
GroupAverage — Class in namespace NlpTools\Clustering\MergeStrategies
In group average linkage clustering the new distance of the merged cluster with cluster i is the average distance of all points in cluster x to i and in cluster y to i.
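If d(x, i) and d(y, i) are already average pairwise distances, the merged average can be updated without revisiting points. Weight each by its cluster size: d(xy, i) = (|x| d(x, i) + |y| d(y, i)) / (|x| + |y|). The Python helper below is a hypothetical illustration of that rule, not the GroupAverage code.

```python
def group_average_distance(d_xi, d_yi, size_x, size_y):
    """Average distance between the merged cluster (x U y) and cluster i.

    d_xi, d_yi: average pairwise distances from x and from y to i.
    """
    # Size weighting turns two per-cluster averages into one overall average.
    return (size_x * d_xi + size_y * d_yi) / (size_x + size_y)
```

For example, with 1-D points x = {0}, y = {1, 2} and i = {10}, the averages are 10 and 8.5, and the update gives 9.0. This equals the brute-force mean of 10, 9 and 8.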
GroupAverage::getNextMerge() — Method in class GroupAverage
Return the pair of clusters x,y to be merged.
HeapLinkage::getNextMerge() — Method in class HeapLinkage
Return the pair of clusters x,y to be merged.
MergeStrategyInterface::getNextMerge() — Method in class MergeStrategyInterface
Return the next two clusters for merging and assume they are merged (ex.
DocumentInterface::getDocumentData() — Method in class DocumentInterface
Return the data of what is being represented.
RawDocument::getDocumentData() — Method in class RawDocument
Return the data of what is being represented.
TokensDocument::getDocumentData() — Method in class TokensDocument
Return the data of what is being represented.
TrainingDocument::getDocumentData() — Method in class TrainingDocument
Return the data of what is being represented.
TrainingDocument::getClass() — Method in class TrainingDocument
TrainingSet::getClassSet() — Method in class TrainingSet
WordDocument::getDocumentData() — Method in class WordDocument
Return the data of what is being represented.
DataAsFeatures::getFeatureArray() — Method in class DataAsFeatures
Mostly for use with TokensDocument.
FeatureFactoryInterface::getFeatureArray() — Method in class FeatureFactoryInterface
Return an array with unique strings that are the features that "fire" for the specified Document $d and class $class
FunctionFeatures::getFeatureArray() — Method in class FunctionFeatures
Compute the features that "fire" for a given (class, document) pair.
FeatureBasedNB::getPrior() — Method in class FeatureBasedNB
Return the prior probability of class $class P(c) as computed by the training data
FeatureBasedNB::getCondProb() — Method in class FeatureBasedNB
Return the conditional probability of a term for a given class.
Lda::generateDocs() — Method in class Lda
Generate an array suitable for use with Lda::initialize and Lda::gibbsSample from a training set.
Lda::gibbsSample() — Method in class Lda
Generate one Gibbs sample.
Lda::getWordsPerTopicsProbabilities() — Method in class Lda
Get the probability of a word given a topic (phi according to Griffiths and Steyvers)
Lda::getPhi() — Method in class Lda
Shortcut to getWordsPerTopicsProbabilities
Lda::getDocumentsPerTopicsProbabilities() — Method in class Lda
Get the probability of a topic given a document (theta according to Griffiths and Steyvers)
Lda::getTheta() — Method in class Lda
Shortcut to getDocumentsPerTopicsProbabilities
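For reference, the standard smoothed estimates from Griffiths and Steyvers (2004), to which the phi/theta names above refer, are given below. Here n_j^(w) is the number of times word w is assigned to topic j, and n_j^(d) is the number of words in document d assigned to topic j. W is the vocabulary size, T the number of topics, and alpha and beta are the Dirichlet hyperparameters. Whether Lda computes exactly these ratios is an assumption.

```latex
\hat{\phi}^{(j)}_{w} = \frac{n^{(w)}_{j} + \beta}{n^{(\cdot)}_{j} + W\beta}
\qquad
\hat{\theta}^{(d)}_{j} = \frac{n^{(d)}_{j} + \alpha}{n^{(d)}_{\cdot} + T\alpha}
```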
Lda::getLogLikelihood() — Method in class Lda
Log likelihood of the model having generated the data as implemented by M.
LinearModel::getWeight() — Method in class LinearModel
Get the weight for a given feature
LinearModel::getWeights() — Method in class LinearModel
Get all the weights as an array.
MultinomialNBModelInterface::getPrior() — Method in class MultinomialNBModelInterface
MultinomialNBModelInterface::getCondProb() — Method in class MultinomialNBModelInterface
GradientDescentOptimizer — Class in namespace NlpTools\Optimizers
Implements gradient descent with fixed step.
Gamma — Class in namespace NlpTools\Random\Distributions
Implement the gamma distribution.
FromFile::generate() — Method in class FromFile
Generates a pseudo-random number with uniform distribution in the interval [0,1)
GeneratorInterface — Class in namespace NlpTools\Random\Generators
An interface for pseudo-random number generators.
GeneratorInterface::generate() — Method in class GeneratorInterface
Generates a pseudo-random number with uniform distribution in the interval [0,1)
MersenneTwister::generate() — Method in class MersenneTwister
Generates a pseudo-random number with uniform distribution in the interval [0,1)
MersenneTwister::get() — Method in class MersenneTwister
GreekStemmer — Class in namespace NlpTools\Stemmers
This stemmer is an implementation of the stemmer described by G.
LancasterStemmer::getDefaultRuleSet() — Method in class LancasterStemmer
Return an array containing the default Lancaster rules
Greek — Class in namespace NlpTools\Utils\Normalizers
To normalize Greek text we use mb_strtolower to transform it to lower case, then replace every accented character with its non-accented counterpart and the final ς with σ
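A Python sketch of the same three steps. `str.lower()` plays the role of mb_strtolower. Instead of an explicit replacement table, it swaps in Unicode NFD decomposition and drops the combining marks (tonos, dialytika). That is a substitute technique, not necessarily how the Greek normalizer does it, and `normalize_greek` is a hypothetical name.

```python
import unicodedata


def normalize_greek(word):
    """Lower-case, strip accents and map final sigma to medial sigma."""
    lowered = word.lower()
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", lowered)
    # Drop the combining marks (Unicode category Mn).
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return stripped.replace("ς", "σ")
```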

H

Hamming — Class in namespace NlpTools\Clustering\CentroidFactories
This class computes the centroid, for the Hamming distance, of strings that are the binary representations of integers (the strings are supposed to contain only the characters 1 and 0).
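One way to realize such a centroid is a bitwise majority vote, which minimizes the total Hamming distance to the inputs because each bit position can be decided independently. The Python sketch below takes that approach. Whether the Hamming centroid factory actually uses majority voting, and how it breaks ties, are assumptions. `hamming_centroid` is hypothetical.

```python
def hamming_centroid(bit_strings):
    """Bitwise majority vote over equal-length strings of '0' and '1'.

    Ties are broken towards '0' (an arbitrary choice in this sketch).
    """
    if not bit_strings:
        raise ValueError("need at least one string")
    length = len(bit_strings[0])
    if any(len(s) != length for s in bit_strings):
        raise ValueError("strings must have equal length")
    n = len(bit_strings)
    return "".join(
        # '1' only when strictly more than half of the inputs have a 1 here.
        "1" if 2 * sum(s[i] == "1" for s in bit_strings) > n else "0"
        for i in range(length)
    )
```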
Hierarchical — Class in namespace NlpTools\Clustering
This class implements hierarchical agglomerative clustering.
HeapLinkage — Class in namespace NlpTools\Clustering\MergeStrategies
HeapLinkage is an abstract merge strategy.
HammingDistance — Class in namespace NlpTools\Similarity
This class implements the Hamming distance of two strings or sets.
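A Python sketch of both cases. For equal-length strings it counts the differing positions. For sets it treats each set as a binary indicator vector, so the distance is the size of the symmetric difference. Reading "sets" this way is an assumption about HammingDistance, and the function name is hypothetical.

```python
def hamming_distance(a, b):
    """Differing positions of equal-length strings; for sets, |A symmetric-diff B|."""
    if isinstance(a, (set, frozenset)) and isinstance(b, (set, frozenset)):
        return len(a ^ b)
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))
```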

I

Idf — Class in namespace NlpTools\Analysis
Idf implements the inverse document frequency measure.
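A Python sketch of one common definition, idf(t) = log(N / df(t)). Here N is the number of documents and df(t) is the number of documents containing t. NlpTools may use a different log base or add smoothing, and `idf` here is a hypothetical helper over token lists.

```python
import math


def idf(documents):
    """Map each term to log(N / df(t)) for a list of token lists."""
    n = len(documents)
    df = {}
    for doc in documents:
        for term in set(doc):  # count each term at most once per document
            df[term] = df.get(term, 0) + 1
    return {t: math.log(n / d) for t, d in df.items()}
```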
GroupAverage::initializeStrategy() — Method in class GroupAverage
Initialize the distance matrix and any other data structure needed to calculate the merges later.
HeapLinkage::initializeStrategy() — Method in class HeapLinkage
Initialize the distance matrix and any other data structure needed to calculate the merges later.
MergeStrategyInterface::initializeStrategy() — Method in class MergeStrategyInterface
Study the docs and preprocess anything required for computing the merges
InvalidExpression — Class in namespace NlpTools\Exceptions
Used primarily during tokenization
InvalidExpression::invalidRegex() — Method in class InvalidExpression
Lda::initialize() — Method in class Lda
Initially count the co-occurrences of (document, topic) and (topic, word) pairs and cache them so that Gibbs sampling runs faster
EnglishVowels::isVowel() — Method in class EnglishVowels
Returns true if the letter at the given index is a vowel; also handles y
VowelsAbstractFactory::isVowel() — Method in class VowelsAbstractFactory
Check if the letter at the given index is a vowel

J

JaccardIndex — Class in namespace NlpTools\Similarity
http://en.wikipedia.org/wiki/Jaccard_index
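The Jaccard index of two sets is |A intersect B| / |A union B|, and the corresponding distance is 1 minus that value. A Python sketch follows. Treating two empty sets as identical is a convention chosen here, and the helper names are hypothetical.

```python
def jaccard_similarity(a, b):
    """|A intersect B| / |A union B| for two iterables treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention for two empty sets (assumption)
    return len(a & b) / len(a | b)


def jaccard_distance(a, b):
    return 1.0 - jaccard_similarity(a, b)
```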

K

KMeans — Class in namespace NlpTools\Clustering
This clusterer uses the KMeans algorithm for clustering documents.
TrainingSet::key() — Method in class TrainingSet

L

Lda — Class in namespace NlpTools\Models
Topic discovery with latent Dirichlet allocation using Gibbs sampling.
LinearModel — Class in namespace NlpTools\Models
This class represents a linear model of the following form f(x_vec) = l1*x1 + l2*x2 + l3*x3 ...
LancasterStemmer — Class in namespace NlpTools\Stemmers
A word stemmer based on the Lancaster stemming algorithm.

M

MultinomialNBClassifier — Class in namespace NlpTools\Classifiers
Use a multinomial NB model to classify a document
MeanAngle — Class in namespace NlpTools\Clustering\CentroidFactories
MeanAngle computes the unit vector whose angle is the average of the angles of all the given vectors.
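A Python sketch of one standard way to average directions. Normalize each vector to unit length, sum the results, then renormalize the sum. The sparse {feature: value} representation, skipping zero vectors, and the function name are assumptions. This may not match MeanAngle's exact computation.

```python
import math
from collections import defaultdict


def mean_angle_centroid(vectors):
    """Unit vector pointing in the mean direction of the given sparse vectors."""
    acc = defaultdict(float)
    for vec in vectors:
        norm = math.sqrt(sum(v * v for v in vec.values()))
        if norm == 0:
            continue  # a zero vector has no direction; skip it
        for k, v in vec.items():
            acc[k] += v / norm
    norm = math.sqrt(sum(v * v for v in acc.values()))
    if norm == 0:
        raise ValueError("mean direction is undefined")
    return {k: v / norm for k, v in acc.items()}
```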
MergeStrategyInterface — Class in namespace NlpTools\Clustering\MergeStrategies
In hierarchical agglomerative clustering each document starts in its own cluster and then it is subsequently merged with the "closest" cluster.
FunctionFeatures::modelFrequency() — Method in class FunctionFeatures
Set the feature factory to model frequency instead of presence
FunctionFeatures::modelPresence() — Method in class FunctionFeatures
Set the feature factory to model presence instead of frequency
Maxent — Class in namespace NlpTools\Models
Maxent is a model that assigns a weight to each feature such that the weights maximize the conditional log likelihood of the training data.
MultinomialNBModelInterface — Class in namespace NlpTools\Models
Interface that describes a Naive Bayes (NB) model.
MaxentGradientDescent — Class in namespace NlpTools\Optimizers
Implement a gradient descent algorithm that maximizes the conditional log likelihood of the training data.
MaxentOptimizerInterface — Class in namespace NlpTools\Optimizers
Marker interface to use with the Maxent model for type checking
MersenneTwister — Class in namespace NlpTools\Random\Generators
A simple wrapper over the built-in mt_rand() function

N

TrainingSet::next() — Method in class TrainingSet
Normal — Class in namespace NlpTools\Random\Distributions
English::normalize() — Method in class English
Transform the word according to the class description
Greek::normalize() — Method in class Greek
Transform the word according to the class description
Normalizer — Class in namespace NlpTools\Utils\Normalizers
The Normalizer's purpose is to consistently transform any word from any of its possible spellings into a single spelling.
Normalizer::normalize() — Method in class Normalizer
Transform the word according to the class description
Normalizer::normalizeAll() — Method in class Normalizer
Apply the normalize function to all the items in the array

O

Idf::offsetGet() — Method in class Idf
Implements the ArrayAccess interface.
Idf::offsetExists() — Method in class Idf
Implements the ArrayAccess interface.
Idf::offsetSet() — Method in class Idf
Will not be implemented.
Idf::offsetUnset() — Method in class Idf
Will not be implemented.
TrainingSet::offsetSet() — Method in class TrainingSet
TrainingSet::offsetUnset() — Method in class TrainingSet
TrainingSet::offsetGet() — Method in class TrainingSet
TrainingSet::offsetExists() — Method in class TrainingSet
ExternalMaxentOptimizer::optimize() — Method in class ExternalMaxentOptimizer
This function receives an array containing one array per document. Each document's array maps every class to an array of feature identifiers, and holds the actual class of the training document at the special key '__label__'.
FeatureBasedLinearOptimizerInterface::optimize() — Method in class FeatureBasedLinearOptimizerInterface
This function receives an array containing one array per document. Each document's array maps every class to an array of feature identifiers, and holds the actual class of the training document at the special key '__label__'.
GradientDescentOptimizer::optimize() — Method in class GradientDescentOptimizer
This function receives an array containing one array per document. Each document's array maps every class to an array of feature identifiers, and holds the actual class of the training document at the special key '__label__'.

P

Maxent::P() — Method in class Maxent
Calculate the probability that document $d belongs to the class $class given a set of possible classes, a feature factory and the model's weights l[i]
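In the usual maximum entropy form, P(c | d) = exp(sum_i l_i f_i(c, d)) / sum over c' of exp(sum_i l_i f_i(c', d)). With binary features, each class's score is just the sum of the weights of the features that fire for (c, d). The Python sketch below assumes that binary-feature layout and uses hypothetical names. It subtracts the maximum score before exponentiating, which leaves the result unchanged but avoids overflow.

```python
import math


def maxent_probability(features_per_class, weights, target):
    """P(target | d) from the features that fire for each (class, d) pair.

    features_per_class: {class: [feature ids firing for (class, d)]}
    weights: {feature id: weight}; unknown features get weight 0
    """
    scores = {c: sum(weights.get(f, 0.0) for f in feats)
              for c, feats in features_per_class.items()}
    m = max(scores.values())  # shift for numerical stability
    z = sum(math.exp(s - m) for s in scores.values())
    return math.exp(scores[target] - m) / z
```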
PorterStemmer — Class in namespace NlpTools\Stemmers
Copyright 2013 Katharopoulos Angelos <katharas@gmail.com>
PennTreeBankTokenizer — Class in namespace NlpTools\Tokenizers
PennTreeBank tokenizer, based on http://www.cis.upenn.edu/~treebank/tokenizer.sed

R

RawDocument — Class in namespace NlpTools\Documents
RawDocument simply encapsulates a PHP variable
TrainingSet::rewind() — Method in class TrainingSet
GradientDescentOptimizer::reportProgress() — Method in class GradientDescentOptimizer
RegexStemmer — Class in namespace NlpTools\Stemmers
This stemmer removes affixes according to a regular expression.
RegexTokenizer — Class in namespace NlpTools\Tokenizers
Regex tokenizer tokenizes text based on a set of regexes
ClassifierBasedTransformation::register() — Method in class ClassifierBasedTransformation
Register a set of transformations for a given class.

S

SingleLink — Class in namespace NlpTools\Clustering\MergeStrategies
In single linkage clustering the new distance of the merged cluster with cluster i is the smallest distance of either cluster x to i or y to i.
TrainingSet::setAsKey() — Method in class TrainingSet
Decide what should be returned as key when iterated upon
AbstractDistribution::sample() — Method in class AbstractDistribution
Dirichlet::sample() — Method in class Dirichlet
Gamma::sample() — Method in class Gamma
Normal::sample() — Method in class Normal
CosineSimilarity::similarity() — Method in class CosineSimilarity
JaccardIndex::similarity() — Method in class JaccardIndex
Simhash — Class in namespace NlpTools\Similarity
Simhash is an implementation of the locality-sensitive hash function families proposed by Moses Charikar using the Earth Mover's Distance http://www.cs.princeton.edu/courses/archive/spring04/cos598B/bib/CharikarEstim.pdf
Simhash::simhash() — Method in class Simhash
Compute the locality sensitive hash for this set.
Simhash::similarity() — Method in class Simhash
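A Python sketch of the common bit-voting simhash construction. Each feature's hash votes +1 or -1 on every bit, and the fingerprint keeps the bits with a positive total. Similarity is the fraction of fingerprint bits on which two feature sets agree. Using MD5 as the per-feature hash and a 64-bit fingerprint are hypothetical choices, and this is not necessarily how the Simhash class computes its hash.

```python
import hashlib


def simhash(features, bits=64):
    """Bit-voting simhash fingerprint of an iterable of string features."""
    if not 0 < bits <= 128:
        raise ValueError("bits must be in 1..128 (MD5 gives 128 bits)")
    totals = [0] * bits
    for feat in features:
        # Hypothetical per-feature hash: MD5 digest read as a big integer.
        h = int.from_bytes(hashlib.md5(feat.encode("utf-8")).digest(), "big")
        for i in range(bits):
            totals[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, t in enumerate(totals):
        if t > 0:
            fingerprint |= 1 << i
    return fingerprint


def simhash_similarity(a, b, bits=64):
    """Fraction of fingerprint bits on which feature sets a and b agree."""
    differing = bin(simhash(a, bits) ^ simhash(b, bits)).count("1")
    return 1.0 - differing / bits
```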
SimilarityInterface — Class in namespace NlpTools\Similarity
Similarity should return a number that is proportional to how similar those two instances are (with any metric).
SimilarityInterface::similarity() — Method in class SimilarityInterface
GreekStemmer::stem() — Method in class GreekStemmer
Remove the suffix from $word
LancasterStemmer::stem() — Method in class LancasterStemmer
Perform Lancaster stemming on the given word
PorterStemmer::stem() — Method in class PorterStemmer
Remove the suffix from $word
RegexStemmer::stem() — Method in class RegexStemmer
Remove the suffix from $word
Stemmer — Class in namespace NlpTools\Stemmers
http://en.wikipedia.org/wiki/Stemming
Stemmer::stem() — Method in class Stemmer
Remove the suffix from $word
Stemmer::stemAll() — Method in class Stemmer
Apply the stemmer to every single token.
StopWords — Class in namespace NlpTools\Utils
Stop Words are words which are filtered out because they carry little to no information.

T

TokensDocument — Class in namespace NlpTools\Documents
Represents a bag of words (tokens) document.
TrainingDocument — Class in namespace NlpTools\Documents
A TrainingDocument is a document that "decorates" any other document to add the real class of the document.
TrainingSet — Class in namespace NlpTools\Documents
A collection of TrainingDocument objects.
FeatureBasedNB::train_with_context() — Method in class FeatureBasedNB
Train on the given set and fill the model's variables.
FeatureBasedNB::train() — Method in class FeatureBasedNB
Train on the given set and fill the model's variables
Lda::train() — Method in class Lda
Run the Gibbs sampler $it times.
Maxent::train() — Method in class Maxent
Calculate all the features for every possible class.
Stemmer::transform() — Method in class Stemmer
Return the value transformed.
ClassifierBasedTokenizer::tokenize() — Method in class ClassifierBasedTokenizer
Break a character sequence to a token sequence
PennTreeBankTokenizer::tokenize() — Method in class PennTreeBankTokenizer
Calls internal functions to handle data processing
RegexTokenizer::tokenize() — Method in class RegexTokenizer
Break a character sequence to a token sequence
TokenizerInterface — Class in namespace NlpTools\Tokenizers
TokenizerInterface::tokenize() — Method in class TokenizerInterface
Break a character sequence to a token sequence
WhitespaceAndPunctuationTokenizer::tokenize() — Method in class WhitespaceAndPunctuationTokenizer
Break a character sequence to a token sequence
WhitespaceTokenizer::tokenize() — Method in class WhitespaceTokenizer
Break a character sequence to a token sequence
ClassifierBasedTransformation::transform() — Method in class ClassifierBasedTransformation
Classify the passed-in variable w and then apply each transformation to the output of the previous one.
Normalizer::transform() — Method in class Normalizer
Return the value transformed.
StopWords::transform() — Method in class StopWords
Return the value transformed.
TransformationInterface — Class in namespace NlpTools\Utils
TransformationInterface represents any type of transformation to be applied upon documents.
TransformationInterface::transform() — Method in class TransformationInterface
Return the value transformed.

V

TrainingSet::valid() — Method in class TrainingSet
VowelsAbstractFactory — Class in namespace NlpTools\Utils
Factory wrapper for Vowels

W

WordDocument — Class in namespace NlpTools\Documents
A Document that represents a single word within the context of a larger document.
WhitespaceAndPunctuationTokenizer — Class in namespace NlpTools\Tokenizers
Simple whitespace tokenizer.
WhitespaceTokenizer — Class in namespace NlpTools\Tokenizers
Simple whitespace tokenizer.

_

FreqDist::__construct() — Method in class FreqDist
Sort the token metadata collection right away so that frequency distribution data can be extracted.
Idf::__construct() — Method in class Idf
FeatureBasedLinearClassifier::__construct() — Method in class FeatureBasedLinearClassifier
MultinomialNBClassifier::__construct() — Method in class MultinomialNBClassifier
Hierarchical::__construct() — Method in class Hierarchical
KMeans::__construct() — Method in class KMeans
Initialize the KMeans clusterer
RawDocument::__construct() — Method in class RawDocument
TokensDocument::__construct() — Method in class TokensDocument
TrainingDocument::__construct() — Method in class TrainingDocument
TrainingSet::__construct() — Method in class TrainingSet
WordDocument::__construct() — Method in class WordDocument
FunctionFeatures::__construct() — Method in class FunctionFeatures
FeatureBasedNB::__construct() — Method in class FeatureBasedNB
FeatureBasedNB::__sleep() — Method in class FeatureBasedNB
Just save the probabilities for reuse
Lda::__construct() — Method in class Lda
LinearModel::__construct() — Method in class LinearModel
ExternalMaxentOptimizer::__construct() — Method in class ExternalMaxentOptimizer
GradientDescentOptimizer::__construct() — Method in class GradientDescentOptimizer
AbstractDistribution::__construct() — Method in class AbstractDistribution
Dirichlet::__construct() — Method in class Dirichlet
Gamma::__construct() — Method in class Gamma
Normal::__construct() — Method in class Normal
FromFile::__construct() — Method in class FromFile
Construct a FromFile generator
Simhash::__construct() — Method in class Simhash
LancasterStemmer::__construct() — Method in class LancasterStemmer
Constructor loads the ruleset into memory
RegexStemmer::__construct() — Method in class RegexStemmer
ClassifierBasedTokenizer::__construct() — Method in class ClassifierBasedTokenizer
PennTreeBankTokenizer::__construct() — Method in class PennTreeBankTokenizer
RegexTokenizer::__construct() — Method in class RegexTokenizer
Initialize the Tokenizer
ClassifierBasedTransformation::__construct() — Method in class ClassifierBasedTransformation
In order to classify anything with NlpTools we need something that implements the ClassifierInterface.
StopWords::__construct() — Method in class StopWords