Index | NlpTools API

A

DocumentInterface::applyTransformation() — Method in class DocumentInterface: Apply the transformation to the data of this document.
RawDocument::applyTransformation() — Method in class RawDocument: Apply the transformation to the data of this document.
TokensDocument::applyTransformation() — Method in class TokensDocument: Apply the transformation to the data of this document.
TrainingDocument::applyTransformation() — Method in class TrainingDocument: Pass the transformation to the decorated document
TrainingSet::addDocument() — Method in class TrainingSet: Add a document to the set.
TrainingSet::applyTransformations() — Method in class TrainingSet: Apply an array of transformations to all documents in this container.
WordDocument::applyTransformation() — Method in class WordDocument: Apply the transformation to the data of this document.
FunctionFeatures::add() — Method in class FunctionFeatures: Add a function as a feature
AbstractDistribution — Class in namespace NlpTools\Random\Distributions

C

ClassifierInterface — Class in namespace NlpTools\Classifiers
ClassifierInterface::classify() — Method in class ClassifierInterface: Decide which class C, a member of $classes, the document $d fits best.
FeatureBasedLinearClassifier::classify() — Method in class FeatureBasedLinearClassifier: Compute the vote for every class.
MultinomialNBClassifier::classify() — Method in class MultinomialNBClassifier: Compute the probability of $d belonging to each class successively and return that class that has the maximum probability.
CentroidFactoryInterface — Class in namespace NlpTools\Clustering\CentroidFactories
Clusterer — Class in namespace NlpTools\Clustering
Clusterer::cluster() — Method in class Clusterer: Group the documents together
Hierarchical::cluster() — Method in class Hierarchical: Group the documents together
KMeans::cluster() — Method in class KMeans: Apply the feature factory to the documents and then cluster the resulting array using the provided distance metric and centroid factory.
CompleteLink — Class in namespace NlpTools\Clustering\MergeStrategies: In complete linkage clustering the new distance of the merged cluster with cluster i is the maximum distance of either cluster x to i or y to i.
TrainingSet::current() — Method in class TrainingSet
TrainingSet::count() — Method in class TrainingSet
Maxent::CLogLik() — Method in class Maxent: Not implemented yet.
CosineSimilarity — Class in namespace NlpTools\Similarity: Given two vectors compute cos(theta) where theta is the angle between the two vectors in a N-dimensional vector space.
ClassifierBasedTokenizer — Class in namespace NlpTools\Tokenizers: A tokenizer that uses a classifier (of any type) to determine if there is an "end of word" (EOW).
ClassifierBasedTransformation — Class in namespace NlpTools\Utils: Classify whatever is passed in the transform and pass it a different set of transformations based on the class.
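
The CosineSimilarity entry above describes computing cos(theta) for two vectors. A minimal sketch of that formula follows, written in Python rather than the library's PHP and not taken from NlpTools itself. The function name `cosine_similarity` and the dict-based sparse vector representation are assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Return cos(theta) for two sparse vectors given as {feature: weight} dicts."""
    # Dot product over the features of a; features missing from b count as 0.
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Two vectors with no feature in common give 0, and two vectors pointing in the same direction give 1.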

D

Hierarchical::dendrogramToClusters() — Method in class Hierarchical: Flatten a dendrogram to approximately the requested number of clusters (the closest power of 2 larger than $NC)
DocumentInterface — Class in namespace NlpTools\Documents: A Document is a representation of a Document to be classified.
DataAsFeatures — Class in namespace NlpTools\FeatureFactories
Maxent::dumpWeights() — Method in class Maxent: Simply print_r weights.
Dirichlet — Class in namespace NlpTools\Random\Distributions: Implement a k-dimensional Dirichlet distribution using draws from k gamma distributions and then normalizing.
CosineSimilarity::dist() — Method in class CosineSimilarity
DistanceInterface — Class in namespace NlpTools\Similarity: Distance should return a number proportional to how dissimilar the two instances are (with any metric)
DistanceInterface::dist() — Method in class DistanceInterface
Euclidean::dist() — Method in class Euclidean
HammingDistance::dist() — Method in class HammingDistance
JaccardIndex::dist() — Method in class JaccardIndex
Simhash::dist() — Method in class Simhash
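
The Dirichlet entry above describes sampling by drawing from k gamma distributions and then normalizing. A short Python sketch of that idea follows. It is an independent illustration, not the NlpTools implementation, and the name `sample_dirichlet` and its parameters are hypothetical.

```python
import random

def sample_dirichlet(alphas, rng=random):
    """Draw one sample from Dirichlet(alphas); every alpha must be > 0."""
    # One Gamma(alpha_i, 1) draw per dimension.
    draws = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(draws)
    # Normalizing makes the components sum to 1.
    return [d / total for d in draws]
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible.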

E

Euclidean — Class in namespace NlpTools\Clustering\CentroidFactories: Computes the euclidean centroid of the provided sparse vectors
ExternalMaxentOptimizer — Class in namespace NlpTools\Optimizers: This class enables the use of a program written in a different language to optimize our model and return the weights for use in php.
Euclidean — Class in namespace NlpTools\Similarity: This class computes the very simple euclidean distance between two vectors ( sqrt(sum((a_i-b_i)^2)) ).
EnglishVowels — Class in namespace NlpTools\Utils: Helper Vowel class, determines if the character at a given index is a vowel
English — Class in namespace NlpTools\Utils\Normalizers: For English we simply transform to lower case using mb_strtolower.
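
The Euclidean similarity entry above gives the formula sqrt(sum((a_i-b_i)^2)). As a sketch under the assumption that vectors are stored sparsely as {feature: weight} dicts (a representation chosen here for illustration, with a hypothetical function name), it could look like this in Python:

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two sparse vectors given as dicts."""
    # A feature absent from one vector is treated as having weight 0 there.
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))
```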

F

FreqDist — Class in namespace NlpTools\Analysis: Extract the Frequency distribution of keywords
FeatureBasedLinearClassifier — Class in namespace NlpTools\Classifiers: Classify using a linear model.
FeatureFactoryInterface — Class in namespace NlpTools\FeatureFactories
FunctionFeatures — Class in namespace NlpTools\FeatureFactories: An implementation of FeatureFactoryInterface that takes any number of callables (function names, closures, array($object,'func_name'), etc.) and calls them consecutively using the return value as a feature's unique string.
FeatureBasedNB — Class in namespace NlpTools\Models: Implement a MultinomialNBModel by training on a TrainingSet with a FeatureFactoryInterface and additive smoothing.
FeatureBasedLinearOptimizerInterface — Class in namespace NlpTools\Optimizers
FromFile — Class in namespace NlpTools\Random\Generators: Return floats from a file.
Normalizer::factory() — Method in class Normalizer: Just instantiate the normalizer using a factory method.
VowelsAbstractFactory::factory() — Method in class VowelsAbstractFactory: Return the correct language vowel checker

G

FreqDist::getTotalTokens() — Method in class FreqDist: Get the total number of tokens in this tokensDocument
FreqDist::getWeightPerToken() — Method in class FreqDist: Return the weight of a single token
FreqDist::getTotalUniqueTokens() — Method in class FreqDist: Return the total number of unique tokens
FreqDist::getKeys() — Method in class FreqDist: Return the sorted keys by frequency desc
FreqDist::getValues() — Method in class FreqDist: Return the sorted values by frequency desc
FreqDist::getKeyValues() — Method in class FreqDist: Return the full key value store
FreqDist::getHapaxes() — Method in class FreqDist: Returns an array of tokens that occurred once
FeatureBasedLinearClassifier::getVote() — Method in class FeatureBasedLinearClassifier: Compute the features that fire for the Document $d.
MultinomialNBClassifier::getScore() — Method in class MultinomialNBClassifier: Compute the log of the probability of the Document $d belonging to class $class.
CentroidFactoryInterface::getCentroid() — Method in class CentroidFactoryInterface: Parse the provided docs and create a doc that given a metric of distance is the centroid of the provided docs.
Euclidean::getCentroid() — Method in class Euclidean: Parse the provided docs and create a doc that given a metric of distance is the centroid of the provided docs.
Hamming::getCentroid() — Method in class Hamming: Parse the provided docs and create a doc that given a metric of distance is the centroid of the provided docs.
MeanAngle::getCentroid() — Method in class MeanAngle: Parse the provided docs and create a doc that given a metric of distance is the centroid of the provided docs.
GroupAverage — Class in namespace NlpTools\Clustering\MergeStrategies: In group average clustering the new distance of the merged cluster with cluster i is the average distance of all points in cluster x to i and y to i.
GroupAverage::getNextMerge() — Method in class GroupAverage: Return the pair of clusters x,y to be merged.
HeapLinkage::getNextMerge() — Method in class HeapLinkage: Return the pair of clusters x,y to be merged.
MergeStrategyInterface::getNextMerge() — Method in class MergeStrategyInterface: Return the next two clusters for merging and assume they are merged (ex.
DocumentInterface::getDocumentData() — Method in class DocumentInterface: Return the data of what is being represented.
RawDocument::getDocumentData() — Method in class RawDocument: Return the data of what is being represented.
TokensDocument::getDocumentData() — Method in class TokensDocument: Return the data of what is being represented.
TrainingDocument::getDocumentData() — Method in class TrainingDocument: Return the data of what is being represented.
TrainingDocument::getClass() — Method in class TrainingDocument
TrainingSet::getClassSet() — Method in class TrainingSet
WordDocument::getDocumentData() — Method in class WordDocument: Return the data of what is being represented.
DataAsFeatures::getFeatureArray() — Method in class DataAsFeatures: For use with TokensDocument mostly.
FeatureFactoryInterface::getFeatureArray() — Method in class FeatureFactoryInterface: Return an array with unique strings that are the features that "fire" for the specified Document $d and class $class
FunctionFeatures::getFeatureArray() — Method in class FunctionFeatures: Compute the features that "fire" for a given (class, document) pair.
FeatureBasedNB::getPrior() — Method in class FeatureBasedNB: Return the prior probability of class $class P(c) as computed by the training data
FeatureBasedNB::getCondProb() — Method in class FeatureBasedNB: Return the conditional probability of a term for a given class.
Lda::generateDocs() — Method in class Lda: Generate an array suitable for use with Lda::initialize and Lda::gibbsSample from a training set.
Lda::gibbsSample() — Method in class Lda: Generate one Gibbs sample.
Lda::getWordsPerTopicsProbabilities() — Method in class Lda: Get the probability of a word given a topic (phi according to Griffiths and Steyvers)
Lda::getPhi() — Method in class Lda: Shortcut to getWordsPerTopicsProbabilities
Lda::getDocumentsPerTopicsProbabilities() — Method in class Lda: Get the probability of a document given a topic (theta according to Griffiths and Steyvers)
Lda::getTheta() — Method in class Lda: Shortcut to getDocumentsPerTopicsProbabilities
Lda::getLogLikelihood() — Method in class Lda: Log likelihood of the model having generated the data as implemented by M.
LinearModel::getWeight() — Method in class LinearModel: Get the weight for a given feature
LinearModel::getWeights() — Method in class LinearModel: Get all the weights as an array.
MultinomialNBModelInterface::getPrior() — Method in class MultinomialNBModelInterface
MultinomialNBModelInterface::getCondProb() — Method in class MultinomialNBModelInterface
GradientDescentOptimizer — Class in namespace NlpTools\Optimizers: Implements gradient descent with fixed step.
Gamma — Class in namespace NlpTools\Random\Distributions: Implement the gamma distribution.
FromFile::generate() — Method in class FromFile: Generates a pseudo-random number with uniform distribution in the interval [0,1)
GeneratorInterface — Class in namespace NlpTools\Random\Generators: An interface for pseudo-random number generators.
GeneratorInterface::generate() — Method in class GeneratorInterface: Generates a pseudo-random number with uniform distribution in the interval [0,1)
MersenneTwister::generate() — Method in class MersenneTwister: Generates a pseudo-random number with uniform distribution in the interval [0,1)
MersenneTwister::get() — Method in class MersenneTwister
GreekStemmer — Class in namespace NlpTools\Stemmers: This stemmer is an implementation of the stemmer described by G.
LancasterStemmer::getDefaultRuleSet() — Method in class LancasterStemmer: Contains an array with the default lancaster rules
Greek — Class in namespace NlpTools\Utils\Normalizers: To normalize Greek text we use mb_strtolower to transform to lower case and then replace every accented character with its non-accented counterpart and the final ς with σ
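
The Greek normalizer entry above describes three steps: lower-casing, removing accents, and replacing the final ς with σ. The Python sketch below illustrates those steps. It is not the NlpTools code: it assumes Unicode decomposition is an acceptable way to strip accents, and it replaces every ς on the assumption that ς only occurs at the end of a word. The function name is hypothetical.

```python
import unicodedata

def normalize_greek(word):
    """Lower-case a Greek word, strip its accents and map final sigma to σ."""
    lowered = word.lower()
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", lowered)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Assumption: ς appears only word-finally, so a plain replace is enough.
    return stripped.replace("ς", "σ")
```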

H

Hamming — Class in namespace NlpTools\Clustering\CentroidFactories: This class computes the centroid of the hamming distance between two strings that are the binary representations of two integers (the strings are supposed to only contain the characters 1 and 0).
Hierarchical — Class in namespace NlpTools\Clustering: This class implements hierarchical agglomerative clustering.
HeapLinkage — Class in namespace NlpTools\Clustering\MergeStrategies: HeapLinkage is an abstract merge strategy.
HammingDistance — Class in namespace NlpTools\Similarity: This class implements the hamming distance of two strings or sets.

I

Idf — Class in namespace NlpTools\Analysis: Idf implements the inverse document frequency measure.
GroupAverage::initializeStrategy() — Method in class GroupAverage: Initialize the distance matrix and any other data structure needed to calculate the merges later.
HeapLinkage::initializeStrategy() — Method in class HeapLinkage: Initialize the distance matrix and any other data structure needed to calculate the merges later.
MergeStrategyInterface::initializeStrategy() — Method in class MergeStrategyInterface: Study the docs and preprocess anything required for computing the merges
InvalidExpression — Class in namespace NlpTools\Exceptions: Used by the tokenization, primarily
InvalidExpression::invalidRegex() — Method in class InvalidExpression
Lda::initialize() — Method in class Lda: Initially count the co-occurrences of (document, topic) and (topic, word) pairs and cache them to run Gibbs sampling faster
EnglishVowels::isVowel() — Method in class EnglishVowels: Returns true if the letter at the given index is a vowel, works with y
VowelsAbstractFactory::isVowel() — Method in class VowelsAbstractFactory: Check if the letter at the given index is a vowel

J

JaccardIndex — Class in namespace NlpTools\Similarity: http://en.wikipedia.org/wiki/Jaccard_index

K

KMeans — Class in namespace NlpTools\Clustering: This clusterer uses the KMeans algorithm for clustering documents.
TrainingSet::key() — Method in class TrainingSet

L

Lda — Class in namespace NlpTools\Models: Topic discovery with latent Dirichlet allocation using Gibbs sampling.
LinearModel — Class in namespace NlpTools\Models: This class represents a linear model of the following form f(x_vec) = l1*x1 + l2*x2 + l3*x3 ...
LancasterStemmer — Class in namespace NlpTools\Stemmers: A word stemmer based on the Lancaster stemming algorithm.

M

MultinomialNBClassifier — Class in namespace NlpTools\Classifiers: Use a multinomial NB model to classify a document
MeanAngle — Class in namespace NlpTools\Clustering\CentroidFactories: MeanAngle computes the unit vector whose angle is the average of the angles of all the given vectors.
MergeStrategyInterface — Class in namespace NlpTools\Clustering\MergeStrategies: In hierarchical agglomerative clustering each document starts in its own cluster and then it is subsequently merged with the "closest" cluster.
FunctionFeatures::modelFrequency() — Method in class FunctionFeatures: Set the feature factory to model frequency instead of presence
FunctionFeatures::modelPresence() — Method in class FunctionFeatures: Set the feature factory to model presence instead of frequency
Maxent — Class in namespace NlpTools\Models: Maxent is a model that assigns a weight for each feature such that all the weights maximize the Conditional Log Likelihood of the training data.
MultinomialNBModelInterface — Class in namespace NlpTools\Models: Interface that describes a NB model.
MaxentGradientDescent — Class in namespace NlpTools\Optimizers: Implement a gradient descent algorithm that maximizes the conditional log likelihood of the training data.
MaxentOptimizerInterface — Class in namespace NlpTools\Optimizers: Marker interface to use with the Maxent model for type checking
MersenneTwister — Class in namespace NlpTools\Random\Generators: A simple wrapper over the built in mt_rand() method

N

TrainingSet::next() — Method in class TrainingSet
Normal — Class in namespace NlpTools\Random\Distributions
English::normalize() — Method in class English: Transform the word according to the class description
Greek::normalize() — Method in class Greek: Transform the word according to the class description
Normalizer — Class in namespace NlpTools\Utils\Normalizers: The Normalizer's purpose is to transform any word from any one of the possible writings to a single writing consistently.
Normalizer::normalize() — Method in class Normalizer: Transform the word according to the class description
Normalizer::normalizeAll() — Method in class Normalizer: Apply the normalize function to all the items in the array

O

Idf::offsetGet() — Method in class Idf: Implements the array access interface.
Idf::offsetExists() — Method in class Idf: Implements the array access interface.
Idf::offsetSet() — Method in class Idf: Will not be implemented.
Idf::offsetUnset() — Method in class Idf: Will not be implemented.
TrainingSet::offsetSet() — Method in class TrainingSet
TrainingSet::offsetUnset() — Method in class TrainingSet
TrainingSet::offsetGet() — Method in class TrainingSet
TrainingSet::offsetExists() — Method in class TrainingSet
ExternalMaxentOptimizer::optimize() — Method in class ExternalMaxentOptimizer: This function receives an array that contains an array for each document which contains an array of feature identifiers for each class and at the special key '__label__' the actual class of the training document.
FeatureBasedLinearOptimizerInterface::optimize() — Method in class FeatureBasedLinearOptimizerInterface: This function receives an array that contains an array for each document which contains an array of feature identifiers for each class and at the special key '__label__' the actual class of the training document.
GradientDescentOptimizer::optimize() — Method in class GradientDescentOptimizer: This function receives an array that contains an array for each document which contains an array of feature identifiers for each class and at the special key '__label__' the actual class of the training document.

P

Maxent::P() — Method in class Maxent: Calculate the probability that document $d belongs to the class $class given a set of possible classes, a feature factory and the model's weights l[i]
PorterStemmer — Class in namespace NlpTools\Stemmers: Copyright 2013 Katharopoulos Angelos <katharas@gmail.com>
PennTreeBankTokenizer — Class in namespace NlpTools\Tokenizers: PennTreeBank Tokenizer Based on http://www.cis.upenn.edu/~treebank/tokenizer.sed
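
The Maxent::P() entry above describes the probability of a class given a document, a set of possible classes, a feature factory and the weights l[i]. One common way to write such a probability is to exponentiate the summed weights of the firing features and normalize over all classes. The Python sketch below follows that standard formulation and is an assumption about the general technique, not a copy of the NlpTools method. The names `maxent_probability`, `features` and `weights` are hypothetical.

```python
import math

def maxent_probability(cls, classes, doc, features, weights):
    """P(cls | doc) under a log-linear model; cls must be one of classes."""
    def score(c):
        # Sum the weights of every feature that fires for (c, doc).
        return sum(weights.get(f, 0.0) for f in features(c, doc))

    scores = {c: score(c) for c in classes}
    # Subtracting the maximum keeps exp() from overflowing.
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[cls] - top) / z
```

Here `features` stands in for a feature factory: a callable that returns the list of feature strings firing for a class and document.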

R

RawDocument — Class in namespace NlpTools\Documents: RawDocument simply encapsulates a php variable
TrainingSet::rewind() — Method in class TrainingSet
GradientDescentOptimizer::reportProgress() — Method in class GradientDescentOptimizer
RegexStemmer — Class in namespace NlpTools\Stemmers: This stemmer removes affixes according to a regular expression.
RegexTokenizer — Class in namespace NlpTools\Tokenizers: Regex tokenizer tokenizes text based on a set of regexes
ClassifierBasedTransformation::register() — Method in class ClassifierBasedTransformation: Register a set of transformations for a given class.

S

SingleLink — Class in namespace NlpTools\Clustering\MergeStrategies: In single linkage clustering the new distance of the merged cluster with cluster i is the smallest distance of either cluster x to i or y to i.
TrainingSet::setAsKey() — Method in class TrainingSet: Decide what should be returned as key when iterated upon
AbstractDistribution::sample() — Method in class AbstractDistribution
Dirichlet::sample() — Method in class Dirichlet
Gamma::sample() — Method in class Gamma
Normal::sample() — Method in class Normal
CosineSimilarity::similarity() — Method in class CosineSimilarity
JaccardIndex::similarity() — Method in class JaccardIndex
Simhash — Class in namespace NlpTools\Similarity: Simhash is an implementation of the locality sensitive hash function families proposed by Moses Charikar using the Earth Mover's Distance http://www.cs.princeton.edu/courses/archive/spring04/cos598B/bib/CharikarEstim.pdf
Simhash::simhash() — Method in class Simhash: Compute the locality sensitive hash for this set.
Simhash::similarity() — Method in class Simhash
SimilarityInterface — Class in namespace NlpTools\Similarity: Similarity should return a number that is proportional to how similar those two instances are (with any metric).
SimilarityInterface::similarity() — Method in class SimilarityInterface
GreekStemmer::stem() — Method in class GreekStemmer: Remove the suffix from $word
LancasterStemmer::stem() — Method in class LancasterStemmer: Performs a Lancaster stem on the given word
PorterStemmer::stem() — Method in class PorterStemmer: Remove the suffix from $word
RegexStemmer::stem() — Method in class RegexStemmer: Remove the suffix from $word
Stemmer — Class in namespace NlpTools\Stemmers: http://en.wikipedia.org/wiki/Stemming
Stemmer::stem() — Method in class Stemmer: Remove the suffix from $word
Stemmer::stemAll() — Method in class Stemmer: Apply the stemmer to every single token.
StopWords — Class in namespace NlpTools\Utils: Stop Words are words which are filtered out because they carry little to no information.
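
The SingleLink entry above defines the distance from a merged cluster to cluster i as the smallest of the distances from x to i and from y to i, while the CompleteLink entry uses the maximum. A tiny Python sketch of that update rule follows. It is an illustration only, and the function name and `strategy` parameter are hypothetical.

```python
def merged_distance(d_xi, d_yi, strategy="single"):
    """Distance from the merged cluster (x, y) to cluster i."""
    if strategy == "single":
        # Single link: the closest of the two original clusters decides.
        return min(d_xi, d_yi)
    if strategy == "complete":
        # Complete link: the farthest of the two original clusters decides.
        return max(d_xi, d_yi)
    raise ValueError("unknown strategy: " + strategy)
```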

T

TokensDocument — Class in namespace NlpTools\Documents: Represents a bag of words (tokens) document.
TrainingDocument — Class in namespace NlpTools\Documents: A TrainingDocument is a document that "decorates" any other document to add the real class of the document.
TrainingSet — Class in namespace NlpTools\Documents: A collection of TrainingDocument objects.
FeatureBasedNB::train_with_context() — Method in class FeatureBasedNB: Train on the given set and fill the model's variables.
FeatureBasedNB::train() — Method in class FeatureBasedNB: Train on the given set and fill the model's variables
Lda::train() — Method in class Lda: Run the Gibbs sampler $it times.
Maxent::train() — Method in class Maxent: Calculate all the features for every possible class.
Stemmer::transform() — Method in class Stemmer: Return the value transformed.
ClassifierBasedTokenizer::tokenize() — Method in class ClassifierBasedTokenizer: Break a character sequence to a token sequence
PennTreeBankTokenizer::tokenize() — Method in class PennTreeBankTokenizer: Calls internal functions to handle data processing
RegexTokenizer::tokenize() — Method in class RegexTokenizer: Break a character sequence to a token sequence
TokenizerInterface — Class in namespace NlpTools\Tokenizers
TokenizerInterface::tokenize() — Method in class TokenizerInterface: Break a character sequence to a token sequence
WhitespaceAndPunctuationTokenizer::tokenize() — Method in class WhitespaceAndPunctuationTokenizer: Break a character sequence to a token sequence
WhitespaceTokenizer::tokenize() — Method in class WhitespaceTokenizer: Break a character sequence to a token sequence
ClassifierBasedTransformation::transform() — Method in class ClassifierBasedTransformation: Classify the passed in variable w and then apply each transformation to the output of the previous one.
Normalizer::transform() — Method in class Normalizer: Return the value transformed.
StopWords::transform() — Method in class StopWords: Return the value transformed.
TransformationInterface — Class in namespace NlpTools\Utils: TransformationInterface represents any type of transformation to be applied upon documents.
TransformationInterface::transform() — Method in class TransformationInterface: Return the value transformed.

V

TrainingSet::valid() — Method in class TrainingSet
VowelsAbstractFactory — Class in namespace NlpTools\Utils: Factory wrapper for Vowels

W

WordDocument — Class in namespace NlpTools\Documents: A Document that represents a single word but with a context of a larger document.
WhitespaceAndPunctuationTokenizer — Class in namespace NlpTools\Tokenizers: Simple white space tokenizer.
WhitespaceTokenizer — Class in namespace NlpTools\Tokenizers: Simple white space tokenizer.

_

FreqDist::__construct() — Method in class FreqDist: This sorts the token metadata collection right away so that frequency distribution data can be extracted.
Idf::__construct() — Method in class Idf
FeatureBasedLinearClassifier::__construct() — Method in class FeatureBasedLinearClassifier
MultinomialNBClassifier::__construct() — Method in class MultinomialNBClassifier
Hierarchical::__construct() — Method in class Hierarchical
KMeans::__construct() — Method in class KMeans: Initialize the K Means clusterer
RawDocument::__construct() — Method in class RawDocument
TokensDocument::__construct() — Method in class TokensDocument
TrainingDocument::__construct() — Method in class TrainingDocument
TrainingSet::__construct() — Method in class TrainingSet
WordDocument::__construct() — Method in class WordDocument
FunctionFeatures::__construct() — Method in class FunctionFeatures
FeatureBasedNB::__construct() — Method in class FeatureBasedNB
FeatureBasedNB::__sleep() — Method in class FeatureBasedNB: Just save the probabilities for reuse
Lda::__construct() — Method in class Lda
LinearModel::__construct() — Method in class LinearModel
ExternalMaxentOptimizer::__construct() — Method in class ExternalMaxentOptimizer
GradientDescentOptimizer::__construct() — Method in class GradientDescentOptimizer
AbstractDistribution::__construct() — Method in class AbstractDistribution
Dirichlet::__construct() — Method in class Dirichlet
Gamma::__construct() — Method in class Gamma
Normal::__construct() — Method in class Normal
FromFile::__construct() — Method in class FromFile: Construct a FromFile generator
Simhash::__construct() — Method in class Simhash
LancasterStemmer::__construct() — Method in class LancasterStemmer: Constructor loads the ruleset into memory
RegexStemmer::__construct() — Method in class RegexStemmer
ClassifierBasedTokenizer::__construct() — Method in class ClassifierBasedTokenizer
PennTreeBankTokenizer::__construct() — Method in class PennTreeBankTokenizer
RegexTokenizer::__construct() — Method in class RegexTokenizer: Initialize the Tokenizer
ClassifierBasedTransformation::__construct() — Method in class ClassifierBasedTransformation: In order to classify anything with NlpTools we need something that implements the ClassifierInterface.
StopWords::__construct() — Method in class StopWords