NlpTools API
Class

NlpTools\Models\FeatureBasedNB

class FeatureBasedNB implements MultinomialNBModelInterface

Implements the MultinomialNBModelInterface by training on a TrainingSet, using a FeatureFactoryInterface to extract features and additive smoothing to estimate the probabilities.
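A minimal usage sketch. It assumes the library's companion classes TokensDocument, DataAsFeatures and MultinomialNBClassifier (not described on this page); the labels and tokens are made up:

    use NlpTools\Models\FeatureBasedNB;
    use NlpTools\Documents\TrainingSet;
    use NlpTools\Documents\TokensDocument;
    use NlpTools\FeatureFactories\DataAsFeatures;
    use NlpTools\Classifiers\MultinomialNBClassifier;

    // Build a labelled training set from pre-tokenized documents
    $tset = new TrainingSet();
    $tset->addDocument('spam', new TokensDocument(array('buy', 'pills', 'now')));
    $tset->addDocument('ham', new TokensDocument(array('meeting', 'at', 'noon')));

    // Train the model, using the tokens themselves as features
    $ff = new DataAsFeatures();
    $model = new FeatureBasedNB();
    $model->train($ff, $tset);

    // Classify a new document with the trained model
    $cls = new MultinomialNBClassifier($ff, $model);
    $label = $cls->classify(array('spam', 'ham'), new TokensDocument(array('buy', 'now')));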

Methods

__construct()

float getPrior(string $class)

Return the prior probability P(c) of class $class as computed from the training data.

float getCondProb(string $term, string $class)

Return the conditional probability of a term for a given class.

array train_with_context(array $train_ctx, FeatureFactoryInterface $ff, TrainingSet $tset, integer $a_smoothing = 1)

Train on the given set and fill the model's variables.

array train(FeatureFactoryInterface $ff, TrainingSet $tset, integer $a_smoothing = 1)

Train on the given set and fill the model's variables.

__sleep()

Save only the probabilities so the trained model can be reused after serialization.

Details

at line 21
public __construct()

at line 35
public float getPrior(string $class)

Return the prior probability P(c) of class $class as computed from the training data.

Parameters

string $class

Return Value

float prior probability
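
For illustration, using the NDocs[c] / NDocs estimate described under train() (the counts and label below are hypothetical):

    // If 3 of the 4 training documents were labelled "spam":
    $p = $model->getPrior('spam'); // 0.75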

at line 47
public float getCondProb(string $term, string $class)

Return the conditional probability of a term for a given class.

Parameters

string $term The term (word, feature id, ...)
string $class The class

Return Value

float
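
A short sketch of querying a trained model (the term and label are made up; the exact values depend on the training data):

    // Smoothed estimate of P(term | class)
    $p = $model->getCondProb('buy', 'spam');

    // Because of the additive smoothing, a term never seen in "spam"
    // still receives a small non-zero probability instead of 0.
    $q = $model->getCondProb('some-unseen-term', 'spam');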

at line 70
public array train_with_context(array $train_ctx, FeatureFactoryInterface $ff, TrainingSet $tset, integer $a_smoothing = 1)

Train on the given set and fill the model's variables.

Use the provided training context to update the counts as if this training set were appended to the previous one that produced the context.

It can be used for incremental training. It is not meant to be used with the same training set twice.

Parameters

array $train_ctx The previous training context
FeatureFactoryInterface $ff A feature factory to compute features from a training document
TrainingSet $tset The training set
integer $a_smoothing The parameter for additive smoothing. Defaults to add-one smoothing.

Return Value

array A training context to be used for further incremental training, although keeping it is not strictly necessary since the changes also happen in place
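
A minimal incremental-training sketch, assuming $firstBatch and $secondBatch are two separate TrainingSet instances and using the library's DataAsFeatures feature factory:

    // Train on the first batch and keep the returned context
    $ff = new DataAsFeatures();
    $model = new FeatureBasedNB();
    $ctx = $model->train($ff, $firstBatch);

    // Later, fold newly labelled documents into the same model
    // without retraining on the first batch
    $model->train_with_context($ctx, $ff, $secondBatch);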

at line 112
public array train(FeatureFactoryInterface $ff, TrainingSet $tset, integer $a_smoothing = 1)

Train on the given set and fill the model's variables.

priors[c] = NDocs[c] / NDocs
condprob[t][c] = ( count(t in c) + 1 ) / sum( count(t' in c) + 1, for every t' )
unknown[c] = condprob['a word that does not exist in c'][c] (so that count(t in c) == 0)

More information on the algorithm can be found at
http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html
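
A small worked instance of the add-one estimate above (all counts are hypothetical, and the sum is taken over the whole vocabulary):

    // "buy" occurs 3 times in class "spam", all terms occur 10 times in "spam"
    // in total, and the vocabulary contains 5 distinct terms:
    // condprob['buy']['spam'] = (3 + 1) / (10 + 5)
    $condprob = (3 + 1) / (10 + 5); // ~0.267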

Parameters

FeatureFactoryInterface $ff A feature factory to compute features from a training document
TrainingSet $tset The training set
integer $a_smoothing The parameter for additive smoothing. Defaults to add-one smoothing.

Return Value

array A training context to be used for incremental training via train_with_context

at line 192
public __sleep()

Save only the probabilities so the trained model can be reused after serialization.
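
Since __sleep() keeps only the probability arrays, a trained model can be persisted with PHP's native serialization; a sketch (the file name is made up):

    // Save a trained model and restore it later
    file_put_contents('nb_model.dat', serialize($model));
    $model = unserialize(file_get_contents('nb_model.dat'));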