Bayesian model
NlpTools, for now at least, only implements the Naive Bayes model. The interface of the model is the following.
interface MultinomialNBModelInterface { public function getPrior($class); public function getCondProb($term,$class); }
There is one class that currently implements the above interface FeatureBasedNB.
Feature based NB
FeatureBasedNB has a train method that implements the Naive Bayes supervised learning. FeatureBasedNB's train method takes a feature factory and a training set as parameters. Simply by counting the occurences of each feature in each document it computes the necessary probabilites.
It also uses additive smoothing to account for features never seen in the training set.
The train method returns a training context that can be used with the function train_with_context for incremental training.
A good example of the use of the FeatureBasedNB class is shown on the documentation index (it is repeated here).
include('vendor/autoload.php'); // won't include it again in the following examples use NlpTools\Tokenizers\WhitespaceTokenizer; use NlpTools\Models\FeatureBasedNB; use NlpTools\Documents\TrainingSet; use NlpTools\Documents\TokensDocument; use NlpTools\FeatureFactories\DataAsFeatures; use NlpTools\Classifiers\MultinomialNBClassifier; // ---------- Data ---------------- // data is taken from http://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection // we use a part for training array('ham','Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...'), ... array('spam','England v Macedonia - dont miss the goals/team news. Txt ur national team to 87077 eg ENGLAND to 87077 Try:WALES, SCOTLAND 4txt/ú1.20 POBOXox36504W45WQ 16+') ); // and another for evaluating array('ham','I\'ve been searching for the right words to thank you for this breather. I promise i wont take your help for granted and will fulfil my promise. You have been wonderful and a blessing at all times.'), ... array('spam','XXXMobileMovieClub: To use your credit, click the WAP link in the next txt message or click here>> http://wap. xxxmobilemovieclub.com?n=QJKGIGHJJGCBL') ); $tset = new TrainingSet(); // will hold the training documents $tok = new WhitespaceTokenizer(); // will split into tokens $ff = new DataAsFeatures(); // see features in documentation // ---------- Training ---------------- foreach ($training as $d) { $tset->addDocument( $d[0], // class new TokensDocument( $tok->tokenize($d[1]) // The actual document ) ); } $model = new FeatureBasedNB(); // train a Naive Bayes model $model->train($ff,$tset); // ---------- Classification ---------------- $cls = new MultinomialNBClassifier($ff,$model); $correct = 0; foreach ($testing as $d) { // predict if it is spam or ham $prediction = $cls->classify( new TokensDocument( $tok->tokenize($d[1]) // The document ) ); if ($prediction==$d[0]) $correct ++; }