Getting started with NlpTools
NlpTools provides the building blocks for training classification models for natural language processing. It requires PHP 5.3 or greater and is installable with Composer: all you need is a composer.json like this:
{
    "require": {
        "nlp-tools/nlp-tools": "1.0.*@dev"
    }
}
After running composer install, you can try a first example, which splits a sentence into tokens:
include('vendor/autoload.php');

use NlpTools\Tokenizers\WhitespaceAndPunctuationTokenizer;

$text = "Please allow me to introduce myself I'm a man of wealth and taste";

$tok = new WhitespaceAndPunctuationTokenizer();
print_r($tok->tokenize($text)); // print the resulting tokens

// Array
// (
//     [0] => Please
//     [1] => allow
//     [2] => me
//     [3] => to
//     [4] => introduce
//     [5] => myself
//     [6] => I
//     [7] => '
//     [8] => m
//     [9] => a
//     [10] => man
//     [11] => of
//     [12] => wealth
//     [13] => and
//     [14] => taste
// )
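To make the splitting rule concrete, here is a minimal sketch in plain PHP that approximates it with one regular expression. The function name sketchTokenize and the pattern are assumptions made for illustration; this is not how NlpTools implements its tokenizer, and it may differ from the library on other inputs.

```php
<?php
// Hypothetical helper, not part of NlpTools: split on whitespace and
// keep every punctuation character as a token of its own.
function sketchTokenize($text)
{
    // \w+     matches a run of word characters (one word)
    // [^\w\s] matches a single character that is neither word nor whitespace
    preg_match_all('/\w+|[^\w\s]/u', $text, $matches);

    return $matches[0]; // the full matches, in order of appearance
}

$tokens = sketchTokenize("Please allow me to introduce myself I'm a man of wealth and taste");
echo implode('|', $tokens), "\n";
```

For this sentence the sketch yields the same 15 tokens as the listing above, with "I'm" split into I, ' and m.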
Classification
NlpTools does not (yet) ship prebuilt models, so the examples given here also include the training of the model, which makes them a bit lengthy. The tokenizer example above doesn't classify anything and needs no training.
The next example trains a Naive Bayes model to tell spam from ham in SMS messages and then evaluates it on messages held back from training.
include('vendor/autoload.php'); // won't include it again in the following examples

use NlpTools\Tokenizers\WhitespaceTokenizer;
use NlpTools\Models\FeatureBasedNB;
use NlpTools\Documents\TrainingSet;
use NlpTools\Documents\TokensDocument;
use NlpTools\FeatureFactories\DataAsFeatures;
use NlpTools\Classifiers\MultinomialNBClassifier;

// ---------- Data ----------------
// data is taken from http://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection
// we use a part for training; each entry is array(class, text)
$training = array(
    array('ham','Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...'),
    ...
    array('spam','England v Macedonia - dont miss the goals/team news. Txt ur national team to 87077 eg ENGLAND to 87077 Try:WALES, SCOTLAND 4txt/ú1.20 POBOXox36504W45WQ 16+')
);
// and another for evaluating
$testing = array(
    array('ham','I\'ve been searching for the right words to thank you for this breather. I promise i wont take your help for granted and will fulfil my promise. You have been wonderful and a blessing at all times.'),
    ...
    array('spam','XXXMobileMovieClub: To use your credit, click the WAP link in the next txt message or click here>> http://wap. xxxmobilemovieclub.com?n=QJKGIGHJJGCBL')
);

$tset = new TrainingSet();        // will hold the training documents
$tok = new WhitespaceTokenizer(); // will split into tokens
$ff = new DataAsFeatures();       // see features in documentation

// ---------- Training ----------------
foreach ($training as $d) {
    $tset->addDocument(
        $d[0], // class
        new TokensDocument(
            $tok->tokenize($d[1]) // The actual document
        )
    );
}

$model = new FeatureBasedNB(); // train a Naive Bayes model
$model->train($ff, $tset);

// ---------- Classification ----------------
$cls = new MultinomialNBClassifier($ff, $model);
$correct = 0;
foreach ($testing as $d) {
    // predict if it is spam or ham
    $prediction = $cls->classify(
        new TokensDocument(
            $tok->tokenize($d[1]) // The document
        )
    );
    if ($prediction == $d[0]) {
        $correct++; // count predictions that match the known class
    }
}

// report the share of test messages that were classified correctly
printf("Accuracy: %.2f%%\n", 100 * $correct / count($testing));
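To show roughly what the training and classification steps compute, here is a minimal multinomial Naive Bayes sketch in plain PHP with add-one smoothing. It is an illustration under stated assumptions, not the NlpTools implementation: the functions sketchTrain and sketchClassify, the model layout, and the four toy messages are all hypothetical.

```php
<?php
// Hypothetical sketch, not NlpTools code.
// $docs is a list of array(class, tokens) pairs.
function sketchTrain(array $docs)
{
    $model = array(
        'n'      => count($docs), // number of training documents
        'docs'   => array(),      // documents per class
        'counts' => array(),      // occurrences of each token per class
        'totals' => array(),      // total tokens per class
        'vocab'  => array(),      // set of all tokens seen in training
    );
    foreach ($docs as $d) {
        list($class, $tokens) = $d;
        if (!isset($model['docs'][$class])) {
            $model['docs'][$class] = 0;
            $model['counts'][$class] = array();
            $model['totals'][$class] = 0;
        }
        $model['docs'][$class]++;
        foreach ($tokens as $t) {
            if (!isset($model['counts'][$class][$t])) {
                $model['counts'][$class][$t] = 0;
            }
            $model['counts'][$class][$t]++;
            $model['totals'][$class]++;
            $model['vocab'][$t] = true;
        }
    }

    return $model;
}

// Return the class with the highest log-probability for the given tokens.
function sketchClassify(array $model, array $tokens)
{
    $vocabSize = count($model['vocab']);
    $best = null;
    $bestScore = -INF;
    foreach ($model['docs'] as $class => $nDocs) {
        // log prior: share of training documents in this class
        $score = log($nDocs / $model['n']);
        foreach ($tokens as $t) {
            $count = isset($model['counts'][$class][$t]) ? $model['counts'][$class][$t] : 0;
            // add-one smoothing keeps unseen tokens from zeroing the product
            $score += log(($count + 1) / ($model['totals'][$class] + $vocabSize));
        }
        if ($score > $bestScore) {
            $bestScore = $score;
            $best = $class;
        }
    }

    return $best;
}

// Toy data invented for this sketch (not from the SMS Spam Collection).
$toyModel = sketchTrain(array(
    array('spam', explode(' ', 'win cash now')),
    array('spam', explode(' ', 'free cash prize')),
    array('ham',  explode(' ', 'see you at lunch')),
    array('ham',  explode(' ', 'call me at noon')),
));

echo sketchClassify($toyModel, explode(' ', 'free cash')), "\n";
echo sketchClassify($toyModel, explode(' ', 'lunch at noon')), "\n";
```

Summing logarithms instead of multiplying probabilities avoids numeric underflow on long messages, which is why the score starts from the log prior.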