Maximum Entropy
"The principle of maximum entropy states that, subject to precisely stated prior data (such as a proposition that expresses testable information), the probability distribution which best represents the current state of knowledge is the one with largest entropy." (Wikipedia)
So in this maximum entropy model we look for a set of weights that correctly predicts the training data while making no assumptions beyond what that data supports. For more information on the underlying math of the model, I suggest reading this very informative PDF by Stanford.
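For orientation, a conditional maximum entropy model is usually written in the log-linear form below. The notation is mine and is not taken from the NlpTools source: c is a class, d a document, f_i(c, d) the i-th feature (1 if the feature fires for that class and document, 0 otherwise) and w_i its weight.

```latex
P(c \mid d) = \frac{\exp\left(\sum_i w_i \, f_i(c, d)\right)}
                   {\sum_{c'} \exp\left(\sum_i w_i \, f_i(c', d)\right)}
```

Training means choosing the weights w_i that maximize the likelihood of the training set under this distribution.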
Optimizers
Because training a maximum entropy model is not as trivial as training a Naive Bayes model, the train method of Maxent also takes a MaxentOptimizer as its third parameter.
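To give an idea of what such an optimizer has to do, here is a minimal sketch of gradient ascent on the conditional log-likelihood in plain PHP. It is a generic illustration and not the code of MaxentGradientDescent; the toy data, the function names classProbabilities and gradientStep, and the fixed number of steps are all assumptions of mine.

```php
<?php
// Hypothetical toy data: each item is [true class, active input features].
$classes = ['sentence', 'other'];
$data = [
    ['sentence', ['startsUpper', 'prev_ends_with(.)']],
    ['other',    ['isLower']],
    ['other',    ['startsUpper']],
];

// P(class | features) under the current weights: a softmax over class scores.
function classProbabilities(array $weights, array $classes, array $features): array
{
    $scores = [];
    foreach ($classes as $c) {
        $scores[$c] = 0.0;
        foreach ($features as $f) {
            // A missing weight counts as 0.
            $scores[$c] += $weights["$c ^ $f"] ?? 0.0;
        }
    }
    // Subtract the largest score before exponentiating for numerical stability.
    $max = max($scores);
    $z = 0.0;
    foreach ($scores as $c => $s) {
        $scores[$c] = exp($s - $max);
        $z += $scores[$c];
    }
    foreach ($scores as $c => $e) {
        $scores[$c] = $e / $z;
    }
    return $scores;
}

// One step: gradient = observed feature count - expected feature count.
function gradientStep(array $weights, array $classes, array $data, float $rate): array
{
    $grad = [];
    foreach ($data as [$trueClass, $features]) {
        $p = classProbabilities($weights, $classes, $features);
        foreach ($classes as $c) {
            foreach ($features as $f) {
                $key = "$c ^ $f";
                $observed = ($c === $trueClass) ? 1.0 : 0.0;
                $grad[$key] = ($grad[$key] ?? 0.0) + ($observed - $p[$c]);
            }
        }
    }
    foreach ($grad as $key => $g) {
        $weights[$key] = ($weights[$key] ?? 0.0) + $rate * $g;
    }
    return $weights;
}

// Start from empty weights and take ten steps with learning rate 0.1.
$weights = [];
for ($i = 0; $i < 10; $i++) {
    $weights = gradientStep($weights, $classes, $data, 0.1);
}
print_r($weights);
```

With only two classes the gradients for the same feature are equal and opposite, so in this sketch the weight for "sentence ^ isLower" is always the negative of the weight for "other ^ isLower".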
Example use
In the following example we train a classifier to recognise the start of a sentence. I purposefully use very few features (the word itself is not used, for example) so that it is easy to see what happens to the weights assigned by the optimizer.
Specifically, we have two classes and three features, which results in 6 different (class, feature) pairs. The dump of the weights is the following.
Array
(
[sentence ^ startsUpper] => 0.055798773871977
[other ^ startsUpper] => -0.055798773871977
[sentence ^ isLower] => -36.5
[other ^ isLower] => 36.5
[sentence ^ prev_ends_with(.)] => 1.7698599992222
[other ^ prev_ends_with(.)] => -1.7698599992222
)
We see that an all-lowercase word is very unlikely to start a sentence: the isLower feature is given a weight of -36.5 for the class sentence and 36.5 for the class other.
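To make these numbers more concrete, the sketch below turns the dumped weights into class probabilities by summing the weights of the active features per class and normalizing with a softmax. It is plain PHP and does not use NlpTools; the helper probabilitiesFromWeights and the two example feature sets are hypothetical names of mine.

```php
<?php
// Weights copied from the dump above.
$weights = [
    'sentence ^ startsUpper'       => 0.055798773871977,
    'other ^ startsUpper'          => -0.055798773871977,
    'sentence ^ isLower'           => -36.5,
    'other ^ isLower'              => 36.5,
    'sentence ^ prev_ends_with(.)' => 1.7698599992222,
    'other ^ prev_ends_with(.)'    => -1.7698599992222,
];

// Hypothetical helper: P(class | active features) as a softmax over summed weights.
function probabilitiesFromWeights(array $weights, array $classes, array $active): array
{
    $scores = [];
    foreach ($classes as $c) {
        $scores[$c] = 0.0;
        foreach ($active as $f) {
            $scores[$c] += $weights["$c ^ $f"] ?? 0.0;
        }
    }
    // Subtract the largest score before exponentiating for numerical stability.
    $max = max($scores);
    $z = 0.0;
    foreach ($scores as $c => $s) {
        $scores[$c] = exp($s - $max);
        $z += $scores[$c];
    }
    foreach ($scores as $c => $e) {
        $scores[$c] = $e / $z;
    }
    return $scores;
}

$classes = ['sentence', 'other'];

// A capitalized word that follows a word ending with '.'.
$capitalAfterPeriod = probabilitiesFromWeights(
    $weights,
    $classes,
    ['startsUpper', 'prev_ends_with(.)']
);

// An all-lowercase word in the middle of a sentence.
$lowercaseWord = probabilitiesFromWeights($weights, $classes, ['isLower']);

printf("capital after period: P(sentence) = %.4f\n", $capitalAfterPeriod['sentence']);
printf("all lowercase:        P(sentence) = %.3e\n", $lowercaseWord['sentence']);
```

The large magnitude of the isLower weight dominates any other feature, so an all-lowercase word gets a probability of starting a sentence that is practically zero.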
include('vendor/autoload.php');

use NlpTools\Tokenizers\WhitespaceTokenizer;
use NlpTools\Documents\WordDocument;
use NlpTools\Documents\Document;
use NlpTools\Documents\TrainingSet;
use NlpTools\Models\Maxent;
use NlpTools\Optimizers\MaxentGradientDescent;
use NlpTools\FeatureFactories\FunctionFeatures;

$s = "When the objects of an inquiry, in any department, have principles, conditions, or elements, it is through acquaintance with these that knowledge, that is to say scientific knowledge, is attained. For we do not think that we know a thing until we are acquainted with its primary conditions or first principles, and have carried our analysis as far as its simplest elements. Plainly therefore in the science of Nature, as in other branches of study, our first task will be to try to determine what relates to its principles.";

// tokens 0,30,62 are the start of a new sentence
// the rest will be said to have the class other
$tok = new WhitespaceTokenizer();
$tokens = $tok->tokenize($s);
$tset = new TrainingSet();
array_walk(
    $tokens,
    function ($t,$i) use($tset,$tokens) {
        if (!in_array($i,array(0,30,62))) {
            $tset->addDocument(
                'other',
                new WordDocument($tokens,$i,1) // get word and the previous/next
            );
        } else {
            $tset->addDocument(
                'sentence',
                new WordDocument($tokens,$i,1) // get word and the previous/next
            );
        }
    }
);

// Remember that in maxent a feature should also target the class
// thus we prepend each feature name with the class name
$ff = new FunctionFeatures(
    array(
        function ($class, Document $d) {
            // $data[0] is the current word
            // $data[1] is an array of previous words
            // $data[2] is an array of following words
            $data = $d->getDocumentData();
            // check if the previous word ends with '.'
            if (isset($data[1][0])) {
                return (substr($data[1][0],-1)=='.') ? "$class ^ prev_ends_with(.)" : null;
            }
            return null; // the first token has no previous word
        },
        function ($class, Document $d) {
            $data = $d->getDocumentData();
            // check if this word starts with a capital
            return (ctype_upper($data[0][0])) ? "$class ^ startsUpper" : null;
        },
        function ($class, Document $d) {
            $data = $d->getDocumentData();
            // check if this word is all lowercase
            return (ctype_lower($data[0])) ? "$class ^ isLower" : null;
        }
    )
);

// instantiate a gradient descent optimizer for maximum entropy
$optimizer = new MaxentGradientDescent(
    0.001, // Stop if each weight changes less than 0.001
    0.1,   // learning rate
    10     // maximum iterations
);

// an empty maxent model
$maxent = new Maxent(array());

// train
$maxent->train($ff,$tset,$optimizer);

// show the weights
$maxent->dumpWeights();