Natural language processing in php


In machine learning classification is the problem of identifying in which of a set of categories (classes) a new observation belongs. Wikipedia


The Classifier interface contains only the function classify that receives the set of categories (classes) and a Document which contains all the data of a new observation and returns a predicted class for the Document.

  1. interface ClassifierInterface
  2. {
  3. /**
  4. * Decide in which class C member of $classes would $d fit best.
  5. *
  6. * @param array $classes A set of classes
  7. * @param Document $d A Document
  8. * @return string A class
  9. */
  10. public function classify(array $classes, Document $d);
  11. }

Feature based linear classifier

This classifier needs a Feature Factory and a Linear Model.

For a given document and a class the feature vector is computed. Through a linear combination with the weights of the Linear Model a vote for the given class Ci is computed. The class C that maximizes the vote is the predicted class.

Multinomial Naive Bayes Classifier

This classifier also needs a Feature Factory. In addition it needs a Multinomial Naive Bayes Model.

You can find a thorough explanation of this method by the Stanford NLP department.

In short, the probability of a document d belonging in the class c is computed using the prior probablities of the classes and the assumption that each indicator random variable (the features in our case) is independent with any other.

« Feature Factories / Bayesian Model »