In machine learning classification is the problem of identifying in which of a set of categories (classes) a new observation belongs. Wikipedia
The Classifier interface contains only the function classify that receives the set of categories (classes) and a Document which contains all the data of a new observation and returns a predicted class for the Document.
- interface ClassifierInterface
- * Decide in which class C member of $classes would $d fit best.
- * @param array $classes A set of classes
- * @param Document $d A Document
- * @return string A class
- public function classify(array $classes, Document $d);
Feature based linear classifier
For a given document and a class the feature vector is computed. Through a linear combination with the weights of the Linear Model a vote for the given class Ci is computed. The class C that maximizes the vote is the predicted class.
Multinomial Naive Bayes Classifier
You can find a thorough explanation of this method by the Stanford NLP department.
In short, the probability of a document d belonging in the class c is computed using the prior probablities of the classes and the assumption that each indicator random variable (the features in our case) is independent with any other.