ClassifierBasedTokenizer | A tokenizer that uses a classifier (of any type) to decide whether each position marks an "end of word" (EOW). |
PennTreeBankTokenizer | Penn Treebank tokenizer, based on the sed script at http://www.cis.upenn.edu/~treebank/tokenizer.sed |
RegexTokenizer | Tokenizes text using a set of regular expressions. |
WhitespaceAndPunctuationTokenizer | Simple tokenizer that splits on white space and punctuation. |
WhitespaceTokenizer | Simple white space tokenizer. |
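The sketch below shows what the three simplest strategies in the table above do. It is written in Python purely for illustration and is not this library's API. The function names, the word-or-punctuation pattern, and the reading of "a set of regexes" as split patterns applied in order are all assumptions.

```python
import re
from typing import List


def whitespace_tokenize(text: str) -> List[str]:
    # Split on any run of white space; no empty tokens are produced.
    return text.split()


# Assumed pattern: a token is either a run of word characters or a single
# character that is neither a word character nor white space (punctuation).
_WORD_OR_PUNCT = re.compile(r"\w+|[^\w\s]")


def whitespace_and_punctuation_tokenize(text: str) -> List[str]:
    # Break on white space and emit each punctuation mark as its own token.
    return _WORD_OR_PUNCT.findall(text)


def regex_tokenize(text: str, split_patterns: List[str]) -> List[str]:
    # One possible reading of "a set of regexes" (an assumption): apply each
    # pattern in order, splitting every current token on it and dropping
    # empty pieces.
    tokens = [text]
    for pattern in split_patterns:
        compiled = re.compile(pattern)
        tokens = [piece for tok in tokens for piece in compiled.split(tok) if piece]
    return tokens


if __name__ == "__main__":
    sample = "Hello, world!  Bye."
    print(whitespace_tokenize(sample))
    print(whitespace_and_punctuation_tokenize(sample))
    print(regex_tokenize("a-b c", [r"\s+", r"-"]))
```

The whitespace tokenizer leaves punctuation attached to words (`"Hello,"`), while the punctuation-aware variant separates it (`"Hello"`, `","`). That difference is the main reason to pick one over the other.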
Interfaces
TokenizerInterface |
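The listing does not describe what TokenizerInterface requires. The Python sketch below is a hypothetical illustration and assumes that a tokenizer maps a string to a list of tokens. It also shows how a ClassifierBasedTokenizer-style class could implement that contract by asking a pluggable end-of-word (EOW) classifier about each character. The `Tokenizer` protocol, the `is_end_of_word(text, i)` signature and the toy `simple_eow` rule are all assumptions, not the library's code.

```python
from typing import Callable, List, Protocol


class Tokenizer(Protocol):
    # Hypothetical shape of a tokenizer interface: text in, list of tokens out.
    def tokenize(self, text: str) -> List[str]: ...


class ClassifierBasedTokenizer:
    """Sketch: a classifier decides whether a word ends after each character."""

    def __init__(self, is_end_of_word: Callable[[str, int], bool]) -> None:
        # is_end_of_word(text, i) returns True if a token ends right after
        # text[i]. Any classifier (rule-based, statistical, ...) can be used.
        self.is_end_of_word = is_end_of_word

    def tokenize(self, text: str) -> List[str]:
        tokens: List[str] = []
        current: List[str] = []
        for i, ch in enumerate(text):
            if not ch.isspace():
                current.append(ch)  # white space is never part of a token
            if current and self.is_end_of_word(text, i):
                tokens.append("".join(current))
                current = []
        if current:  # flush whatever remains at the end of the text
            tokens.append("".join(current))
        return tokens


def simple_eow(text: str, i: int) -> bool:
    # Toy rule-based "classifier": a word ends before white space, at the end
    # of the text, or on either side of a non-alphanumeric character.
    nxt = text[i + 1] if i + 1 < len(text) else " "
    return nxt.isspace() or not nxt.isalnum() or not text[i].isalnum()


if __name__ == "__main__":
    tok: Tokenizer = ClassifierBasedTokenizer(simple_eow)
    print(tok.tokenize("Hi, you."))
```

Keeping the EOW decision behind a single callable lets a hand-written rule such as `simple_eow` be swapped for a trained model without changing the tokenizing loop.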