NlpTools API
Class

NlpTools\Tokenizers\WhitespaceAndPunctuationTokenizer

class WhitespaceAndPunctuationTokenizer implements TokenizerInterface

Simple white space tokenizer.

Breaks either on whitespace or on word
boundaries (e.g. dots, commas).
Does not include white space in tokens.
Every punctuation character is a single token.
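
The rules above can be approximated with a short, self-contained sketch. This is not the library's implementation: the function name `sketchTokenize` and the regular expression are assumptions chosen for illustration. It treats Unicode separators (`\pZ`) and control characters (`\pC`, which covers tabs and newlines) as white space, and Unicode punctuation (`\pP`) as single-character tokens.

```php
<?php
// Hypothetical stand-in for the behaviour described above; the real
// class may differ in details such as how it handles invalid UTF-8.
function sketchTokenize(string $str): array
{
    // Either a run of characters that are not punctuation, separators
    // or control characters (a "word"), or exactly one punctuation
    // character. White space is never matched, so it is dropped.
    $pattern = '/[^\pP\pZ\pC]+|\pP/u';

    preg_match_all($pattern, $str, $matches);

    // $matches[0] holds the full matches in order of appearance.
    return $matches[0];
}

$tokens = sketchTokenize('Hello, world! Fine.');
// $tokens is ['Hello', ',', 'world', '!', 'Fine', '.']
```

With the library itself, the equivalent call would presumably be `tokenize()` on an instance of `WhitespaceAndPunctuationTokenizer`, documented below.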

Methods

array tokenize(string $str)

Break a character sequence into a token sequence

Details

at line 13
public array tokenize(string $str)

Break a character sequence into a token sequence

Parameters

string $str The text for tokenization

Return Value

array The list of tokens from the string