NlpTools

Natural language processing in PHP

Natural language processing tools

NlpTools is a library for natural language processing written in PHP. Its development is driven by my own needs for text classification, clustering, tokenization, stemming, etc.

A tiny working example

    <?php
    require 'vendor/autoload.php';

    use NlpTools\Tokenizers\WhitespaceTokenizer;
    use NlpTools\Similarity\CosineSimilarity;

    // Fetch a Wikipedia article through the MediaWiki API and return its
    // text with the HTML tags stripped and runs of whitespace collapsed.
    function getWikipediaPage($page) {
        $url = "https://en.wikipedia.org/w/api.php?format=json&action=parse&page=".urlencode($page);
        $page = json_decode(file_get_contents($url), true);
        return preg_replace('/\s+/', ' ', strip_tags($page['parse']['text']['*']));
    }

    // Identify this script to the Wikipedia API with a descriptive User-Agent
    ini_set('user_agent', 'NlpToolsTest/1.0 (tests@php-nlp-tools.com)');

    $tokenizer = new WhitespaceTokenizer();
    $sim = new CosineSimilarity();

    // Tokenize the three articles
    $aris = $tokenizer->tokenize(getWikipediaPage('Aristotle'));
    $archi = $tokenizer->tokenize(getWikipediaPage('Archimedes'));
    $einstein = $tokenizer->tokenize(getWikipediaPage('Albert Einstein'));

    // Compare Aristotle's article with the other two
    $aris_to_archi = $sim->similarity($aris, $archi);
    $aris_to_albert = $sim->similarity($aris, $einstein);

    var_dump($aris_to_archi, $aris_to_albert);
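To make the two numbers less of a black box, here is a dependency-free sketch of the same kind of computation in plain PHP, with no network access. It assumes that WhitespaceTokenizer splits text on runs of whitespace and that CosineSimilarity treats each token list as a bag of words (a term-frequency vector); the functions whitespace_tokenize and cosine_similarity are hypothetical illustrations, not part of NlpTools, so check the library source for its exact behaviour.

```php
<?php
// Hypothetical sketch, not the NlpTools implementation.
// Assumption: text is split on whitespace and each token list is turned
// into a term-frequency vector before the cosine is computed.

/** Split a string on runs of whitespace, dropping empty tokens. */
function whitespace_tokenize(string $text): array
{
    return preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

/** Cosine similarity between two token lists, treated as bags of words. */
function cosine_similarity(array $a, array $b): float
{
    // Count token occurrences: ['a', 'b', 'a'] becomes ['a' => 2, 'b' => 1]
    $va = array_count_values($a);
    $vb = array_count_values($b);

    // Dot product over the tokens the two lists share
    $dot = 0.0;
    foreach ($va as $token => $count) {
        if (isset($vb[$token])) {
            $dot += $count * $vb[$token];
        }
    }

    // Euclidean norms of the two frequency vectors
    $normA = sqrt(array_sum(array_map(fn($c) => $c * $c, $va)));
    $normB = sqrt(array_sum(array_map(fn($c) => $c * $c, $vb)));

    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0; // an empty token list is treated as similar to nothing
    }
    return $dot / ($normA * $normB);
}

var_dump(cosine_similarity(
    whitespace_tokenize('the cat sat on the mat'),
    whitespace_tokenize('the dog sat on the log')
));
```

For the two sample sentences the shared tokens "the", "sat" and "on" give a similarity of about 0.75; identical token lists score 1 and lists with no token in common score 0.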

Get it

You can get it from GitHub or, as I recommend, install it from Packagist with Composer by adding the following requirement to your composer.json:

{
    "require": {
        "nlp-tools/nlp-tools": "1.0.*@dev"
    }
}

Purpose of this site

Besides giving a home on the internet to a library I believe is useful enough to be shared, this site will host the documentation and a blog that describes, step by step, small NLP projects built with this library.

License

NlpTools is released under the most permissive license in the world, the WTFPL.

Sum up

NlpTools is still under development and plenty of features may be missing. If you find yourself needing something NlpTools doesn't currently provide, you will have to code it yourself, and I would be glad if you contributed it back to the project.