class Lda
Topic discovery with latent Dirichlet allocation using Gibbs sampling.
The implementation is based on the paper by Griffiths and Steyvers,
available at http://www.ncbi.nlm.nih.gov/pmc/articles/PMC387300/.
It is also heavily influenced (especially in the implementation and
debugging of the online Gibbs sampler) by the Python implementation
by Mathieu Blondel at https://gist.github.com/mblondel/542786
Methods
__construct(FeatureFactoryInterface $ff, integer $ntopics, float $a = 1, float $b = 1)
generateDocs(TrainingSet $tset)
Generate an array suitable for use with Lda::initialize and Lda::gibbsSample from a training set.
initialize(array $docs)
Count the initial co-occurrences of (document, topic) and (topic, word) pairs and cache them so that Gibbs sampling runs faster.
train(TrainingSet $tset, $it)
Run the Gibbs sampler $it times.
gibbsSample(array $docs)
Generate one Gibbs sample.
array getWordsPerTopicsProbabilities($limit_words = -1)
Get the probability of a word given a topic (phi according to Griffiths and Steyvers)
getPhi($limit_words = -1)
Shortcut to getWordsPerTopicsProbabilities
array getDocumentsPerTopicsProbabilities($limit_docs = -1)
Get the probability of a document given a topic (theta according to Griffiths and Steyvers)
getTheta($limit_docs = -1)
Shortcut to getDocumentsPerTopicsProbabilities
getLogLikelihood()
Log likelihood of the model having generated the data as implemented by M. Blondel
Details
at line 45
public
__construct(FeatureFactoryInterface $ff, integer $ntopics, float $a = 1, float $b = 1)
at line 60
public
generateDocs(TrainingSet $tset)
Generate an array suitable for use with Lda::initialize and Lda::gibbsSample from a training set.
at line 75
public
initialize(array $docs)
Count the initial co-occurrences of (document, topic) and (topic, word) pairs and cache them so that Gibbs sampling runs faster.
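The counting step can be illustrated with a minimal, self-contained sketch. It is not the library's code: the function name ldaInitialCounts, the layout of the $state array, and the assumption that each document is a plain list of word strings are all hypothetical, chosen only to show which counts a collapsed Gibbs sampler typically caches.

```php
<?php
// Hypothetical helper (not part of the Lda class): give every token a random
// topic and tally the counts that a collapsed Gibbs sampler relies on.
function ldaInitialCounts(array $docs, int $ntopics): array
{
    $state = [
        'z'          => [],                          // z[d][i]: topic of token i in document d
        'docTopic'   => [],                          // docTopic[d][t]: tokens of d assigned to t
        'topicWord'  => array_fill(0, $ntopics, []), // topicWord[t][w]: times w is assigned to t
        'topicTotal' => array_fill(0, $ntopics, 0),  // topicTotal[t]: tokens assigned to t
    ];
    foreach ($docs as $d => $words) {
        $state['docTopic'][$d] = array_fill(0, $ntopics, 0);
        foreach ($words as $i => $w) {
            $t = mt_rand(0, $ntopics - 1);           // random starting topic
            $state['z'][$d][$i] = $t;
            $state['docTopic'][$d][$t]++;
            $state['topicWord'][$t][$w] = ($state['topicWord'][$t][$w] ?? 0) + 1;
            $state['topicTotal'][$t]++;
        }
    }
    return $state;
}

// Assumed input format: one list of word strings per document.
$docs  = [['apple', 'pear', 'apple'], ['goal', 'match']];
$state = ldaInitialCounts($docs, 2);
```

Caching the per-topic totals alongside the two count tables means the sampler never has to re-sum a table while it resamples a token.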
at line 135
public
train(TrainingSet $tset, $it)
Run the Gibbs sampler $it times.
at line 153
public
gibbsSample(array $docs)
Generate one Gibbs sample.
The docs must have been passed to initialize before calling
this function.
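As a rough illustration of what one sample involves, the sketch below performs a single collapsed Gibbs sweep in the style described by Griffiths and Steyvers: each token is removed from the counts, a topic is drawn in proportion to (n_tw + b) / (n_t + W*b) * (n_dt + a), and the token is added back. The function name ldaGibbsSweep, the $state layout, and the deterministic starting assignment are assumptions made for this example and do not reproduce the class's internals.

```php
<?php
// Hypothetical sketch of one collapsed Gibbs sweep; all names are illustrative.
function ldaGibbsSweep(array $docs, array &$s, int $ntopics, int $vocabSize, float $a, float $b): void
{
    foreach ($docs as $d => $words) {
        foreach ($words as $i => $w) {
            // Take the token out of the counts before resampling its topic.
            $old = $s['z'][$d][$i];
            $s['docTopic'][$d][$old]--;
            $s['topicWord'][$old][$w]--;
            $s['topicTotal'][$old]--;

            // Unnormalized conditional probability of each topic for this token.
            $weights = [];
            $sum = 0.0;
            for ($t = 0; $t < $ntopics; $t++) {
                $nw = $s['topicWord'][$t][$w] ?? 0;
                $weights[$t] = ($nw + $b) / ($s['topicTotal'][$t] + $vocabSize * $b)
                    * ($s['docTopic'][$d][$t] + $a);
                $sum += $weights[$t];
            }

            // Draw a topic in proportion to the weights.
            $u = mt_rand() / mt_getrandmax() * $sum;
            $new = $ntopics - 1;
            $acc = 0.0;
            for ($t = 0; $t < $ntopics; $t++) {
                $acc += $weights[$t];
                if ($u < $acc) { $new = $t; break; }
            }

            // Put the token back under its newly drawn topic.
            $s['z'][$d][$i] = $new;
            $s['docTopic'][$d][$new]++;
            $s['topicWord'][$new][$w] = ($s['topicWord'][$new][$w] ?? 0) + 1;
            $s['topicTotal'][$new]++;
        }
    }
}

// Build a small starting state (assumed format: lists of word strings).
$docs = [['apple', 'pear', 'apple'], ['goal', 'match', 'goal']];
$ntopics = 2;
$vocabSize = 4;
$state = ['z' => [], 'docTopic' => [], 'topicWord' => array_fill(0, $ntopics, []),
          'topicTotal' => array_fill(0, $ntopics, 0)];
foreach ($docs as $d => $words) {
    $state['docTopic'][$d] = array_fill(0, $ntopics, 0);
    foreach ($words as $i => $w) {
        $t = ($d + $i) % $ntopics; // arbitrary starting assignment
        $state['z'][$d][$i] = $t;
        $state['docTopic'][$d][$t]++;
        $state['topicWord'][$t][$w] = ($state['topicWord'][$t][$w] ?? 0) + 1;
        $state['topicTotal'][$t]++;
    }
}

// Twenty sweeps, analogous in spirit to running the sampler $it times.
for ($it = 0; $it < 20; $it++) {
    ldaGibbsSweep($docs, $state, $ntopics, $vocabSize, 1.0, 1.0);
}
```

The sweep needs the counts to exist already, which is consistent with the requirement that the docs be passed to initialize first.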
at line 192
public array
getWordsPerTopicsProbabilities($limit_words = -1)
Get the probability of a word given a topic (phi according to Griffiths and Steyvers)
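In Griffiths and Steyvers, phi is estimated from the topic/word counts as (n_tw + b) / (n_t + W*b), where W is the vocabulary size. The sketch below applies that estimate to a hand-written count table. The function name ldaPhi and the input layout are hypothetical, and the sketch makes no claim about how $limit_words is handled by the class.

```php
<?php
// Hypothetical helper: smoothed estimate of P(word | topic) from count tables.
function ldaPhi(array $topicWord, array $vocab, float $b): array
{
    $phi = [];
    $W = count($vocab);
    foreach ($topicWord as $t => $counts) {
        $total = array_sum($counts);          // tokens assigned to topic t
        foreach ($vocab as $w) {
            // Words never seen under the topic still receive mass from $b.
            $phi[$t][$w] = (($counts[$w] ?? 0) + $b) / ($total + $W * $b);
        }
    }
    return $phi;
}

// Made-up counts for two topics over a four-word vocabulary.
$topicWord = [
    0 => ['apple' => 3, 'pear' => 1],
    1 => ['goal' => 2, 'match' => 2],
];
$vocab = ['apple', 'pear', 'goal', 'match'];
$phi = ldaPhi($topicWord, $vocab, 1.0);
// $phi[0]['apple'] is (3 + 1) / (4 + 4 * 1) = 0.5
```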
at line 219
public
getPhi($limit_words = -1)
Shortcut to getWordsPerTopicsProbabilities
at line 231
public array
getDocumentsPerTopicsProbabilities($limit_docs = -1)
Get the probability of a document given a topic (theta according to Griffiths and Steyvers)
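For reference, Griffiths and Steyvers estimate theta per document from the document/topic counts as (n_dt + a) / (n_d + T*a), where T is the number of topics. The sketch below computes that estimate for a made-up count table. The function name ldaTheta and the input layout are hypothetical, and the sketch does not model the $limit_docs parameter.

```php
<?php
// Hypothetical helper: smoothed topic proportions per document from counts.
function ldaTheta(array $docTopic, float $a): array
{
    $theta = [];
    foreach ($docTopic as $d => $counts) {
        $T = count($counts);                  // number of topics
        $total = array_sum($counts);          // tokens in document d
        foreach ($counts as $t => $c) {
            $theta[$d][$t] = ($c + $a) / ($total + $T * $a);
        }
    }
    return $theta;
}

// Made-up counts: docTopic[d][t] is the number of tokens of d assigned to t.
$docTopic = [[3, 0], [1, 2]];
$theta = ldaTheta($docTopic, 1.0);
// $theta[0] is [4/5, 1/5] and $theta[1] is [2/5, 3/5]
```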
at line 262
public
getTheta($limit_docs = -1)
Shortcut to getDocumentsPerTopicsProbabilities
at line 271
public
getLogLikelihood()
Log likelihood of the model having generated the data as implemented by M. Blondel
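For orientation, the joint log likelihood of words and topic assignments in collapsed LDA is commonly written as below. This is the standard form and is assumed, not verified, to be what the referenced implementation computes. Here n_t is the vector of word counts for topic t, n_d the vector of topic counts for document d, W the vocabulary size, T the number of topics, and alpha and beta are assumed to correspond to the constructor's $a and $b.

```latex
\log p(\mathbf{w}, \mathbf{z})
  = \sum_{t=1}^{T} \Bigl[ \log B(\mathbf{n}_t + \beta) - \log B(\beta \mathbf{1}_W) \Bigr]
  + \sum_{d=1}^{D} \Bigl[ \log B(\mathbf{n}_d + \alpha) - \log B(\alpha \mathbf{1}_T) \Bigr],
\qquad
B(\mathbf{x}) = \frac{\prod_i \Gamma(x_i)}{\Gamma\bigl(\sum_i x_i\bigr)}
```

The value is only comparable between runs on the same data with the same hyperparameters, and it is typically tracked across iterations to judge whether the sampler has settled.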