R implementation of a collapsed Gibbs sampler for LDA

niutyut/LdaGibbs

LdaGibbs

An R implementation of a collapsed Gibbs sampler for approximate inference in the Latent Dirichlet Allocation (LDA) model [1], a topic model for a corpus of text documents. Parameter estimation runs in C++ for speed and is called from R via Rcpp [2]. The repository also includes an interface for preprocessing the data and for post-processing the output of the algorithm (visualization, etc.).
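To make the idea of a collapsed Gibbs sweep concrete, here is a minimal standalone C++ sketch. It is not the code from this repository: the `LdaState` struct, the `init_state` and `gibbs_sweep` functions, and the symmetric priors `alpha` and `beta` are all assumptions made for illustration, and the Rcpp glue is left out. Each token's topic is resampled from the standard collapsed conditional, proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta), with the token's own assignment removed from the counts first.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sampler state; all names here are illustrative only.
struct LdaState {
    int K = 0;                            // number of topics
    int V = 0;                            // vocabulary size
    double alpha = 0.1;                   // symmetric document-topic prior (assumed)
    double beta = 0.01;                   // symmetric topic-word prior (assumed)
    std::vector<std::vector<int>> docs;   // docs[d][i] = word id of token i in doc d
    std::vector<std::vector<int>> z;      // z[d][i]    = topic assigned to that token
    std::vector<std::vector<int>> n_dk;   // n_dk[d][k] = tokens in doc d with topic k
    std::vector<std::vector<int>> n_kw;   // n_kw[k][w] = times word w has topic k
    std::vector<int> n_k;                 // n_k[k]     = total tokens with topic k
};

// Assign every token a uniformly random topic and build the count tables.
LdaState init_state(const std::vector<std::vector<int>>& docs, int K, int V,
                    double alpha, double beta, std::mt19937& rng) {
    LdaState s;
    s.K = K; s.V = V; s.alpha = alpha; s.beta = beta;
    s.docs = docs;
    s.z.resize(docs.size());
    s.n_dk.assign(docs.size(), std::vector<int>(K, 0));
    s.n_kw.assign(K, std::vector<int>(V, 0));
    s.n_k.assign(K, 0);
    std::uniform_int_distribution<int> pick(0, K - 1);
    for (std::size_t d = 0; d < docs.size(); ++d) {
        for (int w : docs[d]) {
            assert(w >= 0 && w < V);      // word ids must index the vocabulary
            int k = pick(rng);
            s.z[d].push_back(k);
            ++s.n_dk[d][k]; ++s.n_kw[k][w]; ++s.n_k[k];
        }
    }
    return s;
}

// One full sweep: resample the topic of every token in every document.
void gibbs_sweep(LdaState& s, std::mt19937& rng) {
    std::vector<double> p(s.K);
    for (std::size_t d = 0; d < s.docs.size(); ++d) {
        for (std::size_t i = 0; i < s.docs[d].size(); ++i) {
            const int w = s.docs[d][i];
            const int old_k = s.z[d][i];
            // Remove the current assignment so the counts exclude this token.
            --s.n_dk[d][old_k]; --s.n_kw[old_k][w]; --s.n_k[old_k];
            // Unnormalized conditional probability of each topic.
            for (int k = 0; k < s.K; ++k) {
                p[k] = (s.n_dk[d][k] + s.alpha) * (s.n_kw[k][w] + s.beta)
                       / (s.n_k[k] + s.V * s.beta);
            }
            // discrete_distribution normalizes the weights internally.
            std::discrete_distribution<int> draw(p.begin(), p.end());
            const int new_k = draw(rng);
            s.z[d][i] = new_k;
            ++s.n_dk[d][new_k]; ++s.n_kw[new_k][w]; ++s.n_k[new_k];
        }
    }
}
```

In this sketch a caller would build the state once with `init_state`, call `gibbs_sweep` for a number of burn-in iterations, and then read point estimates off the count tables, for example a topic-word probability as (n_kw[k][w] + beta) / (n_k[k] + V * beta).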

TODO

  • Clean up the preprocessing and output code for streamlined use.
  • Think up new ways to present the results.
  • Implement online variational inference for LDA [3].
  • Develop a heuristic for choosing a suitable number of topics.

Depends

  • Rcpp - for the interface to the C++ code
  • tm - for preprocessing
  • SnowballC - for stemming (in preprocessing)
  • reshape and ggplot2 - for visualizing output

Use

Full demos are in demoReuters.R, demoCora.R, and demoPatents.R.

References

[1] http://machinelearning.wustl.edu/mlpapers/paper_files/BleiNJ03.pdf

[2] http://cran.r-project.org/web/packages/Rcpp/index.html

[3] https://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010b.pdf
