latent_dirichlet_allocation

Project work to consolidate an understanding of Latent Dirichlet Allocation (LDA) for predicting words in documents.

Project Organisation

Data is in the Data folder, and each task has its own Python file for that investigation.

Task 1: Finding Maximum Likelihood Multinomial estimates over the words.

This task finds the maximum likelihood estimate of the multinomial distribution over the words in the document: count the number of times each word appears in the document and divide by the total number of words.
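The counting estimate described above can be sketched in a few lines. This is a minimal illustration, not the code in task1.py; the function name `ml_word_probabilities` and the toy word list are assumptions made for the example.

```python
from collections import Counter


def ml_word_probabilities(words):
    """Maximum likelihood multinomial estimate: count of each word / total words."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# Toy document (hypothetical data, not taken from the Data folder).
probs = ml_word_probabilities(["the", "cat", "sat", "on", "the", "mat"])
```

Under this estimate, any word that never appears in the document gets probability zero, which is one motivation for the prior used in Task 2.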

Task 2: Bayesian Inference using symmetric Dirichlet prior

This task places a symmetric Dirichlet prior on the multinomial parameters and computes their posterior given the words in the document. The posterior is a Dirichlet distribution with parameters alpha + n_i, where n_i is the number of times word i appears in the document and alpha is the concentration parameter of the prior (the same constant for every word).
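A small sketch of the posterior update, assuming a fixed vocabulary of M words with counts stored in a list. The helper names and the value alpha = 0.5 are hypothetical; the predictive probability used here is the mean of the Dirichlet posterior, (alpha + n_i) / (M * alpha + N), where N is the total word count.

```python
def dirichlet_posterior_params(counts, alpha):
    """Posterior Dirichlet parameters alpha + n_i for a symmetric prior."""
    return [alpha + n for n in counts]


def posterior_predictive(counts, alpha):
    """Posterior mean of each word probability: (alpha + n_i) / (M * alpha + N)."""
    total = sum(counts) + alpha * len(counts)
    return [(alpha + n) / total for n in counts]


counts = [3, 1, 0]  # toy counts n_i over a 3-word vocabulary (hypothetical)
alpha = 0.5         # hypothetical prior concentration
params = dirichlet_posterior_params(counts, alpha)
pred = posterior_predictive(counts, alpha)
```

Unlike the maximum likelihood estimate, the word with a count of zero still receives non-zero predictive probability.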

Task 3: Perplexity

Perplexity measures how well a probability distribution or probability model predicts a sample, and can be used to compare models. A low perplexity indicates that the distribution predicts the sample well. This task calculates the perplexity of the model on a test set of documents and compares the values obtained with the multinomial and the categorical distribution.
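As a rough sketch, per-word perplexity can be computed as exp(-log-likelihood / N), where N is the number of words in the test document. The code below assumes that the multinomial variant differs from the categorical one only by the log multinomial coefficient; whether task3.py makes the comparison in exactly this way is an assumption, and the function names are hypothetical.

```python
import math


def log_likelihood(counts, probs, multinomial=False):
    """Log-likelihood of word counts under word probabilities.

    Raises ValueError if an observed word has probability zero.
    """
    ll = sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)
    if multinomial:
        # Add the log multinomial coefficient log(N! / prod_i n_i!).
        n_total = sum(counts)
        ll += math.lgamma(n_total + 1) - sum(math.lgamma(n + 1) for n in counts)
    return ll


def perplexity(counts, probs, multinomial=False):
    """Per-word perplexity: exp(-log-likelihood / number of words)."""
    return math.exp(-log_likelihood(counts, probs, multinomial) / sum(counts))


uniform = [0.25] * 4
categorical_ppl = perplexity([1, 1, 1, 1], uniform)
multinomial_ppl = perplexity([1, 1, 1, 1], uniform, multinomial=True)
```

For a uniform distribution over four words the categorical perplexity equals four, the vocabulary size. The multinomial value is never larger, because the multinomial coefficient is at least one.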

Task 4: Bayesian Mixture Model (BMM) a.k.a. Mixture of Multinomials Model

This task involves attempting to sample from the posterior distribution of the latent topic assignments of documents using a collapsed Gibbs sampler.
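The sketch below shows one way a collapsed Gibbs sampler for a mixture of multinomials could look. It is not the code in bmm.py. It assumes one topic per document, a symmetric Dirichlet(alpha) prior on the mixing proportions and a symmetric Dirichlet(gamma) prior on each topic's word distribution, both integrated out. The function name, argument names and toy documents are all hypothetical.

```python
import math
import random


def collapsed_gibbs_bmm(docs, n_topics, vocab_size, alpha=1.0, gamma=0.1,
                        n_sweeps=20, seed=0):
    """docs: list of {word_id: count} dicts. Returns one topic index per document."""
    rng = random.Random(seed)
    z = [rng.randrange(n_topics) for _ in docs]
    docs_per_topic = [0] * n_topics
    word_counts = [[0] * vocab_size for _ in range(n_topics)]
    words_per_topic = [0] * n_topics

    def update(d, k, sign):
        # Add (sign=+1) or remove (sign=-1) document d from topic k's counts.
        docs_per_topic[k] += sign
        for w, n in docs[d].items():
            word_counts[k][w] += sign * n
            words_per_topic[k] += sign * n

    for d, k in enumerate(z):
        update(d, k, +1)

    for _ in range(n_sweeps):
        for d, doc in enumerate(docs):
            update(d, z[d], -1)  # condition on all other documents
            n_d = sum(doc.values())
            log_p = []
            for k in range(n_topics):
                # Prior term: (alpha + documents already assigned to topic k).
                lp = math.log(alpha + docs_per_topic[k])
                # Dirichlet-multinomial predictive of this document's words.
                base = vocab_size * gamma + words_per_topic[k]
                lp += math.lgamma(base) - math.lgamma(base + n_d)
                for w, n in doc.items():
                    prior_w = gamma + word_counts[k][w]
                    lp += math.lgamma(prior_w + n) - math.lgamma(prior_w)
                log_p.append(lp)
            top = max(log_p)  # subtract the maximum for numerical stability
            weights = [math.exp(lp - top) for lp in log_p]
            z[d] = rng.choices(range(n_topics), weights=weights)[0]
            update(d, z[d], +1)
    return z


# Toy corpus over a 4-word vocabulary (hypothetical data).
toy_docs = [{0: 5, 1: 3}, {0: 4, 1: 4}, {2: 6, 3: 2}, {2: 3, 3: 5}]
assignments = collapsed_gibbs_bmm(toy_docs, n_topics=2, vocab_size=4)
```

Working in log space and subtracting the maximum before exponentiating keeps the weights finite for long documents, where the raw probabilities would underflow.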

Task 5: Latent Dirichlet Allocation (LDA)

This task involves implementing Latent Dirichlet Allocation to predict words in documents. LDA is a generative probabilistic model for collections of discrete data such as text corpora. It is also a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities.
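The README does not say which inference method lda.py uses. As an illustration only, the sketch below assumes a collapsed Gibbs sampler with symmetric Dirichlet priors alpha (document-topic) and gamma (topic-word), in which every word token has its own topic. All names and the toy corpus are hypothetical.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, gamma=0.1,
              n_sweeps=50, seed=0):
    """docs: list of lists of word ids. Returns (z, doc_topic, topic_word)."""
    rng = random.Random(seed)
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * n_topics for _ in docs]             # tokens per topic in each doc
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                           # total tokens per topic

    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(n_sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = j | rest) is proportional to
                # (alpha + n_dj) * (gamma + m_jw) / (V * gamma + M_j).
                weights = [
                    (alpha + doc_topic[d][j]) * (gamma + topic_word[j][w])
                    / (vocab_size * gamma + topic_total[j])
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return z, doc_topic, topic_word


# Toy corpus over a 4-word vocabulary (hypothetical data).
toy_corpus = [[0, 1, 0, 1], [2, 3, 2, 3, 3]]
z, doc_topic, topic_word = lda_gibbs(toy_corpus, n_topics=2, vocab_size=4)
```

The main difference from the mixture model of Task 4 is that topics are assigned per word rather than per document, so a single document can mix several topics.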

