Topic Modeling
for topic modelling
(TopSBM) topic block modeling,
Non-negative Matrix Factorization (NMF)
NMF (Non-negative Matrix Factorization) + code
Including code and explanation about Dirichlet probability.
About topic modeling: pros and cons (LSA, pLSA, LDA)
(LDA) Latent Dirichlet Allocation
LDA is already taken by the above algorithm!
NMF, in its general definition, is the search for two matrices W and H such that W*H = V, where V is an observed matrix. The only requirement is that all elements of W and H must be non-negative.
From the above definitions it is clear that in LDA only bag-of-words frequency counts can be used, since a vector of reals makes no sense (did we create a word 1.2 times?). On the other hand, NMF accepts any non-negative representation, and in the example tf-idf is used.
As far as choosing the number of iterations: for NMF in scikit-learn I don't know the exact stopping criterion, although I believe it is the relative improvement of the loss function falling below a threshold, so you'll have to experiment. For LDA I suggest manually checking the improvement of the log likelihood on a held-out validation set and stopping when it falls below a threshold. The rest of the parameters depend heavily on the data, so I suggest, as @rpd did, that you do a parameter search. To sum up: LDA can only generate frequencies, while NMF can generate any non-negative matrix.
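A minimal sketch of the practical upshot, using scikit-learn on a toy corpus: raw counts for LDA, tf-idf for NMF (documents and component counts are illustrative only).

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation, NMF

docs = ["the cat sat on the mat", "dogs and cats are pets", "stock markets fell today"]

# LDA: only bag-of-words frequency counts make sense as input
counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# NMF: any non-negative matrix works, tf-idf being the usual choice (V ~ W * H)
tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
nmf = NMF(n_components=2, init="nndsvd", random_state=0).fit(tfidf)
W, H = nmf.transform(tfidf), nmf.components_   # both factors are non-negative
```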
LDA is natively unsupervised; it uses a joint probability method to find topics (the user has to pass the number of topics to the LDA API). If "Doc X word" is the shape of the input data to LDA, it transforms it into two matrices:
Doc X topic
Word X topic
Further, if labels are available, you can feed the "Doc X topic" matrix to a supervised algorithm (a minimal sketch follows).
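A hedged sketch of that pipeline in scikit-learn (toy documents and labels, purely illustrative): LDA produces the Doc X topic matrix, which then feeds a classifier when labels exist.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

docs = ["cats and dogs are pets", "the stock market fell",
        "dogs love long walks", "markets rose today"]
labels = [0, 1, 0, 1]                                            # optional labels

X = CountVectorizer(stop_words="english").fit_transform(docs)    # Doc X word
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)                                 # Doc X topic
word_topic = lda.components_.T                                   # Word X topic

clf = LogisticRegression().fit(doc_topic, labels)                # supervised step, if labels exist
```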
Text classification – Topic modeling can improve classification by grouping similar words together in topics rather than using each word as a feature
Recommender Systems – Using a similarity measure we can build recommender systems. If our system recommends articles to readers, it will recommend articles with a topic structure similar to the articles the user has already read.
Uncovering Themes in Texts – Useful for detecting trends in online publications for example
A Form of Tagging - If document classification is assigning a single category to a text, topic modeling is assigning multiple tags to a text. A human expert can label the resulting topics with human-readable labels and use different heuristics to convert the weighted topics to a set of tags.
Alpha and Beta Hyperparameters – alpha represents document-topic density and beta represents topic-word density. The higher the value of alpha, the more topics documents are composed of; the lower the value of alpha, the fewer topics documents contain. Likewise, a higher beta means topics are composed of a larger number of words from the corpus, while a lower beta means they are composed of few words.
Number of Topics – the number of topics to be extracted from the corpus. Researchers have developed approaches to obtain an optimal number of topics using the Kullback-Leibler divergence score. I will not discuss this in detail, as it is too mathematical. For understanding, one can refer to this[1] original paper on the use of KL divergence.
Number of Topic Terms – the number of terms composing a single topic. It is generally decided according to the requirement: if the problem is about extracting themes or concepts, a higher number is recommended; if it is about extracting features or terms, a low number is recommended.
Number of Iterations / Passes – the maximum number of iterations allowed to the LDA algorithm for convergence (a gensim sketch of these parameters follows this list).
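A hedged sketch of setting these knobs in gensim (`eta` is gensim's name for beta; the tiny corpus exists only to make the snippet self-contained):

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

texts = [["cat", "dog", "pet"], ["stock", "market", "fell"], ["dog", "park", "walk"]]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

lda = LdaModel(
    corpus=corpus,
    id2word=dictionary,
    num_topics=2,      # number of topics
    alpha=0.1,         # lower alpha -> fewer topics per document
    eta=0.01,          # plays the role of beta; lower -> fewer words per topic
    passes=10,         # passes over the corpus
    iterations=400,    # cap on iterations for convergence
)
print(lda.show_topics(num_words=5))   # number of topic terms to inspect
```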
Ways to improve LDA:
Reduce the dimensionality of the document-term matrix
Frequency filter
POS filter (both filters are sketched below)
Batch-wise LDA
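A rough sketch of the frequency and POS filters from the list above, assuming NLTK's tokenizer/tagger data is available and using gensim's `filter_extremes` for the frequency cut:

```python
import nltk
from gensim.corpora import Dictionary

docs = ["the quick brown fox jumps over the lazy dog", "stock markets fell sharply today"]

# POS filter: keep nouns and adjectives only
kept = []
for doc in docs:
    tagged = nltk.pos_tag(nltk.word_tokenize(doc.lower()))
    kept.append([w for w, tag in tagged if tag.startswith(("NN", "JJ"))])

# Frequency filter: drop very rare and very common terms
dictionary = Dictionary(kept)
dictionary.filter_extremes(no_below=1, no_above=0.5, keep_n=50000)
corpus = [dictionary.doc2bow(t) for t in kept]
```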
With hyperparameter optimization, the alpha value for each topic can be different. They usually become smaller than the default setting.
The default value for beta is 0.01. This means that each topic has a weight on the uniform prior equal to the size of the vocabulary divided by 100. This seems to be a good value. With optimization turned on, the value rarely changes by more than a factor of two.
How to interpret topics using pyLDAvis: let's interpret the topic visualization. Notice how topics are shown on the left while words are on the right. Here are the main things you should consider (a minimal pyLDAvis call is sketched after this list):
Larger topics are more frequent in the corpus.
Topics closer together are more similar, topics further apart are less similar.
When you select a topic, you can see the most representative words for the selected topic. This measure can be a combination of how frequent or how discriminant the word is. You can adjust the weight of each property using the slider.
Hovering over a word will adjust the topic sizes according to how representative the word is for the topic.
On the left, there is a plot of the "distance" between all of the topics (labeled as the Intertopic Distance Map)
The relative size of a topic's circle in the plot corresponds to the relative frequency of the topic in the corpus.
An individual topic may be selected for closer scrutiny by clicking on its circle, or entering its number in the "selected topic" box in the upper-left.
On the right, there is a bar chart showing top terms.
When no topic is selected in the plot on the left, the bar chart shows the top-30 most "salient" terms in the corpus. A term's saliency is a measure of both how frequent the term is in the corpus and how "distinctive" it is in distinguishing between different topics.
When a particular topic is selected, the bar chart changes to show the top-30 most "relevant" terms for the selected topic. The relevance metric is controlled by the parameter λ, which can be adjusted with a slider above the bar chart.
Setting the λ parameter close to 1.0 (the default) will rank the terms solely according to their probability within the topic.
Setting λ close to 0.0 will rank the terms solely according to their "distinctiveness" or "exclusivity" within the topic — i.e., terms that occur only in this topic, and do not occur in other topics.
Setting λ to values between 0.0 and 1.0 will result in an intermediate ranking, weighting term probability and exclusivity accordingly.
Rolling the mouse over a term in the bar chart on the right will cause the topic circles to resize in the plot on the left, to show the strength of the relationship between the topics and the selected term.
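A minimal sketch of producing this visualization with pyLDAvis. The import path is `pyLDAvis.gensim` in older releases and `pyLDAvis.gensim_models` in newer ones; `lda`, `corpus`, and `dictionary` are the objects from the gensim sketch above.

```python
import pyLDAvis
import pyLDAvis.gensim_models as gensimvis   # pyLDAvis.gensim in older versions

vis = gensimvis.prepare(lda, corpus, dictionary)   # intertopic distance map + term bar chart
pyLDAvis.save_html(vis, "lda_vis.html")            # the lambda slider controls the relevance ranking
```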
Conclusion: the results of the first experiment show that using the one-any, any-any and one-all coherences directly for optimization leads to meaningful word sets. The second experiment shows that these coherence measures are able to outperform the UCI coherence as well as the UMass coherence on these generated word sets. For evaluating LDA topics, the any-any and one-any coherences perform slightly better than the UCI coherence. The correlation of the UMass coherence with the human ratings is not as high as for the other coherences.
This algorithm takes a group of documents (anything made up of text) and returns a number of topics (each made up of a number of words) most relevant to these documents.
NMF (Non-negative Matrix Factorization) + code
In case LDA groups two topics together, we can influence the algorithm in a way that makes those two topics separable -
, used this to build my own classes - using the gensim Mallet wrapper; it doesn't work with pyLDAvis, so use to fix it
- using tfidf matrix as input!
One of the best explanations about - tf for LDA, tf-idf for NMF, but tf-idf can be used for top-k selection in LDA + visualization,
A generative model that generates documents by sampling a topic for each word and then a word from the sampled topic. The generated document is represented as a bag of words.
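A toy numpy sketch (not any linked article's code) of that generative story: sample a document-topic distribution, then for each word position sample a topic and a word from that topic.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["cat", "dog", "stock", "market", "walk"]
K, alpha, beta = 2, 0.5, 0.1

phi = rng.dirichlet([beta] * len(vocab), size=K)   # topic-word distributions
theta = rng.dirichlet([alpha] * K)                 # document-topic distribution

doc = []
for _ in range(8):                                 # 8 words in the generated document
    z = rng.choice(K, p=theta)                     # sample a topic for this word
    doc.append(rng.choice(vocab, p=phi[z]))        # sample a word from that topic
print(doc)                                         # bag-of-words representation
```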
1. Variance-score the transformation and inverse transformation of the data; test for 1, 2, 3, 4 PCs/LDs/NMs.
Medium on , explains the random probabilistic nature of LDA
Machinelearningplus on - a great read, don't forget to read the article.
Medium on , high level theoretical - not clear
Medium on , some historical references and general high-level how-to-use examples.
on LDA grid search params and about LDA expectations. Must read.
, talks about the sampling from a distribution of distributions in LDA
- has some text about overfitting - undiscussed in many places.
a, not okay and okay, respectively. Due to how we measure the metrics, ie., read the formulas. and
LDA as
Jupyter notebook on - missing code?
for k-means, LDA, SVD, NMF comparison - advice is to keep NMF or another method as a baseline to measure against LDA.
with
Selecting the number of topics in LDA + GitHub code,
- switching from LDA to a variation of it that is guided by the researcher / data
Medium on lda - ,
The best topic-modelling explanation, including insights, a great read, with code - shows how to find similar docs by topic in gensim, and shows how to transform unseen documents and do similarity using sklearn (a rough sketch of both ideas follows):
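Not the article's code, just a hedged sketch of those two ideas with scikit-learn: project an unseen document into topic space, then rank training documents by cosine similarity of their topic vectors.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

train_docs = ["cats and dogs are pets", "the stock market fell", "dogs love walks"]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(train_docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
doc_topics = lda.transform(X)                              # Doc X topic

unseen = ["my dog chased a cat"]
unseen_topics = lda.transform(vec.transform(unseen))       # unseen doc in topic space

sims = cosine_similarity(unseen_topics, doc_topics)[0]     # similarity to training docs
print(sims.argsort()[::-1])                                # most similar docs first
```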
- Sometimes LDA can also be used as a feature-selection technique. Take a text-classification problem where the training data contain category-wise documents. Run LDA on each category's documents separately; then removing the topic terms that are common across categories leaves the best features for each category.
, including algorithm, parameters!! And Parameters of LDA
- by the French guy
- has a very good simple example with probabilities
Code:
Great article:
, and using (using clustering to get the group of sentences in each topic)
LDA, NMF, SVD - using UMass and UCI coherence measures
*** code
Paper: , says LDA vs NMI (NMF?) and using coherence to analyze
LDADE's tuning dramatically reduces topic instability.
GitHub code
(didn't read) NTM - with GitHub code
- The inference algorithms in Mallet and gensim are indeed different: Mallet uses Gibbs sampling, which is more precise than gensim's faster, online variational Bayes. There is a way to get relatively similar performance by increasing the number of passes.
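A hedged sketch of both options: gensim's online variational LDA with extra passes, and (commented out) the Mallet Gibbs-sampling wrapper, which lives in `gensim.models.wrappers` only in gensim < 4.0 and needs a local Mallet install.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaMulticore

texts = [["cat", "dog"], ["stock", "market"], ["dog", "walk"]]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

# Online variational Bayes; raising `passes` narrows the quality gap to Gibbs sampling
lda_vb = LdaMulticore(corpus=corpus, id2word=dictionary, num_topics=2, passes=20)

# Mallet's Gibbs sampler via the wrapper (gensim < 4.0; the Mallet path is an assumption):
# from gensim.models.wrappers import LdaMallet
# lda_gibbs = LdaMallet("/path/to/mallet", corpus=corpus, id2word=dictionary, num_topics=2)
```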
Alpha and beta in Mallet:
The default value for alpha is 5.0 divided by the number of topics. You can think of this as five "pseudo-words" of weight on the uniform distribution over topics. If the document is short, we expect to stay closer to the uniform prior. If the document is long, we would feel more confident moving away from the prior.
****
by spaCy. There are a lot of moving parts in the visualization. Here's a brief summary:
The plot is rendered in two dimensions according to a dimensionality-reduction algorithm. Topics that are generally similar should appear close together on the plot, while dissimilar topics should appear far apart.
A more detailed explanation of the pyLDAvis visualization can be found . Unfortunately, though the data used by gensim and pyLDAvis are the same, they don't use the same ID numbers for topics. If you need to match up topics in gensim's LdaMulticore object and pyLDAvis' visualization, you have to dig through the terms manually.
Presentation:
Paper:
Paper: UMass, UCI, NPMI, C_v, C_p, etc.
Paper:
Paper:
Paper:
Stackexchange:
Paper: - perplexity needs unseen data, coherence doesn't
LDA, LDA-U, BTM, W2V-GMM
Paper:
Paper:
Paper:
Paper:
Paper:
Paper: - Abstract: Topic models extract representative word sets—called topics—from word counts in documents without requiring any semantic annotations. Topics are not guaranteed to be well interpretable, therefore, coherence measures have been proposed to distinguish between good and bad topics. Studies of topic coherence so far are limited to measures that score pairs of individual words. For the first time, we include coherence measures from scientific philosophy that score pairs of more complex word subsets and apply them to topic scoring.
Code: - To conclude, there are many other approaches to evaluate topic models, such as perplexity, but it is a poor indicator of the quality of the topics. Topic visualization is also a good way to assess topic models. Topic coherence is a good way to compare different topic models based on their human interpretability. The u_mass and c_v topic coherences capture the optimal number of topics by giving the interpretability of these topics a number called the coherence score.
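A hedged sketch of computing the u_mass and c_v coherences with gensim's `CoherenceModel`, reusing the `lda`, `corpus`, `dictionary`, and `texts` objects from the gensim sketch further up.

```python
from gensim.models import CoherenceModel

# u_mass works from the bag-of-words corpus; c_v needs the tokenized texts
cm_umass = CoherenceModel(model=lda, corpus=corpus, dictionary=dictionary, coherence="u_mass")
cm_cv = CoherenceModel(model=lda, texts=texts, dictionary=dictionary, coherence="c_v")
print(cm_umass.get_coherence(), cm_cv.get_coherence())
```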
Formulas:
Presentation, Papers:
-
then - "In your data we can see that there is a peak between 0-100 and a peak between 400-500. What I would think in this case is: does ~480 topics make sense for the kind of data I have? If not, you can just do an np.argmax for 0-100 topics and trade off coherence score for simpler understanding. Otherwise just do an np.argmax on the full set."
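A small sketch of the argmax idea from that quote; `coherence_for` is a hypothetical helper that trains a model with `k` topics and returns its coherence score.

```python
import numpy as np

topic_counts = list(range(2, 501, 10))
scores = np.array([coherence_for(k) for k in topic_counts])   # hypothetical helper

best_overall = topic_counts[int(np.argmax(scores))]           # global coherence peak
small_range = [i for i, k in enumerate(topic_counts) if k <= 100]
best_simple = topic_counts[small_range[int(np.argmax(scores[small_range]))]]  # simpler model
```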
- really good
Topic stability metric, a novel method, compared against Jaccard, Spearman, and silhouette:
“if you want to rework your own topic models that, say, jointly correlate an article’s topics with votes or predict topics over users then you might be interested in .”
- I just learned about these papers which are quite similar: and .
(excellent read)
Youtube:
on jupyter
Topic modeling with DistilBERT, c-TF-IDF, UMAP, HDBSCAN, merging similar topics, visualization (a BERTopic-style sketch follows),
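A hedged sketch of that pipeline using the BERTopic library, which chains sentence-transformer embeddings, UMAP, HDBSCAN and c-TF-IDF; exact method names may differ across versions, and a realistically sized corpus is needed for UMAP/HDBSCAN to work.

```python
from bertopic import BERTopic

docs = [...]  # a list of raw document strings (placeholder; needs to be reasonably large)

topic_model = BERTopic()                             # embeddings -> UMAP -> HDBSCAN -> c-TF-IDF
topics, probs = topic_model.fit_transform(docs)

topic_model.reduce_topics(docs, nr_topics="auto")    # merge similar topics
fig = topic_model.visualize_topics()                 # intertopic distance visualization
```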