Dev 007: Difference between Topic Modeling and Document Clustering

Thursday, December 12, 2013

Difference between Topic Modeling and Document Clustering

Topic modeling is one way of implementing clustering for a document collection. In this article, by the term "clustering" I mean a popular clustering mechanism such as K-means or fuzzy K-means.

So the difference lies in how the two mechanisms are implemented. Even though both of them return a similar type of outcome, the actual data and knowledge embedded in that outcome can be different.

In topic modeling, each document is represented as a distribution over topics, and each topic is, in turn, a probability distribution over words. In document clustering, by contrast, each cluster is composed of a collection of documents (not topics).
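
To make the two kinds of outcome concrete, here is a minimal Python sketch of what each one might look like as plain data. Everything in it is an assumption made for illustration: the document ids, topic names, words and probabilities are invented by hand and are not the output of any real model, and the helper functions (is_distribution, dominant_topic) are hypothetical names of my own.

```python
# Toy illustration only: all ids, words and numbers below are made up,
# not produced by a real topic model or clustering run.

# Topic modeling: each topic is a probability distribution over words...
topics = {
    "topic_0": {"goal": 0.5, "match": 0.3, "vote": 0.1, "party": 0.1},
    "topic_1": {"goal": 0.05, "match": 0.05, "vote": 0.5, "party": 0.4},
}

# ...and each document is a probability distribution over topics.
doc_topics = {
    "doc_a": {"topic_0": 0.9, "topic_1": 0.1},
    "doc_b": {"topic_0": 0.2, "topic_1": 0.8},
    "doc_c": {"topic_0": 0.55, "topic_1": 0.45},
}

# Document clustering (K-means style): each cluster is just a collection
# of documents, with no word-level or topic-level information attached.
clusters = {
    "cluster_0": ["doc_a", "doc_c"],
    "cluster_1": ["doc_b"],
}


def is_distribution(weights, tol=1e-9):
    """Check that the weights are non-negative and sum to 1."""
    total = sum(weights.values())
    return all(w >= 0 for w in weights.values()) and abs(total - 1.0) < tol


def dominant_topic(doc_id):
    """Return the topic with the highest weight for one document."""
    weights = doc_topics[doc_id]
    return max(weights, key=weights.get)


# Collapsing the topic distributions into hard groups gives a
# cluster-like result, but the mixture weights are thrown away.
hard_groups = {}
for doc_id in doc_topics:
    hard_groups.setdefault(dominant_topic(doc_id), []).append(doc_id)

print(hard_groups)
```

In this toy data, doc_c ends up in the same group as doc_a, yet its topic distribution says it is almost evenly split between the two topics. That extra detail is the kind of knowledge a plain list of cluster members does not carry.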

.. to be continued!
