Thursday, December 12, 2013

Difference between Topic Modeling and Document Clustering

Topic modeling is one way of implementing clustering for a document collection. In this article, by the term "clustering" I mean a popular clustering mechanism such as K-means, fuzzy K-means etc.

So, the difference is the way how these both mechanisms have been implemented. Even though both of them  returns similar type of outcome, the actual data/ knowledge embedded in the outcome can be different.

In topic modeling, each document is represented as a distribution of topics. And essentially, topic is a probability distribution over words. As opposed to topic modelling, in document clustering, cluster is composed of collection of documents. (not topics)

.. to be continued!



No comments:

Post a Comment