topic-detection

Detect topics in text using clustering or LDA (Latent Dirichlet Allocation).

Feature Extraction (for clustering)

  • TF-IDF
  • TensorFlow Hub Universal Sentence Encoder ('TFHUB')
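
To make the TF-IDF option concrete, here is a minimal standard-library sketch of how documents can be turned into weighted term vectors before clustering. It is not this repository's code: the function name `tfidf`, the whitespace tokeniser, the smoothed IDF variant, and the sample sentences are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocab, rows); rows[i][j] is the TF-IDF weight of vocab[j] in docs[i]."""
    # Naive lowercase/whitespace tokenisation (an assumption of this sketch).
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF, so a term found in every document keeps a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in row)) or 1.0
        rows.append([w / norm for w in row])  # L2-normalise each document vector
    return vocab, rows

docs = ["cats chase mice", "dogs chase cats", "stocks fell sharply"]
vocab, X = tfidf(docs)
for doc, row in zip(docs, X):
    print(doc, [round(w, 2) for w in row])
```

Terms that occur in fewer documents receive larger weights, so rare words separate documents more strongly than common ones. The sentence-encoder option instead maps each text to a dense embedding; either representation can be passed to the clustering models.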

Models

  • clustering
    • k-means (with the elbow method for choosing k)
    • hierarchical clustering (with cosine distance)
  • LDA (code credit: Anindya Roy)
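
The two clustering options can be sketched with the standard library alone. This is an illustrative sketch, not the repository's implementation: the helper names (`init_centers`, `kmeans`, `elbow`, `agglomerative`), the deterministic farthest-point initialisation, the second-difference elbow heuristic, the average linkage, and the toy data are all assumptions.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centers(points, k):
    # Deterministic farthest-point initialisation (assumption; random init is also common).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def assign(points, centers):
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c])) for p in points]

def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns (labels, inertia)."""
    centers = init_centers(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # Keep the previous centre if a cluster ends up empty.
            new_centers.append(
                [sum(col) / len(members) for col in zip(*members)] if members else centers[c]
            )
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    inertia = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return labels, inertia

def elbow(inertias):
    """inertias[i] belongs to k = i + 1; return the k where the curve bends most."""
    bends = {k: (inertias[k - 2] - inertias[k - 1]) - (inertias[k - 1] - inertias[k])
             for k in range(2, len(inertias))}
    return max(bends, key=bends.get)

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def agglomerative(points, n_clusters, dist=cosine_distance):
    """Average-linkage agglomerative clustering; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(dist(points[a], points[b]) for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters

# Toy data: three well-separated blobs for k-means.
blobs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
inertias = [kmeans(blobs, k)[1] for k in range(1, 6)]
best_k = elbow(inertias)
print("inertias:", [round(v, 2) for v in inertias], "elbow at k =", best_k)

# Toy data: three directions for cosine-distance hierarchical clustering.
vectors = [(1, 0), (2, 0.1), (0, 1), (0.1, 3), (-1, -1), (-2, -2.1)]
groups = agglomerative(vectors, 3)
print("groups:", groups)
```

On this toy data the inertia drops sharply up to k = 3 and then flattens, so the heuristic picks k = 3. Cosine distance compares direction rather than magnitude, which is why vectors of different lengths pointing the same way end up in the same group; it is undefined for all-zero vectors, so empty documents should be filtered out first.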