{"id":25046530,"url":"https://github.com/praatibhsurana/unsupervised-topic-modeling","last_synced_at":"2025-08-08T01:16:51.532Z","repository":{"id":217198942,"uuid":"389569670","full_name":"praatibhsurana/Unsupervised-Topic-Modeling","owner":"praatibhsurana","description":"A project on unsupervised topic modeling of garage reviews. ","archived":false,"fork":false,"pushed_at":"2021-07-26T19:54:38.000Z","size":1099,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-06T06:52:09.522Z","etag":null,"topics":["gensim","latent-dirichlet-allocation","nlp","nlp-classification-model","pyldavis","topic-modeling","unsupervised-clustering"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/praatibhsurana.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-07-26T08:54:50.000Z","updated_at":"2022-05-10T10:38:01.000Z","dependencies_parsed_at":"2024-01-15T01:38:41.713Z","dependency_job_id":null,"html_url":"https://github.com/praatibhsurana/Unsupervised-Topic-Modeling","commit_stats":null,"previous_names":["praatibhsurana/unsupervised-topic-modeling"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/praatibhsurana%2FUnsupervised-Topic-Modeling","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/praatibhsurana%2FUnsupervised-Topic-Modeling/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/praatibhsurana%2FUnsupervised-Top
ic-Modeling/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/praatibhsurana%2FUnsupervised-Topic-Modeling/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/praatibhsurana","download_url":"https://codeload.github.com/praatibhsurana/Unsupervised-Topic-Modeling/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246403897,"owners_count":20771526,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gensim","latent-dirichlet-allocation","nlp","nlp-classification-model","pyldavis","topic-modeling","unsupervised-clustering"],"created_at":"2025-02-06T06:51:53.685Z","updated_at":"2025-03-31T01:48:00.564Z","avatar_url":"https://github.com/praatibhsurana.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Unsupervised-Topic-Modeling\n\n### Data Analysis\nDataset: [Click here to view the dataset](https://github.com/praatibhsurana/Unsupervised-Topic-Modeling/blob/main/data.csv) \\\nThe dataset consists of roughly 10,000 unlabeled garage reviews. See the [Data Analysis notebook](https://github.com/praatibhsurana/Unsupervised-Topic-Modeling/blob/main/Data%20Analysis_1.ipynb) for a basic analysis of the data.\n\n### Approach\nThe dataset is unlabeled, and the task is to classify the reviews into topics. The first step is to group the reviews into topic clusters. 
This is an unsupervised topic modeling problem, so we use the Latent Dirichlet Allocation (LDA) algorithm to generate topic clusters for the data. \\\nLDA is implemented in the [LDA Model notebook](https://github.com/praatibhsurana/Unsupervised-Topic-Modeling/blob/main/LDA_Model_2.ipynb).\n\n### Predictions \nPredictions were made by mapping the topics listed in the evaluation_labels.MD file to the LDA-generated topics. The [Prediction notebook](https://github.com/praatibhsurana/Unsupervised-Topic-Modeling/blob/main/Predictions_3.ipynb) contains a brief explanation of the prediction step and the ideas behind it.\n\n### Setup\n**Clone the repo and navigate to it**\n```bash\ngit clone https://github.com/praatibhsurana/Unsupervised-Topic-Modeling.git\ncd Unsupervised-Topic-Modeling\n```\n\n### Installing requirements\n**Install all requirements using pip and open up Jupyter notebook**\n```bash\npip install -r requirements.txt\njupyter notebook\n```\n\n### References\n1) https://towardsdatascience.com/evaluate-topic-model-in-python-latent-dirichlet-allocation-lda-7d57484bb5d0 \n2) https://towardsdatascience.com/end-to-end-topic-modeling-in-python-latent-dirichlet-allocation-lda-35ce4ed6b3e0 \n3) https://www.tutorialspoint.com/gensim/gensim_creating_lda_topic_model.htm#:~:text=Role%20of%20LDA\u0026text=Every%20topic%20is%20modeled%20as,from%20a%20mixture%20of%20topics. 
\n4) https://radimrehurek.com/gensim/models/ldamodel.html \n5) https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation \n6) https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/ \n7) https://towardsdatascience.com/lda-topic-modeling-an-explanation-e184c90aadcd \n8) https://towardsdatascience.com/latent-dirichlet-allocation-lda-9d1cd064ffa2 \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpraatibhsurana%2Funsupervised-topic-modeling","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpraatibhsurana%2Funsupervised-topic-modeling","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpraatibhsurana%2Funsupervised-topic-modeling/lists"}