{"id":22089913,"url":"https://github.com/dhchenx/tm-eval","last_synced_at":"2025-03-23T22:49:49.501Z","repository":{"id":62591170,"uuid":"502704788","full_name":"dhchenx/tm-eval","owner":"dhchenx","description":"A toolkit to quickly evaluate model goodness over number of topics","archived":false,"fork":false,"pushed_at":"2022-06-12T19:09:32.000Z","size":76,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-03T13:39:49.170Z","etag":null,"topics":["topic-modeling-analysis"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dhchenx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-06-12T19:07:44.000Z","updated_at":"2022-06-12T19:09:58.000Z","dependencies_parsed_at":"2022-11-04T08:18:13.534Z","dependency_job_id":null,"html_url":"https://github.com/dhchenx/tm-eval","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dhchenx%2Ftm-eval","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dhchenx%2Ftm-eval/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dhchenx%2Ftm-eval/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dhchenx%2Ftm-eval/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dhchenx","download_url":"https://codeload.github.com/dhchenx/tm-eval/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":2
45181549,"owners_count":20573718,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["topic-modeling-analysis"],"created_at":"2024-12-01T02:14:42.070Z","updated_at":"2025-03-23T22:49:49.482Z","avatar_url":"https://github.com/dhchenx.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Topic Modeling Evaluation\nA toolkit to quickly evaluate topic model goodness across different numbers of topics.\n\n### Metrics\nThe coherence measure to use is one of `u_mass`, `c_v`, `c_uci`, or `c_npmi`.\n\n- `u_mass` is the fastest method; `c_uci` is also known as `c_pmi`.\n\n- For `u_mass`, `corpus` should be provided; if `texts` is provided instead, it will be converted to a corpus using the dictionary.
\n\n- For `c_v`, `c_uci`, and `c_npmi`, `texts` should be provided (`corpus` isn't needed).\n\n### Examples\n\nExample 1: estimate metrics for one topic model with a specific number of topics\n```python\nfrom tm_eval import *\n# input: a pickled dictionary mapping each document key to its term list joined by ','\ninput_file = \"datasets/covid19_symptoms.pickle\"\noutput_folder = \"outputs\"\nmodel_name = \"symptom\"\nnum_topics = 10\n# run the evaluation and print the resulting metrics\nresults = evaluate_all_metrics_from_lda_model(input_file=input_file,\n                                              output_folder=output_folder,\n                                              model_name=model_name,\n                                              num_topics=num_topics)\nprint(results)\n```\nExample 2: track how model goodness changes with the number of topics\n```python\nfrom tm_eval import *\nif __name__ == \"__main__\":\n    # start configuration\n    # input: a pickled dictionary mapping each document id (key) to its term list joined by ',' (value)\n    input_file = \"datasets/covid19_symptoms.pickle\"\n    output_folder = \"outputs\"\n    model_name = \"symptom\"\n    start = 2\n    end = 5\n    # end configuration\n\n    # run and explore the topic counts between start and end\n    list_results = explore_topic_model_metrics(input_file=input_file,\n                                               output_folder=output_folder,\n                                               model_name=model_name,\n                                               start=start,\n                                               end=end)\n    # summarize results and save them as a CSV file\n    show_topic_model_metric_change(list_results, save=True,\n                                   save_path=f\"{output_folder}/metrics.csv\")\n\n    # plot metric changes from the saved CSV file\n    plot_tm_metric_change(csv_path=f\"{output_folder}/metrics.csv\",\n                          save=True, save_folder=output_folder)\n```\n\n### Output 
results\n\n![c_v](https://dhchenx.github.io/projects/tm-eval/c_v.jpg)\n\n![u_mass](https://dhchenx.github.io/projects/tm-eval/u_mass.jpg)\n\n![c_npmi](https://dhchenx.github.io/projects/tm-eval/c_npmi.jpg)\n\n![c_uci](https://dhchenx.github.io/projects/tm-eval/c_uci.jpg)\n\n### License\n\nThe `tm-eval` toolkit is provided by [Donghua Chen](https://github.com/dhchenx) under the MIT License.\n\n### References\n1. [Topic Modeling in Python: Latent Dirichlet Allocation (LDA)](https://towardsdatascience.com/end-to-end-topic-modeling-in-python-latent-dirichlet-allocation-lda-35ce4ed6b3e0)\n2. [Evaluate Topic Models: Latent Dirichlet Allocation (LDA)](https://towardsdatascience.com/evaluate-topic-model-in-python-latent-dirichlet-allocation-lda-7d57484bb5d0)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdhchenx%2Ftm-eval","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdhchenx%2Ftm-eval","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdhchenx%2Ftm-eval/lists"}