{"id":19162759,"url":"https://github.com/centre-for-humanities-computing/tweetopic","last_synced_at":"2025-10-10T15:16:37.576Z","repository":{"id":58637553,"uuid":"530571734","full_name":"centre-for-humanities-computing/tweetopic","owner":"centre-for-humanities-computing","description":"Blazing fast topic modelling for short texts.","archived":false,"fork":false,"pushed_at":"2025-10-06T17:58:59.000Z","size":2302,"stargazers_count":33,"open_issues_count":9,"forks_count":4,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-10T15:16:37.004Z","etag":null,"topics":["dirichlet-process-mixtures","dmm","gibbs-sampling","gsdmm","machine-learning","mcmc","nlp","python","scikit-learn","topic-modeling","tweet","tweet-analysis","visualization"],"latest_commit_sha":null,"homepage":"https://centre-for-humanities-computing.github.io/tweetopic/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/centre-for-humanities-computing.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"citation.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-08-30T08:47:33.000Z","updated_at":"2025-08-04T11:00:54.000Z","dependencies_parsed_at":"2024-04-11T03:00:15.942Z","dependency_job_id":"553da600-2f90-41e3-b1ee-6ac217cc57a4","html_url":"https://github.com/centre-for-humanities-computing/tweetopic","commit_stats":{"total_commits":93,"total_committers":3,"mean_commits":31.0,"dds":"0.18279569892473113","last_synced_commit":"0d7d0a5a99d361e81bc748405340eceb85740be8"},"previous_names":
[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/centre-for-humanities-computing/tweetopic","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/centre-for-humanities-computing%2Ftweetopic","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/centre-for-humanities-computing%2Ftweetopic/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/centre-for-humanities-computing%2Ftweetopic/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/centre-for-humanities-computing%2Ftweetopic/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/centre-for-humanities-computing","download_url":"https://codeload.github.com/centre-for-humanities-computing/tweetopic/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/centre-for-humanities-computing%2Ftweetopic/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279004577,"owners_count":26083735,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-10T02:00:06.843Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dirichlet-process-mixtures","dmm","gibbs-sampling","gsdmm","machine-learning","mcmc","nlp","python","scikit-learn","topic-modeling","tweet","tweet-analysis","visualization"],"created_at":"20
24-11-09T09:13:04.727Z","updated_at":"2025-10-10T15:16:37.557Z","avatar_url":"https://github.com/centre-for-humanities-computing.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg align=\"left\" width=\"82\" height=\"82\" src=\"docs/_static/icon.svg\"\u003e\n\n# tweetopic\n\n:zap: Blazing Fast topic modelling over short texts in Python\n\u003cbr\u003e\n\n[![PyPI version](https://badge.fury.io/py/tweetopic.svg)](https://pypi.org/project/tweetopic/)\n[![pip downloads](https://img.shields.io/pypi/dm/tweetopic.svg)](https://pypi.org/project/tweetopic/)\n[![python version](https://img.shields.io/badge/Python-%3E=3.7-blue)](https://github.com/centre-for-humanities-computing/tweetopic)\n[![Code style: black](https://img.shields.io/badge/Code%20Style-Black-black)](https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html)\n\u003cbr\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"docs/_static/banner.svg\" height=400 align=\"center\"\u003e\n\u003c/p\u003e\n\n\n## Features\n\n- Fast :zap:\n- Scalable :collision:\n- High consistency and coherence :dart:\n- High quality topics :fire:\n- Easy visualization and inspection :eyes:\n- Full scikit-learn compatibility :nut_and_bolt:\n\n#### New in version 0.4.0 ✨\nYou can now pass `random_state` to topic models to make your results reproducible.\n\n```python\nfrom tweetopic import DMM\n\nmodel = DMM(10, random_state=42)\n```\n\n## 🛠 Installation\n\nInstall from PyPI:\n\n```bash\npip install tweetopic\n```\n\n## 👩‍💻 Usage ([documentation](https://centre-for-humanities-computing.github.io/tweetopic/))\n\nTrain a topic model on a corpus of short texts:\n\n```python\nfrom tweetopic import DMM\nfrom sklearn.feature_extraction.text import CountVectorizer\nfrom sklearn.pipeline import Pipeline\n\n# Creating a vectorizer for extracting document-term matrix from the\n# text corpus.\nvectorizer = CountVectorizer(min_df=15, max_df=0.1)\n\n# Creating a Dirichlet 
Multinomial Mixture Model with 30 components\ndmm = DMM(n_components=30, n_iterations=100, alpha=0.1, beta=0.1)\n\n# Creating topic pipeline\npipeline = Pipeline([\n    (\"vectorizer\", vectorizer),\n    (\"dmm\", dmm),\n])\n```\n\nYou may fit the model with a stream of short texts:\n\n```python\npipeline.fit(texts)\n```\n\nTo investigate the internal structure of topics and their relations to words and individual documents, we recommend using [topicwizard](https://github.com/x-tabdeveloping/topic-wizard).\n\nInstall it from PyPI:\n\n```bash\npip install topic-wizard\n```\n\nThen visualize your topic model:\n\n```python\nimport topicwizard\n\ntopicwizard.visualize(pipeline=pipeline, corpus=texts)\n```\n\n![topicwizard visualization](docs/_static/topicwizard.png)\n\n## 🎓 References\n\n- Yin, J., \u0026 Wang, J. (2014). A Dirichlet Multinomial Mixture Model-Based Approach for Short Text Clustering. _In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 233–242). Association for Computing Machinery._\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcentre-for-humanities-computing%2Ftweetopic","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcentre-for-humanities-computing%2Ftweetopic","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcentre-for-humanities-computing%2Ftweetopic/lists"}