{"id":27881771,"url":"https://github.com/src-d/tmsc","last_synced_at":"2025-05-05T05:04:59.210Z","repository":{"id":57475984,"uuid":"103923198","full_name":"src-d/tmsc","owner":"src-d","description":null,"archived":false,"fork":false,"pushed_at":"2019-06-03T12:08:18.000Z","size":88,"stargazers_count":22,"open_issues_count":6,"forks_count":9,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-05-05T05:04:53.941Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/src-d.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-09-18T10:04:54.000Z","updated_at":"2024-12-17T17:58:36.000Z","dependencies_parsed_at":"2022-09-07T14:10:12.090Z","dependency_job_id":null,"html_url":"https://github.com/src-d/tmsc","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/src-d%2Ftmsc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/src-d%2Ftmsc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/src-d%2Ftmsc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/src-d%2Ftmsc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/src-d","download_url":"https://codeload.github.com/src-d/tmsc/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252442484,"owners_count":21748451,"icon_url":"https://github.com/github.png","version":n
ull,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-05-05T05:04:58.532Z","updated_at":"2025-05-05T05:04:59.204Z","avatar_url":"https://github.com/src-d.png","language":"Python","readme":"# TMSC [![Build Status](https://travis-ci.org/src-d/tmsc.svg)](https://travis-ci.org/src-d/tmsc) [![codecov](https://codecov.io/github/src-d/tmsc/coverage.svg?branch=develop)](https://codecov.io/gh/src-d/tmsc) [![Docker Build Status](https://img.shields.io/docker/build/srcd/tmsc.svg)](https://hub.docker.com/r/srcd/tmsc) [![PyPI](https://img.shields.io/pypi/v/tmsc.svg)](https://pypi.python.org/pypi/tmsc)\n\nTMSC (Topics Modeling on Source Code) is a command line application to discover the topics of\na repository the user provides. A \"topic\" is a set of keywords, in this case source code\nidentifiers, which typically occur together. 
This project has **nothing** to do with\n[GitHub topics](https://github.com/blog/2309-introducing-topics).\n\n```\n$ tmsc https://github.com/apache/spark\n...\n                Parallel and distributed processing - General IT\t4.43\n                Machine Learning, sklearn-like APIs - General IT\t3.87\n               Java/JS + async + JSON serialization - General IT\t3.58\n                Java string input/output - Programming languages\t3.29\n                            Cryptography: libraries - General IT\t3.23\n                        SQL, working with databases - General IT\t3.11\n                          Java: Spring, Hibernate - Technologies\t3.09\n                              Operations on numbers - General IT\t2.98\n                               Distributed clusters - General IT\t2.62\n           Functional programming, Scala - Programming languages\t2.60\n```\n\nAutomatic topic inference can be useful for cataloging repositories or mining concepts from them.\nThe current model was trained on GitHub repositories cloned in October 2016 after\n[de-fuzzy-forking](https://blog.sourced.tech/post/minhashcuda/). There is a\n[paper](https://arxiv.org/abs/1704.00135) on it.\n\n### Installation\n\n```\npip3 install tmsc\n```\n\n### Usage\n\nCommand line:\n\n```\n$ tmsc https://github.com/apache/spark\n```\n\nPython API:\n\n```python\nimport tmsc\n\nengine = tmsc.Topics()\nprint(engine.query(\"https://github.com/apache/spark\"))\n```\n\n### Docker image\n\n```\ndocker build -t srcd/tmsc .\ndocker run -d --privileged -p 9432:9432 --name bblfshd bblfsh/bblfshd\ndocker exec -it bblfshd bblfshctl driver install --recommended\ndocker run -it --rm srcd/tmsc https://github.com/apache/spark\n```\n\nIn order to cache the downloaded models:\n\n```\ndocker run -it --rm -v /path/to/cache/on/host:/root srcd/tmsc https://github.com/apache/spark\n```\n\n### Contributions\n\n...are welcome! 
See [CONTRIBUTING](CONTRIBUTING.md) and [code of conduct](CODE_OF_CONDUCT.md).\n\n### License\n\n[Apache 2.0](LICENSE.md)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsrc-d%2Ftmsc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsrc-d%2Ftmsc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsrc-d%2Ftmsc/lists"}