https://github.com/src-d/tmsc
- Host: GitHub
- URL: https://github.com/src-d/tmsc
- Owner: src-d
- License: other
- Created: 2017-09-18T10:04:54.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2019-06-03T12:08:18.000Z (over 6 years ago)
- Last Synced: 2025-05-05T05:04:53.941Z (8 months ago)
- Language: Python
- Size: 85.9 KB
- Stars: 22
- Watchers: 10
- Forks: 9
- Open Issues: 6
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
- Code of conduct: CODE_OF_CONDUCT.md
# TMSC
[Travis CI](https://travis-ci.org/src-d/tmsc) | [Codecov](https://codecov.io/gh/src-d/tmsc) | [Docker Hub](https://hub.docker.com/r/srcd/tmsc) | [PyPI](https://pypi.python.org/pypi/tmsc)
TMSC (Topics Modeling on Source Code) is a command-line application that discovers the topics of
a repository supplied by the user. A "topic" is a set of keywords, in this case source code
identifiers, that typically occur together. This project has **nothing** to do with
[GitHub topics](https://github.com/blog/2309-introducing-topics).
```
$ tmsc https://github.com/apache/spark
...
Parallel and distributed processing - General IT 4.43
Machine Learning, sklearn-like APIs - General IT 3.87
Java/JS + async + JSON serialization - General IT 3.58
Java string input/output - Programming languages 3.29
Cryptography: libraries - General IT 3.23
SQL, working with databases - General IT 3.11
Java: Spring, Hibernate - Technologies 3.09
Operations on numbers - General IT 2.98
Distributed clusters - General IT 2.62
Functional programming, Scala - Programming languages 2.60
```
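Each line of the listing above appears to carry a topic label, a category, and a numeric score. As a minimal sketch for post-processing such output, the snippet below parses lines of that shape. The `TopicScore` type and `parse_topic_line` helper are hypothetical names introduced here, not part of `tmsc`, and the assumption that the last ` - ` separates the label from the category is inferred only from the sample shown.

```python
from typing import NamedTuple


class TopicScore(NamedTuple):
    """One parsed output line (hypothetical helper type, not part of tmsc)."""
    label: str
    category: str
    score: float


def parse_topic_line(line: str) -> TopicScore:
    """Parse a line such as 'Distributed clusters - General IT 2.62'."""
    # The score is taken to be the last whitespace-separated token.
    text, raw_score = line.rstrip().rsplit(None, 1)
    # Assumption: the last " - " separates the topic label from its category.
    label, sep, category = text.rpartition(" - ")
    if not sep:
        raise ValueError(f"no ' - ' separator in line: {line!r}")
    return TopicScore(label.strip(), category.strip(), float(raw_score))


sample = """\
Parallel and distributed processing - General IT 4.43
Java/JS + async + JSON serialization - General IT 3.58
Functional programming, Scala - Programming languages 2.60
"""

topics = [parse_topic_line(line) for line in sample.splitlines() if line.strip()]
for topic in topics:
    print(topic.score, topic.category, topic.label, sep="\t")
```

Splitting from the right keeps labels that themselves contain `+`, `:` or `/` intact; lines that do not match this shape (such as the `...` progress line) raise `ValueError` and would need to be skipped by the caller.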
Automatic topic inference can be useful for cataloging repositories or mining concepts from them.
The current model was trained on GitHub repositories cloned in October 2016 after
[de-fuzzy-forking](https://blog.sourced.tech/post/minhashcuda/). The approach is described in a
[paper](https://arxiv.org/abs/1704.00135).
### Installation
```
pip3 install tmsc
```
### Usage
Command line:
```
$ tmsc https://github.com/apache/spark
```
Python API:
```python
import tmsc
engine = tmsc.Topics()
print(engine.query("https://github.com/apache/spark"))
```
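When several repositories need to be processed, one failing URL should probably not abort the whole run. The sketch below wraps the `query` call shown above in a loop that collects results and errors separately. `query_many` and `FakeEngine` are hypothetical names used only for illustration; the only thing assumed about the real engine is the `query(url)` method from the example above, and the value `FakeEngine` returns is made up, since the return type of `query` is not documented here.

```python
from typing import Any, Dict, Iterable, Tuple


def query_many(
    engine: Any, urls: Iterable[str]
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Call engine.query(url) for each URL, keeping failures separate."""
    results: Dict[str, Any] = {}
    errors: Dict[str, Exception] = {}
    for url in urls:
        try:
            results[url] = engine.query(url)
        except Exception as exc:  # keep going; record the failure instead
            errors[url] = exc
    return results, errors


class FakeEngine:
    """Stand-in for tmsc.Topics() so the sketch runs without tmsc installed."""

    def query(self, url: str) -> Any:
        if not url.startswith("https://"):
            raise ValueError("unsupported URL")
        # Placeholder value; the real return type is an assumption left open.
        return [("placeholder topic", 1.0)]


results, errors = query_many(
    FakeEngine(), ["https://github.com/apache/spark", "not-a-url"]
)
print(sorted(results), sorted(errors))
```

With the real package installed, `FakeEngine()` would be replaced by `tmsc.Topics()`, constructed once and reused across calls.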
### Docker image
```
docker build -t srcd/tmsc .
docker run -d --privileged -p 9432:9432 --name bblfshd bblfsh/bblfshd
docker exec -it bblfshd bblfshctl driver install --recommended
docker run -it --rm srcd/tmsc https://github.com/apache/spark
```
To cache the downloaded models between runs, mount a host directory as `/root` in the container:
```
docker run -it --rm -v /path/to/cache/on/host:/root srcd/tmsc https://github.com/apache/spark
```
### Contributions
...are welcome! See [CONTRIBUTING](CONTRIBUTING.md) and [code of conduct](CODE_OF_CONDUCT.md).
### License
[Apache 2.0](LICENSE.md)