https://github.com/zuevmaxim/itmo-ibd
A tool for suggesting topics related to the project on Github based on packages used in the project
- Host: GitHub
- URL: https://github.com/zuevmaxim/itmo-ibd
- Owner: zuevmaxim
- License: mit
- Created: 2022-05-07T06:57:34.000Z
- Default Branch: master
- Last Pushed: 2022-06-27T16:39:27.000Z
- Last Synced: 2025-02-10T05:13:23.728Z
- Topics: big-data, python, python-notebook
- Language: Jupyter Notebook
- Homepage:
- Size: 41.1 MB
- Stars: 0
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Github Topic Suggester
[License: MIT](https://github.com/zuevmaxim/itmo-ibd/blob/master/LICENSE)

A tool for suggesting topics for a GitHub project based on the packages the project uses.
The tool uses the [Lupa](https://github.com/JetBrains-Research/Lupa) analyzer to extract information about the packages used in the project and
currently supports only Python and Kotlin projects.

# Demo
This is a demo of the Github Topic Suggester. The user enters the owner and name of a GitHub repository and clicks "Suggest".
After a few minutes of waiting, they receive the recommended topics for their project. Take a look!
[//]: # (Insert demo video)
You can run the demo yourself using the instructions [here](https://github.com/zuevmaxim/itmo-ibd/tree/master/app).
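The demo takes the repository owner and name as input. A minimal sketch of turning that input into a clone URL for the first pipeline step, assuming GitHub's usual naming rules (owners: letters, digits and single hyphens, not at the start or end, at most 39 characters; repositories: letters, digits, `-`, `_` and `.`). The helper `build_clone_url` and both patterns are hypothetical, not taken from the app's code:

```python
import re

# Assumed GitHub naming rules (illustrative, not the app's actual validation).
# Owner: starts with a letter/digit; a hyphen must be followed by a letter/digit,
# which rules out "--" and a trailing "-"; 39 characters at most.
OWNER_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9]|-(?=[A-Za-z0-9])){0,38}")
# Repository: letters, digits, '-', '_' and '.'.
REPO_RE = re.compile(r"[A-Za-z0-9._-]+")


def build_clone_url(owner: str, name: str) -> str:
    """Hypothetical helper: validate the demo's form input and build a clone URL."""
    owner, name = owner.strip(), name.strip()
    if not OWNER_RE.fullmatch(owner):
        raise ValueError(f"invalid owner: {owner!r}")
    if not REPO_RE.fullmatch(name) or name in {".", ".."}:
        raise ValueError(f"invalid repository name: {name!r}")
    return f"https://github.com/{owner}/{name}.git"


print(build_clone_url("zuevmaxim", "itmo-ibd"))
```

Rejecting malformed input before cloning avoids starting a multi-minute pipeline run that is bound to fail.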
# Topic suggestion pipeline
The pipeline for processing a new project and suggesting topics for it is as follows.
1. Clone the repository from GitHub
2. Apply the [Lupa](https://github.com/JetBrains-Research/Lupa) analyzer to extract package imports from the project
3. Preprocess the extracted data
4. Predict related topics
5. Save the suggested topics to a file

You can find more information about the pipeline [here](https://github.com/zuevmaxim/itmo-ibd/tree/master/pipeline).
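Steps 3 to 5 can be pictured as reducing the extracted imports to a bag of packages, scoring each candidate topic, and writing the selected topics to a file. This is only an illustrative sketch under loud assumptions: the input is a plain list of imported module names (not Lupa's actual output format), and the hand-written weight table `TOPIC_WEIGHTS` (hypothetical) stands in for the trained XGBoost predictor, producing one logistic score per topic:

```python
import json
import math
import tempfile
from collections import Counter
from pathlib import Path

# Hypothetical stand-in for the trained XGBoost predictor: one weight per
# (topic, root package). The real pipeline learns its model from data.
TOPIC_WEIGHTS = {
    "machine-learning": {"sklearn": 2.0, "numpy": 0.5, "pandas": 0.5},
    "deep-learning": {"torch": 2.5, "tensorflow": 2.5, "numpy": 0.3},
    "web": {"flask": 2.0, "django": 2.0, "requests": 0.5},
}
BIAS = -1.5  # assumed intercept shared by all topics


def package_counts(imports: list[str]) -> Counter:
    """Step 3 (assumed): reduce 'a.b.c' imports to root packages and count them."""
    return Counter(imp.split(".")[0].lower() for imp in imports if imp)


def score_topics(counts: Counter) -> dict[str, float]:
    """Step 4 (stand-in): logistic score per topic from which packages are present."""
    scores = {}
    for topic, weights in TOPIC_WEIGHTS.items():
        z = BIAS + sum(w for pkg, w in weights.items() if counts[pkg] > 0)
        scores[topic] = 1.0 / (1.0 + math.exp(-z))
    return scores


def suggest(imports: list[str], threshold: float = 0.5) -> list[str]:
    """Keep topics scoring at or above the threshold, highest score first."""
    scores = score_topics(package_counts(imports))
    picked = [t for t, s in scores.items() if s >= threshold]
    return sorted(picked, key=lambda t: scores[t], reverse=True)


def save_topics(topics: list[str], path: Path) -> None:
    """Step 5: write the suggested topics to a JSON file."""
    path.write_text(json.dumps({"topics": topics}, indent=2))


imports = ["flask", "flask.json", "sklearn.linear_model", "numpy", "os.path"]
topics = suggest(imports)
print(topics)  # ['machine-learning', 'web']
out = Path(tempfile.gettempdir()) / "suggested_topics.json"
save_topics(topics, out)
```

Scoring each topic independently treats suggestion as a multi-label problem, so a project can receive several topics or none at all.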
# Technologies used
* Docker - runs [Lupa](https://github.com/JetBrains-Research/Lupa) and the pipeline
* Spark - processes data, including within the pipeline
* XGBoost - builds the topic predictor
* Flask - powers the demo app
* Celery - runs the Docker container with the pipeline on a separate worker

# Team
* Dmitry Pogrebnoy
* Maria Tigina
* Maxim Zuev
* Ksenia Razheva