https://github.com/namgyu-youn/topicgen
Analyzes a GitHub repository, then generates related topics ('#').
gradio gradio-ui huggingface-transformers zero-shot
Last synced: 7 months ago
- Host: GitHub
- URL: https://github.com/namgyu-youn/topicgen
- Owner: Namgyu-Youn
- License: mit
- Created: 2025-01-01T14:52:38.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-03-02T16:20:45.000Z (7 months ago)
- Last Synced: 2025-03-02T17:27:02.577Z (7 months ago)
- Topics: gradio, gradio-ui, huggingface-transformers, zero-shot
- Language: Python
- Homepage: https://huggingface.co/spaces/Namgyu-Youn/topic-generator
- Size: 557 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
README
# topicgen
This tool aims to predict relevant **GitHub topics** for repositories by analyzing their content. It collects repository data via the `GitHub API`, processes descriptive text and `README` files, and uses a `BERT-based multi-label classifier` to suggest appropriate topics. The system includes complete data collection and model training pipelines, with support for exporting trained models to `ONNX format` for deployment.

## Project Structure
### Project Overview
### Data Collection Pipeline - [Sample Database](https://github.com/Namgyu-Youn/topicgen/blob/main/data/topicgen.db)
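
As a rough illustration of this stage, the sketch below builds a GitHub repository search URL from a minimum star count and a language, then stores one repository and its topics in an in-memory SQLite database. The helper names and the three-table layout are assumptions made for illustration; the project's actual schema is the one in the sample database and may differ. The sample record uses the field names returned by the GitHub REST API (`full_name`, `description`, `stargazers_count`, `topics`).

```python
import sqlite3
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.github.com/search/repositories"


def build_search_url(min_stars: int, language: str, per_page: int = 100) -> str:
    """Build a repository search URL (hypothetical helper, not the project's code)."""
    query = f"stars:>={min_stars} language:{language}"
    params = {"q": query, "sort": "stars", "order": "desc", "per_page": per_page}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def create_schema(conn: sqlite3.Connection) -> None:
    """Create an assumed many-to-many layout between repositories and topics."""
    conn.executescript(
        """
        CREATE TABLE repositories (
            id INTEGER PRIMARY KEY,
            full_name TEXT UNIQUE,
            description TEXT,
            stars INTEGER
        );
        CREATE TABLE topics (
            id INTEGER PRIMARY KEY,
            name TEXT UNIQUE
        );
        CREATE TABLE repository_topics (
            repo_id INTEGER REFERENCES repositories(id),
            topic_id INTEGER REFERENCES topics(id),
            PRIMARY KEY (repo_id, topic_id)
        );
        """
    )


def save_repository(conn: sqlite3.Connection, repo: dict) -> None:
    """Insert one repository and its topics; repeated calls do not duplicate rows."""
    conn.execute(
        "INSERT OR IGNORE INTO repositories (full_name, description, stars) VALUES (?, ?, ?)",
        (repo["full_name"], repo.get("description"), repo.get("stargazers_count", 0)),
    )
    repo_id = conn.execute(
        "SELECT id FROM repositories WHERE full_name = ?", (repo["full_name"],)
    ).fetchone()[0]
    for name in repo.get("topics", []):
        conn.execute("INSERT OR IGNORE INTO topics (name) VALUES (?)", (name,))
        topic_id = conn.execute(
            "SELECT id FROM topics WHERE name = ?", (name,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO repository_topics (repo_id, topic_id) VALUES (?, ?)",
            (repo_id, topic_id),
        )
    conn.commit()


def topics_for(conn: sqlite3.Connection, full_name: str) -> list:
    """Return the stored topic names for a repository, sorted alphabetically."""
    rows = conn.execute(
        """
        SELECT t.name
        FROM topics t
        JOIN repository_topics rt ON rt.topic_id = t.id
        JOIN repositories r ON r.id = rt.repo_id
        WHERE r.full_name = ?
        ORDER BY t.name
        """,
        (full_name,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_schema(conn)
sample = {
    "full_name": "Namgyu-Youn/topicgen",
    "description": "Analyze GitHub repository, then generate related topics('#').",
    "stargazers_count": 0,
    "topics": ["gradio", "gradio-ui", "huggingface-transformers", "zero-shot"],
}
save_repository(conn, sample)
print(build_search_url(1000, "python"))
print(topics_for(conn, "Namgyu-Youn/topicgen"))
```

Keeping topics in their own table, linked through a join table, makes it straightforward to count how often each topic occurs across repositories when building a label vocabulary for training.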
### Model Training Pipeline
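
In a multi-label setup, each repository can carry several topics at once, so targets are multi-hot vectors and every topic gets its own sigmoid probability instead of sharing one softmax. The sketch below shows that encoding, a binary cross-entropy loss on raw logits, and threshold-based decoding in plain Python. The topic vocabulary, the 0.5 threshold, the example logits, and the choice of binary cross-entropy are assumptions for illustration; the README does not state which loss or threshold the project uses.

```python
import math

# Assumed topic vocabulary; a real one would be built from the collected data.
TOPIC_VOCAB = ["gradio", "gradio-ui", "huggingface-transformers", "zero-shot"]


def encode_topics(topics, vocab=TOPIC_VOCAB):
    """Turn a list of topic names into a multi-hot target vector."""
    present = set(topics)
    return [1.0 if name in present else 0.0 for name in vocab]


def sigmoid(x):
    """Map a raw logit to an independent per-topic probability."""
    return 1.0 / (1.0 + math.exp(-x))


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over topics, in the numerically stable form
    max(x, 0) - x * y + log(1 + exp(-|x|))."""
    losses = [
        max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
        for x, y in zip(logits, targets)
    ]
    return sum(losses) / len(losses)


def decode_topics(logits, vocab=TOPIC_VOCAB, threshold=0.5):
    """Keep every topic whose probability reaches the threshold."""
    return [name for name, x in zip(vocab, logits) if sigmoid(x) >= threshold]


targets = encode_topics(["gradio", "zero-shot"])
logits = [2.0, -1.5, -3.0, 0.7]  # made-up classifier outputs, one per topic
print(targets)
print(decode_topics(logits))
print(bce_with_logits(logits, targets))
```

Because each topic is scored independently, the number of suggested topics per repository is not fixed; raising the threshold trades recall for precision.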
## ✨ Features
- **Collects GitHub repository data** (metadata, topics, READMEs) via GitHub API
- **Analyzes repository content** to predict relevant topics using ML models
- Trains a BERT-based **multi-label classifier** for topic prediction
- Stores repository and topic data in SQLite for efficient retrieval
- Exports trained models to ONNX format for production deployment

## 🚩 How to use?
```bash
git clone https://github.com/Namgyu-Youn/topicgen.git
cd topicgen
```

### Option 1: Using Poetry (Highly Recommended)
```bash
curl -sSL https://install.python-poetry.org | python3 - # Optional
poetry install

# Data Collection Pipeline
poetry run python -m topicgen.pipelines.data_collection_pipeline --min-stars 1000 --language python --max-repos 500

# Model Training Pipeline
poetry run python -m topicgen.pipelines.model_training_pipeline --base-model bert-base-uncased --num-epochs 5
```

### Option 2: Using Docker
```bash
# Build the Docker image
docker build -t github-topic-generator .

# Run data collection pipeline
docker run github-topic-generator python -m topicgen.pipelines.data_collection_pipeline

# Run model training pipeline
docker run github-topic-generator python -m topicgen.pipelines.model_training_pipeline
```

### Option 3: Standard Python Setup
```bash
python -m venv env

# On Windows
env\Scripts\activate
# On macOS/Linux
source env/bin/activate

pip install -r requirements.txt
# Data Collection Pipeline
python -m topicgen.pipelines.data_collection_pipeline

# Model Training Pipeline
python -m topicgen.pipelines.model_training_pipeline
```

## 🧐 Introduction to the Gradio UI
1. Enter GitHub URL
2. Select the main category and subcategory that best match your repository
3. Click "Generate Topics" to get your results
4. Enjoy the generated topics ('#')!

## 👥 Contribution guide: [CONTRIBUTING.md](https://github.com/Namgyu-Youn/github-topic-generator/blob/main/CONTRIBUTING.md)
Thanks for your interest. I always enjoy meaningful collaboration.
Have a question or found a bug? Please open an **issue**. You can also use the available labels (🏷️).