https://github.com/agora-lab-ai/pytorch-dataset
A PyTorch Code Dataset for Cutting-Edge Fine-tuning
- Host: GitHub
- URL: https://github.com/agora-lab-ai/pytorch-dataset
- Owner: Agora-Lab-AI
- License: mit
- Created: 2023-09-06T18:17:33.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-12-11T03:47:10.000Z (over 1 year ago)
- Last Synced: 2025-04-04T03:11:19.646Z (3 months ago)
- Language: Python
- Size: 202 MB
- Stars: 4
- Watchers: 0
- Forks: 0
- Open Issues: 5
Metadata Files:
- Readme: README.md
- Contributing: docs/contributing.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
[Discord](https://discord.gg/qUtxnK2NMf)
# Pytorch-Dataset
A PyTorch Code Dataset for Cutting-Edge Fine-tuning

## Installation
You can install the package using pip:

```bash
pip install pytorch-dataset
```

# Usage
Downloader that downloads and unzips each repository in an account:

```python
from pytorch import GitHubRepoDownloader

# Example usage:
downloader = GitHubRepoDownloader(username="lucidrains", download_dir="lucidrains_repositories")
downloader.download_repositories()
```

Processor that cleans, formats, and submits the cleaned dataset to Hugging Face:

```python
from pytorch import CodeDatasetBuilder

# Example usage:
code_builder = CodeDatasetBuilder("lucidrains_repositories")
code_builder.save_dataset(
    "lucidrains_python_code_dataset",
    exclude_files=["setup.py"], exclude_dirs=["tests"]
)
code_builder.push_to_hub("lucidrains_python_code_dataset", organization="kye")
```
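
The README does not describe how `exclude_files` and `exclude_dirs` are applied. As a rough illustration of what such filtering could look like, the standalone sketch below walks a download directory and keeps only `.py` files. `collect_python_files` is a hypothetical helper, not part of this package, and matching exclusions on bare file and directory names is an assumption.

```python
import os
from pathlib import Path


def collect_python_files(root, exclude_files=(), exclude_dirs=()):
    """Return sorted paths of .py files under root, skipping excluded names.

    Assumption: exclusions match on the bare file or directory name,
    not on full paths or glob patterns.
    """
    exclude_files = set(exclude_files)
    exclude_dirs = set(exclude_dirs)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for name in filenames:
            if name.endswith(".py") and name not in exclude_files:
                found.append(Path(dirpath) / name)
    return sorted(found)
```

Under these assumptions, `collect_python_files("lucidrains_repositories", exclude_files=["setup.py"], exclude_dirs=["tests"])` would return every Python source file in the downloaded repositories except packaging scripts named `setup.py` and anything inside a `tests` directory.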
# License
MIT