Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ableinc/git2txt
Convert all files in git repository to .txt files. Useful for training LLMs on your codebase.
git llm machine-learning python3 training-data txt
Last synced: 17 days ago
- Host: GitHub
- URL: https://github.com/ableinc/git2txt
- Owner: ableinc
- Created: 2023-06-08T00:00:09.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-08T12:37:52.000Z (7 months ago)
- Last Synced: 2024-10-02T20:41:14.577Z (about 1 month ago)
- Topics: git, llm, machine-learning, python3, training-data, txt
- Language: Python
- Homepage:
- Size: 1.95 KB
- Stars: 22
- Watchers: 2
- Forks: 6
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# git2txt
Convert all files in git repository to .txt files. This is useful for training LLMs on your codebase.
## How to Use
1. Create a new `.env` file by copying `example.env`:
```shell
cp example.env .env
```
2. Fill in the necessary fields. The default values are a good starting point.
```bash
GIT_PROJECT_DIRECTORY=/path/to/git/repo
IGNORE_FILES=.env,package-lock.json
IGNORE_DIRS=.git,.vscode,node_modules
SAVE_DIRECTORY=training_data
SKIP_EMPTY_FILES=true
```
3. Install dependencies. Using a virtual environment is recommended.
```shell
python -m pip install -r requirements.txt
```
4. Run the program:
```shell
python main.py
```
5. You'll find your data files in the `training_data/` directory. The location will differ if you changed the path via `SAVE_DIRECTORY` in the `.env` file.

## Notes
- This program requires Python version 3.6 or later. It uses the f-string formatting technique introduced in Python 3.6.
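
The settings listed in step 2 suggest a simple flow: walk the repository, skip ignored files and directories, and write each remaining file out as `.txt`. The following is a minimal sketch of that flow, not the project's actual `main.py`. The function names (`parse_list`, `load_settings`, `convert_repo`), the flattened output file naming, and the choice to skip files that are not valid UTF-8 are all assumptions made for illustration. It also assumes the settings are passed in as a plain mapping rather than read from the `.env` file directly.

```python
import os
from pathlib import Path


def parse_list(value):
    """Split a comma-separated setting such as IGNORE_DIRS into a set of names."""
    return {item.strip() for item in value.split(",") if item.strip()}


def load_settings(env):
    """Build keyword arguments for convert_repo from a mapping of settings.

    The keys mirror the fields shown in example.env; the fallback values
    used here are assumptions.
    """
    return {
        "project_dir": env.get("GIT_PROJECT_DIRECTORY", "."),
        "save_dir": env.get("SAVE_DIRECTORY", "training_data"),
        "ignore_files": parse_list(env.get("IGNORE_FILES", "")),
        "ignore_dirs": parse_list(env.get("IGNORE_DIRS", "")),
        "skip_empty": env.get("SKIP_EMPTY_FILES", "true").strip().lower() == "true",
    }


def convert_repo(project_dir, save_dir, ignore_files=(), ignore_dirs=(), skip_empty=True):
    """Copy each readable text file under project_dir into save_dir as a .txt file.

    Returns the list of paths that were written.
    """
    project = Path(project_dir)
    out = Path(save_dir)
    out.mkdir(parents=True, exist_ok=True)
    out_resolved = out.resolve()
    written = []
    for root, dirs, files in os.walk(project):
        # Prune in place so os.walk never descends into ignored directories
        # or into the output directory itself.
        dirs[:] = sorted(
            d for d in dirs
            if d not in ignore_dirs and (Path(root) / d).resolve() != out_resolved
        )
        for name in sorted(files):
            if name in ignore_files:
                continue
            source = Path(root) / name
            try:
                text = source.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # assumption: skip binary or unreadable files
            if skip_empty and not text.strip():
                continue
            # Hypothetical naming scheme: src/app.py becomes src_app.py.txt
            relative = source.relative_to(project)
            target = out / ("_".join(relative.parts) + ".txt")
            target.write_text(text, encoding="utf-8")
            written.append(target)
    return written
```

With the variables from `.env` exported into the environment, calling `convert_repo(**load_settings(os.environ))` would run the sketch with those settings. Flattening the relative path into the file name keeps all output in one directory while still recording where each file came from.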