Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/blaisewf/repo2txt
A tool to clone GitHub repositories, document their directory structure, and extract file contents into a text file.
ai dataset dataset-generation llm ml scraper scraping tool
- Host: GitHub
- URL: https://github.com/blaisewf/repo2txt
- Owner: blaisewf
- License: mit
- Created: 2024-07-30T22:54:18.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-08-08T21:01:47.000Z (5 months ago)
- Last Synced: 2024-08-08T23:12:41.998Z (5 months ago)
- Topics: ai, dataset, dataset-generation, llm, ml, scraper, scraping, tool
- Language: Python
- Homepage: https://huggingface.co/spaces/blaise-tk/repo2txt
- Size: 19.5 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# repo2txt
`repo2txt` is a Python package that clones a GitHub repository, generates a text file containing the repository's directory structure and the contents of all its files, and handles cleanup.
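To give a feel for what such a text dump involves, here is a minimal sketch. It is not the package's actual code: the function names `is_ignored` and `dump_directory`, the `--- path ---` separator, and the glob-style matching of ignore patterns against file and directory names are all assumptions made for illustration. Cloning and cleanup are left out; the sketch only walks a folder that already exists on disk.

```python
import fnmatch
import os
import tempfile


def is_ignored(name, patterns):
    """Return True if a file or directory name matches any glob pattern."""
    return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)


def dump_directory(root, ignore=()):
    """Return a text dump: an indented directory tree, then each file's contents.

    Hypothetical helper; the output layout is an assumption, not repo2txt's format.
    """
    tree_lines = []
    content_lines = []
    for current, dirs, files in os.walk(root):
        # Prune ignored directories in place so os.walk does not descend into them.
        dirs[:] = sorted(d for d in dirs if not is_ignored(d, ignore))
        rel_dir = os.path.relpath(current, root)
        depth = 0 if rel_dir == "." else rel_dir.count(os.sep) + 1
        if rel_dir != ".":
            tree_lines.append("  " * (depth - 1) + os.path.basename(current) + "/")
        for name in sorted(files):
            if is_ignored(name, ignore):
                continue
            tree_lines.append("  " * depth + name)
            path = os.path.join(current, name)
            rel_path = os.path.relpath(path, root)
            # Undecodable bytes are replaced so binary files do not raise.
            with open(path, encoding="utf-8", errors="replace") as handle:
                content_lines.append(f"--- {rel_path} ---")
                content_lines.append(handle.read())
    return "\n".join(tree_lines + [""] + content_lines)


if __name__ == "__main__":
    # Build a small throwaway project and print its dump.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "src"))
        os.makedirs(os.path.join(root, "node_modules"))
        with open(os.path.join(root, "src", "main.py"), "w", encoding="utf-8") as f:
            f.write("print('hi')\n")
        with open(os.path.join(root, "notes.md"), "w", encoding="utf-8") as f:
            f.write("# notes\n")
        print(dump_directory(root, ignore=["*.md", "node_modules"]))
```

Pruning `dirs` in place is what keeps large ignored folders from being traversed at all, which is usually cheaper than filtering paths after the walk.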
## Installation
You can install `repo2txt` using pip:
```sh
pip install git+https://github.com/blaisewf/repo2txt.git
```

Alternatively, you can clone the repository and install it locally:
```sh
git clone https://github.com/blaisewf/repo2txt.git
cd repo2txt
pip install .
```

## Usage
Once installed, you can use the CLI command `repo2txt` to process a GitHub repository. Here’s the basic syntax:
```sh
repo2txt --repo-url <repo-url> --output-file <output-file> --branch <branch> --config <config> --local-path <local-path>
```

### Example
From GitHub:
```sh
repo2txt --repo-url https://github.com/example/repository.git --output-file output.txt --branch develop --config repo2txt/configs/config.json
```

Using a local folder:
```sh
repo2txt --local-path downloads/my-project --output-file output.txt --config repo2txt/configs/config.json
```

The GitHub example above will:
1. Clone the repository from `https://github.com/example/repository.git`.
2. Generate a text file `output.txt` containing the directory structure and contents of all files in the repository.
3. Clean up the cloned repository directory.

### Configuration
The config file specifies which files and directories to ignore when generating the text file. It should be a JSON file with the following structure:
```json
{
"ignore": ["*.md", "*.log", "node_modules", ".git"]
}
```

## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
## References
- https://github.com/kirill-markin/repo-to-text