Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/artkulak/repo2file
Dump selected files from your repo into a single file to easily use in LLMs (Claude, OpenAI, etc.)
Last synced: 22 days ago
- Host: GitHub
- URL: https://github.com/artkulak/repo2file
- Owner: artkulak
- Created: 2024-09-08T06:49:13.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-09-08T07:27:55.000Z (3 months ago)
- Last Synced: 2024-09-08T09:39:53.603Z (3 months ago)
- Language: Python
- Size: 5.86 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- jimsghstars - artkulak/repo2file - Dump selected files from your repo into single file to easily use in LLMs (Claude, Openai, etc..) (Python)
README
# Repository Content Dumper for LLM Prompts
## Overview
This tool is designed to dump the contents of a Git repository into a single file, making it easier to use in Retrieval-Augmented Generation (RAG) systems or as part of prompts for Large Language Models (LLMs). By consolidating your codebase into one file, you can more easily pass context to an LLM or integrate it into a RAG pipeline.
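The core idea can be sketched in a few dozen lines of Python. This is a hypothetical, simplified illustration, not the actual `dump.py`: the function names (`load_patterns`, `is_excluded`, `collect_files`, `render_tree`, `dump`) are made up here, exclusion uses `fnmatch` globs as a rough stand-in for full `.gitignore` semantics (no `!` negation, no root anchoring, and `*` can match across `/`), and the output layout imitates the example shown under Output Format.

```python
import fnmatch
import os
from pathlib import Path
from typing import Optional

# Hypothetical sketch of the dump pipeline; NOT the real dump.py.


def load_patterns(exclusion_file: Optional[str]) -> list[str]:
    """Read glob patterns from an exclusion file, skipping blanks and comments."""
    if not exclusion_file:
        return []
    lines = Path(exclusion_file).read_text(encoding="utf-8").splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.strip().startswith("#")]


def is_excluded(rel_path: str, patterns: list[str]) -> bool:
    """Simplified .gitignore-style check: a pattern excludes a path if it matches
    the whole relative path or any single component of it."""
    parts = rel_path.split("/")
    for pat in patterns:
        pat = pat.strip("/")  # "build/" and "/build" are both treated as "build"
        if not pat:
            continue
        if fnmatch.fnmatch(rel_path, pat) or any(fnmatch.fnmatch(p, pat) for p in parts):
            return True
    return False


def collect_files(root: Path, patterns: list[str], extensions: set[str]) -> list[str]:
    """Walk the repo and return POSIX-style relative paths of files to include."""
    wanted = {e.lstrip(".") for e in extensions}  # accept both "py" and ".py"
    files = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = Path(dirpath).relative_to(root).as_posix()
        prefix = "" if rel_dir == "." else rel_dir + "/"
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = sorted(
            d for d in dirnames if d != ".git" and not is_excluded(prefix + d, patterns)
        )
        for name in sorted(filenames):
            rel = prefix + name
            if is_excluded(rel, patterns):
                continue
            if wanted and Path(name).suffix.lstrip(".") not in wanted:
                continue
            files.append(rel)
    return files


def render_tree(paths: list[str]) -> list[str]:
    """Render relative file paths as a tree using box-drawing characters."""
    tree: dict = {}
    for p in paths:
        node = tree
        for part in p.split("/"):
            node = node.setdefault(part, {})
    lines = ["/"]

    def walk(node: dict, prefix: str) -> None:
        items = list(node.items())
        for i, (name, child) in enumerate(items):
            last = i == len(items) - 1
            branch = "└── " if last else "├── "
            # Directories have children, so they get a trailing slash.
            lines.append(f"{prefix}{branch}{name}{'/' if child else ''}")
            walk(child, prefix + ("    " if last else "│   "))

    walk(tree, "")
    return lines


def dump(root: Path, out_file: Path, patterns: list[str], extensions: set[str]) -> None:
    """Write the directory tree, then each included file under a header."""
    files = collect_files(root, patterns, extensions)
    with open(out_file, "w", encoding="utf-8") as out:
        out.write("Directory Structure:\n-------------------\n")
        out.write("\n".join(render_tree(files)) + "\n\n")
        out.write("File Contents:\n--------------\n")
        for rel in files:
            # errors="replace" keeps non-UTF-8 files from aborting the dump.
            text = (root / rel).read_text(encoding="utf-8", errors="replace")
            out.write(f"File: {rel}\n{'-' * 50}\nContent of {rel}:\n{text}\n\n")
```

A call such as `dump(Path("/path/to/your/repo"), Path("output.txt"), load_patterns(".gitignore"), {"py", "js", "tsx"})` mirrors the CLI example. Writing the output file outside the repository root avoids picking up a previous dump on the next run.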
## Features
- Dumps the entire repository's content into a single file
- Respects exclusion patterns (e.g., from .gitignore) to skip unnecessary files
- Generates a tree-like directory structure for easy navigation
- Includes file contents for all non-excluded files
- Customizable file type filtering

## Use Cases
1. **RAG Systems**: Use the dumped content as a knowledge base for retrieval-augmented generation, allowing LLMs to access and reference your codebase accurately.
2. **LLM Prompts**: Include relevant parts of your codebase in prompts to give LLMs more context about your project structure and implementation details.
3. **Code Analysis**: Quickly get an overview of your entire project in a single file, making it easier to analyze or search through your codebase.
4. **Documentation**: Generate comprehensive documentation that includes both the structure and content of your project.
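To include only the relevant parts of a dump in a prompt (use case 2), you can split the output back into per-file sections. The sketch below is hypothetical: `split_dump` and `build_prompt` are illustrative names, and the regex assumes the `File: <path>` line, dashed rule, and `Content of <path>:` header layout shown in the Output Format example. A file whose content happens to contain an identical header would be split incorrectly.

```python
import re

# Hypothetical helpers; the header regex assumes the layout from the
# Output Format example: "File: <path>", a dashed rule, "Content of <path>:".
SECTION_RE = re.compile(
    r"^File: (?P<path>.+)\n-{10,}\nContent of (?P=path):\n", re.MULTILINE
)


def split_dump(text: str) -> dict[str, str]:
    """Map each file path in a dump to its content (trailing newlines trimmed)."""
    matches = list(SECTION_RE.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        # A section runs until the next file header or the end of the dump.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group("path")] = text[m.end():end].rstrip("\n")
    return sections


def build_prompt(text: str, paths: list[str], question: str) -> str:
    """Keep only the requested files (unknown paths are skipped) and append a question."""
    sections = split_dump(text)
    parts = [f"File: {p}\n{sections[p]}" for p in paths if p in sections]
    return "\n\n".join(parts) + f"\n\nQuestion: {question}"
```

For example, `build_prompt(open("output.txt", encoding="utf-8").read(), ["src/app/page.tsx"], "Why does this page re-render?")` sends one file instead of the whole dump, which also helps with the token-limit advice under Best Practices.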
## Usage
```
python dump.py [exclusion_file] [file_extensions...]
```

- First positional argument, before `[exclusion_file]`: the root directory of your repository
- Second positional argument: the file where the dumped content will be saved
- `[exclusion_file]`: Optional. A file containing exclusion patterns (e.g., .gitignore)
- `[file_extensions...]`: Optional. Specific file extensions to include (e.g., .py .js .tsx)

Example:
```
python dump.py /path/to/your/repo output.txt .gitignore py js tsx
```

## Output Format
The output file will contain:
1. A tree-like representation of your directory structure
2. The contents of each included file, preceded by its relative path

Example:
```
Directory Structure:
-------------------
/
├── .env.local
├── package.json
├── next.config.js
├── tsconfig.json
├── public/
│   └── images/
│       ├── astro.png
│       └── astro-logo.svg
├── src/
│   ├── app/
│   │   ├── layout.tsx
│   │   ├── page.tsx
│   │   └── tools/
...

File Contents:
--------------
File: .env.local
--------------------------------------------------
Content of .env.local:
API_KEY=your_api_key_here

...

File: package.json
--------------------------------------------------
Content of package.json:
{
  "name": "your-project",
  "version": "1.0.0",
  ...
}

...
```

## Benefits for LLM Integration
1. **Contextual Understanding**: By providing the entire codebase structure and content, LLMs can better understand the context of your project.
2. **Improved Code Generation**: LLMs can generate more accurate and context-aware code suggestions when they have access to your full project structure.
3. **Enhanced Debugging**: When asking LLMs for help with debugging, providing the full context allows for more precise problem identification and solution suggestions.
4. **Architecture Analysis**: LLMs can provide insights on your project's architecture and suggest improvements when they can see the entire structure.
5. **Documentation Generation**: Use the dumped content to ask LLMs to generate or improve project documentation.
## Best Practices
1. Be mindful of sensitive information, such as API keys in .env files. List such files in .gitignore or your exclusion file so they are omitted from the dump.
2. For large repositories, consider dumping only relevant sections to stay within LLM token limits.
3. When using the dumped content in LLM prompts, clearly specify which parts of the codebase are relevant to your question or task.

## Contributing
Contributions to improve this tool are welcome! Please submit issues or pull requests on our GitHub repository.