https://github.com/artkulak/repo2file

Dump selected files from your repo into a single file for easy use with LLMs (Claude, OpenAI, etc.)
Last synced: 5 months ago

Host: GitHub
URL: https://github.com/artkulak/repo2file
Owner: artkulak
Created: 2024-09-08T06:49:13.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2024-09-08T07:27:55.000Z (about 1 year ago)
Last Synced: 2024-09-08T09:39:53.603Z (about 1 year ago)
Language: Python
Size: 5.86 KB
Stars: 3
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

jimsghstars - artkulak/repo2file - Dump selected files from your repo into single file to easily use in LLMs (Claude, Openai, etc..) (Python)

README

# Repository Content Dumper for LLM Prompts

## Overview

This tool dumps the contents of a Git repository into a single file that you can include in a Large Language Model (LLM) prompt or index in a Retrieval-Augmented Generation (RAG) pipeline. With the codebase consolidated into one file, passing project context to an LLM becomes a single copy or upload step.

## Features

- Dumps entire repository content into a single file
- Respects .gitignore patterns to exclude unnecessary files
- Generates a tree-like directory structure for easy navigation
- Includes file contents for all non-excluded files
- Customizable file type filtering
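
The two filtering features can be sketched as follows. This is an illustrative approximation, not the code in `dump.py`: the helper names (`load_patterns`, `is_excluded`, `is_included`) are hypothetical, and the matching is a simplified subset of .gitignore semantics (no `!` negation, no anchoring rules) built on the standard `fnmatch` module.

```python
import fnmatch
from pathlib import PurePosixPath


def load_patterns(text):
    """Return the non-empty, non-comment lines of a .gitignore-style file body."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_excluded(rel_path, patterns):
    """True if the whole path, or any single path component, matches a pattern.

    Simplified on purpose: real .gitignore rules (negation, leading-slash
    anchoring, `**`) are not handled here.
    """
    parts = PurePosixPath(rel_path).parts
    for pattern in patterns:
        pattern = pattern.rstrip("/")  # treat "dir/" like "dir"
        if fnmatch.fnmatch(rel_path, pattern):
            return True
        if any(fnmatch.fnmatch(part, pattern) for part in parts):
            return True
    return False


def is_included(rel_path, patterns, extensions=()):
    """Apply exclusion patterns first, then the optional extension filter."""
    if is_excluded(rel_path, patterns):
        return False
    if not extensions:
        return True  # no filter given: keep every non-excluded file
    suffix = PurePosixPath(rel_path).suffix.lstrip(".")
    # Accept extensions written with or without the leading dot.
    return suffix in {ext.lstrip(".") for ext in extensions}
```

Paths are expected to be relative to the repository root and to use forward slashes.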

## Use Cases

1. **RAG Systems**: Use the dumped content as a knowledge base for retrieval-augmented generation, allowing LLMs to access and reference your codebase accurately.

2. **LLM Prompts**: Include relevant parts of your codebase in prompts to give LLMs more context about your project structure and implementation details.

3. **Code Analysis**: Quickly get an overview of your entire project in a single file, making it easier to analyze or search through your codebase.

4. **Documentation**: Generate comprehensive documentation that includes both the structure and content of your project.
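
For the RAG use case, the dump usually has to be split back into per-file records before indexing. The sketch below assumes the layout shown in the Output Format section (a `File: <path>` line, a rule of dashes, then a `Content of <path>:` line); `split_dump` is a hypothetical helper, not part of this tool, and it can be confused by a file whose own content contains a `File: ...` line followed by a dashed rule.

```python
import re

FILE_HEADER = re.compile(r"^File: (?P<path>.+)$")
RULE = re.compile(r"^-{5,}$")


def split_dump(dump_text):
    """Split a dump into a list of (relative_path, content) records."""
    records = []
    lines = dump_text.splitlines()
    current_path, buffer = None, []
    i = 0
    while i < len(lines):
        match = FILE_HEADER.match(lines[i])
        # A header only counts if the next line is a rule of dashes.
        if match and i + 1 < len(lines) and RULE.match(lines[i + 1]):
            if current_path is not None:
                records.append((current_path, "\n".join(buffer).strip()))
            current_path, buffer = match.group("path"), []
            i += 2  # skip the header line and the rule
            # Skip the "Content of <path>:" line when it is present.
            if i < len(lines) and lines[i] == f"Content of {current_path}:":
                i += 1
            continue
        if current_path is not None:
            buffer.append(lines[i])
        i += 1
    if current_path is not None:
        records.append((current_path, "\n".join(buffer).strip()))
    return records
```

Each record can then be embedded or indexed on its own, with the relative path kept as metadata.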

## Usage

```
python dump.py <repo_root> <output_file> [exclusion_file] [file_extensions...]
```

- `<repo_root>`: The root directory of your repository
- `<output_file>`: The file where the dumped content will be saved
- `[exclusion_file]`: Optional. A file containing exclusion patterns (e.g., .gitignore)
- `[file_extensions...]`: Optional. Specific file extensions to include (e.g., .py .js .tsx)

Example:
```
python dump.py /path/to/your/repo output.txt .gitignore py js tsx
```
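
If you call the script from another Python program, the argument order in the example above can be captured in a small helper. This is a sketch only: `build_dump_command` is a hypothetical name, it assumes `dump.py` is in the current working directory, and how the script tells an exclusion file apart from an extension when the exclusion file is omitted is not stated in this README.

```python
import sys


def build_dump_command(repo_root, output_file, exclusion_file=None, extensions=()):
    """Build the argv list for dump.py, in the positional order shown above."""
    cmd = [sys.executable, "dump.py", str(repo_root), str(output_file)]
    if exclusion_file is not None:
        cmd.append(str(exclusion_file))
    cmd.extend(extensions)
    return cmd


# To actually run it (requires dump.py to be present):
#   import subprocess
#   subprocess.run(build_dump_command("/path/to/your/repo", "output.txt",
#                                     ".gitignore", ["py", "js", "tsx"]),
#                  check=True)
```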

## Output Format

The output file will contain:

1. A tree-like representation of your directory structure
2. The contents of each included file, preceded by its relative path

Example:
```
Directory Structure:
-------------------
/
├── .env.local
├── package.json
├── next.config.js
├── tsconfig.json
├── public/
│   └── images/
│       ├── astro.png
│       └── astro-logo.svg
├── src/
│   ├── app/
│   │   ├── layout.tsx
│   │   ├── page.tsx
│   │   └── tools/
...

File Contents:
--------------
File: .env.local
--------------------------------------------------
Content of .env.local:
API_KEY=your_api_key_here
...

File: package.json
--------------------------------------------------
Content of package.json:
{
  "name": "your-project",
  "version": "1.0.0",
  ...
}

...
```
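
A layout like the one above can be produced with a short recursive renderer. The following is a minimal sketch, not the tool's actual implementation: `render_tree` and `render_file_section` are hypothetical names, the directory is modeled as a nested dict (a dict value is a subdirectory, `None` marks a file), and the 50-character rule is an assumption read off the example.

```python
def render_tree(tree, prefix=""):
    """Render a nested dict as a list of tree-drawing lines."""
    lines = []
    entries = list(tree.items())
    for index, (name, children) in enumerate(entries):
        is_last = index == len(entries) - 1
        connector = "└── " if is_last else "├── "
        label = name + "/" if children is not None else name
        lines.append(prefix + connector + label)
        if children:
            # Keep the vertical bar only while siblings remain below this entry.
            child_prefix = prefix + ("    " if is_last else "│   ")
            lines.extend(render_tree(children, child_prefix))
    return lines


def render_file_section(rel_path, content):
    """Render one file block: header, rule, 'Content of' line, then the body."""
    return "\n".join(
        [f"File: {rel_path}", "-" * 50, f"Content of {rel_path}:", content, ""]
    )
```

For example, `"\n".join(render_tree({"public": {"images": {"astro.png": None}}}))` yields a three-line tree with `public/` at the top.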

## Benefits for LLM Integration

1. **Contextual Understanding**: By providing the entire codebase structure and content, LLMs can better understand the context of your project.

2. **Improved Code Generation**: LLMs can generate more accurate and context-aware code suggestions when they have access to your full project structure.

3. **Enhanced Debugging**: When asking LLMs for help with debugging, providing the full context allows for more precise problem identification and solution suggestions.

4. **Architecture Analysis**: LLMs can provide insights on your project's architecture and suggest improvements when they can see the entire structure.

5. **Documentation Generation**: Use the dumped content to ask LLMs to generate or improve project documentation.

## Best Practices

1. Be mindful of sensitive information. Files such as `.env.local` typically hold API keys, so list them in .gitignore or the exclusion file to keep them out of the dump.
2. For large repositories, consider dumping only relevant sections to stay within LLM token limits.
3. When using the dumped content in LLM prompts, clearly specify which parts of the codebase are relevant to your question or task.
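
To act on the second point, you can estimate the size of a dump before sending it. The sketch below is not part of this tool: both function names are hypothetical, and the figure of four characters per token is only a rough rule of thumb, so use your model's own tokenizer when you need an exact count.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate; chars_per_token is an assumed average, not exact."""
    return math.ceil(len(text) / chars_per_token)


def select_within_budget(files, token_budget):
    """Greedily keep files, in the given priority order, that fit the budget.

    `files` is a list of (path, content) pairs. Returns (kept_paths, tokens_used).
    """
    selected, used = [], 0
    for path, content in files:
        cost = estimate_tokens(content)
        if used + cost > token_budget:
            continue  # skip this file, but keep trying smaller ones
        selected.append(path)
        used += cost
    return selected, used
```

Ordering `files` by relevance to your question keeps the most useful code in the prompt when the budget is tight.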

## Contributing

Contributions to improve this tool are welcome! Please submit issues or pull requests on our GitHub repository.
