https://github.com/stkzlv/git2one
Concatenate Git repositories into single files for AI analysis, documentation, and code review. Supports multiple formats with smart filtering.
https://github.com/stkzlv/git2one
ai cli-tools concatenation git python
Last synced: 26 days ago
JSON representation
Concatenate Git repositories into single files for AI analysis, documentation, and code review. Supports multiple formats with smart filtering.
- Host: GitHub
- URL: https://github.com/stkzlv/git2one
- Owner: stkzlv
- License: mit
- Created: 2025-06-27T21:29:03.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-06-27T22:07:01.000Z (8 months ago)
- Last Synced: 2025-06-27T23:20:03.829Z (8 months ago)
- Topics: ai, cli-tools, concatenation, git, python
- Language: Python
- Homepage:
- Size: 10.7 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# git2one
[](https://opensource.org/licenses/MIT)
[](https://www.python.org/downloads/)
[](https://github.com/stkzlv/git2one/issues)
[](https://github.com/stkzlv/git2one/stargazers)
`git2one` is a Python script that concatenates all text files in a Git repository into a single output file, making it easy to share or analyze the codebase as a single document. It respects `.gitignore` patterns, supports customizable file extensions, and allows including or excluding specific files or directories. This tool is particularly useful for preparing codebases for AI model input, documentation, or analysis.
## Table of Contents
- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [Command-Line Options](#command-line-options)
- [Examples](#examples)
- [Configuration](#configuration)
- [Output Formats](#output-formats)
- [Testing](#testing)
- [Troubleshooting](#troubleshooting)
- [Contributing](#contributing)
- [License](#license)
## Features
### Core Functionality
- **Smart File Concatenation**: Combines text files into a single output with clear file separators
- **Gitignore Respect**: Automatically excludes files and directories specified in `.gitignore`
- **Multiple Output Formats**: Supports text, JSON, XML, and Markdown formats
- **Flexible Filtering**: Include/exclude files using glob patterns
- **Comment Stripping**: Optional removal of Python comments and docstrings
### AI/LLM Integration
- **Token Estimation**: Estimates token count for AI model compatibility (GPT, Claude, Gemini)
- **Optimized for AI**: Perfect for preparing codebases for AI analysis and code review
- **Multiple Formats**: JSON and XML formats for programmatic processing
### Developer-Friendly
- **Cross-Platform**: Works on Windows, macOS, and Linux
- **Configurable**: Customizable file extensions and exclusion patterns
- **Safe Processing**: Automatically skips binary files and handles encoding errors
## Use Cases
- **🤖 AI Code Analysis**: Prepare entire codebases for AI models (ChatGPT, Claude, etc.)
- **📚 Documentation**: Generate comprehensive code documentation
- **🔍 Code Review**: Create single-file snapshots for easier review
- **📊 Analysis**: Feed code into analysis tools that expect single files
- **📤 Sharing**: Share entire projects as single, readable documents
- **🏗️ Architecture Review**: Get a bird's-eye view of project structure
## Quick Start
```bash
# Clone the repository
git clone https://github.com/stkzlv/git2one.git
cd git2one
# Install dependencies
pip install -r requirements.txt
# Run on any Git repository
python git2one.py /path/to/your/repository
```
## Requirements
- Python 3.7 or higher
- Git repository (the tool works with any Git repository)
## Installation
### Method 1: Clone from GitHub (Recommended)
1. Clone the repository:
```bash
git clone https://github.com/stkzlv/git2one.git
cd git2one
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
### Method 2: Manual Installation
1. Download the script files
2. Install dependencies manually:
```bash
pip install gitignore-parser pyyaml
```
3. Ensure `config.yaml` is in the same directory as `git2one.py` (see [Configuration](#configuration)).
## Usage
Run the script with the path to the target Git repository:
```bash
python git2one.py /path/to/repository
```
This concatenates all text files into `repo_combined.txt` (default output file).
### Command-Line Options
- `--output `: Specify output file (default: `repo_combined.txt`).
- `--config `: Path to config file (default: `config.yaml`).
- `--include `: Include only files matching these patterns (e.g., `*.py`, `src/*`).
- `--exclude `: Exclude files matching these patterns (e.g., `poetry.lock`, `tests/`).
- `--ignore-gitignore`: Ignore `.gitignore` file.
- `--strip-comments`: Strip Python-style comments (`#`) and docstrings (`"""..."""`).
- `--format `: Output format (`text`, `json`, `xml`, `markdown`). Auto-detected from file extension if not specified.
### Examples
1. Concatenate all text files in a repository:
```bash
python git2one.py ~/github.com/ProjectA/
```
Output: `repo_combined.txt` with files like `README.md`, `src/*.py`, `config/*.yaml`.
2. Include only Python and Markdown files:
```bash
python git2one.py ~/github.com/ProjectA/ --include "*.py" "*.md"
```
3. Exclude large files and strip comments:
```bash
python git2one.py ~/github.com/ProjectA/ --exclude "poetry.lock" --strip-comments
```
4. Use a custom output file and bypass `.gitignore`:
```bash
python git2one.py ~/github.com/ProjectA/ --output combined_code.txt --ignore-gitignore
```
5. Process specific directories:
```bash
python git2one.py ~/github.com/ProjectA/ --include "src/*" "config/*"
```
6. Generate JSON output for programmatic processing:
```bash
python git2one.py ~/github.com/ProjectA/ --output combined.json --format json
```
7. Create a Markdown report with syntax highlighting:
```bash
python git2one.py ~/github.com/ProjectA/ --output report.md --format markdown
```
8. Export as XML for structured data processing:
```bash
python git2one.py ~/github.com/ProjectA/ --output data.xml --format xml
```
## Configuration
The `config.yaml` file controls which files are considered text and which are excluded by default. Example:
```yaml
text_extensions:
- .py
- .md
- .yaml
- .toml
- .lock
default_exclusions:
- "*.exe"
- "*.png"
- ".git/*"
- ".mypy_cache/*"
default_output: repo_combined.txt
token_multiplier: 1.3
```
- `text_extensions`: File extensions considered text (e.g., `.py`, `.md`).
- `default_exclusions`: Patterns to always exclude (e.g., `.git/*`, `*.jpg`).
- `default_output`: Default output file name.
- `token_multiplier`: Multiplier for estimating AI model token count.
## Output Formats
Git2one supports multiple output formats:
### Text Format (Default)
The default text format contains concatenated files, each prefixed with a header:
```
--- File: path/to/file.py ---
[content of file.py]
--- File: path/to/another.md ---
[content of another.md]
```
### JSON Format
JSON format provides structured data with metadata:
```json
{
"files": [
{
"path": "path/to/file.py",
"content": "[file content]",
"size": 1234,
"lines": 45
}
],
"summary": {
"total_files": 1,
"total_size": 1234
}
}
```
### XML Format
XML format for structured data processing:
```xml
```
### Markdown Format
Markdown format with syntax highlighting for documentation:
```markdown
# Repository Contents
Total files: 2
## path/to/file.py
```python
[file content with syntax highlighting]
```
## path/to/another.md
```markdown
[file content]
```
```
The script logs included/excluded files and estimates the total token count for AI model compatibility.
## Troubleshooting
- **No files included**: Check `config.yaml` for correct `text_extensions`. Verify the repository path:
```bash
ls /path/to/repository/src/*.py
```
- **Files in `debug/` or `outputs/` included**: Ensure `.gitignore` excludes these directories. You can test your theory by explicitly excluding them with --exclude "debug/".
- **High token count**: Exclude large files like `poetry.lock`:
```bash
python git2one.py /path/to/repository --exclude "poetry.lock"
```
- **`.gitignore` not respected**: Verify `.gitignore` exists in the repository root and contains valid patterns. You can test if it's being ignored by running with --ignore-gitignore.
## Testing
Run the test suite to verify functionality:
```bash
python -m pytest test_git2one.py -v
```
Or run tests with unittest:
```bash
python test_git2one.py
```
The test suite covers:
- Configuration loading
- File filtering and pattern matching
- Comment stripping functionality
- Multiple output formats
- Repository processing with various options
## License
MIT License. See `LICENSE` for details.
## Contributing
We welcome contributions to git2one! This section outlines the process for contributing to this project.
### Getting Started
1. **Fork the repository** on GitHub at [https://github.com/stkzlv/git2one](https://github.com/stkzlv/git2one)
2. **Clone your fork** locally:
```bash
git clone https://github.com/your-username/git2one.git
cd git2one
```
3. **Install dependencies**:
```bash
pip install -r requirements.txt
```
4. **Create a branch** for your feature or bug fix:
```bash
git checkout -b feature/your-feature-name
```
### Development Guidelines
#### Code Style
- Follow [PEP 8](https://pep8.org/) Python style guidelines
- Use meaningful variable and function names
- Add docstrings to all functions and classes
- Keep functions focused and single-purpose
- Maximum line length: 100 characters
#### Testing
- **Write tests** for all new functionality
- **Run existing tests** to ensure nothing breaks:
```bash
python test_git2one.py
```
- Aim for high test coverage of new code
- Test edge cases and error conditions
#### Documentation
- Update the README.md if you add new features
- Add inline comments for complex logic
- Update configuration examples if needed
- Include usage examples for new features
### Submitting Changes
1. **Commit your changes** with clear, descriptive messages:
```bash
git add .
git commit -m "Add JSON output format support
- Implement JSON formatter with file metadata
- Add auto-detection of output format from file extension
- Update CLI to support --format option
- Add comprehensive tests for new functionality"
```
2. **Push to your fork**:
```bash
git push origin feature/your-feature-name
```
3. **Create a Pull Request** on [GitHub](https://github.com/stkzlv/git2one) with:
- Clear title describing the change
- Detailed description of what was changed and why
- Reference any related issues
- Screenshots or examples if applicable
### Pull Request Guidelines
- **One feature per PR**: Keep pull requests focused on a single feature or bug fix
- **Update tests**: Ensure all tests pass and add new tests for your changes
- **Update documentation**: Include relevant documentation updates
- **Clean commit history**: Squash commits if necessary to maintain a clean history
- **Respond to feedback**: Address review comments promptly and professionally
### Reporting Issues
When reporting bugs or requesting features:
1. **Search existing issues** first to avoid duplicates
2. **Use the issue templates** if available
3. **Provide detailed information**:
- Python version
- Operating system
- Steps to reproduce the issue
- Expected vs actual behavior
- Sample code or repository (if applicable)
### Code of Conduct
- **Be respectful** and inclusive in all interactions
- **Provide constructive feedback** in code reviews
- **Help newcomers** and answer questions when possible
- **Focus on the code**, not the person
- **Assume good intentions** from other contributors
### Development Setup
For more advanced development:
1. **Set up a virtual environment**:
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
2. **Run tests with coverage**:
```bash
python -m pytest test_git2one.py --cov=git2one --cov-report=html
```
3. **Check code style**:
```bash
flake8 git2one.py test_git2one.py
```
### Feature Requests
We're always interested in new features! Before implementing:
1. **Open an issue** to discuss the feature
2. **Explain the use case** and why it would be valuable
3. **Consider backwards compatibility**
4. **Discuss implementation approach** if complex
### Questions?
- Open an issue with the "question" label on [GitHub](https://github.com/stkzlv/git2one/issues)
- Check existing issues and documentation first
- Be specific about what you're trying to achieve
Thank you for contributing to git2one! 🎉
## Support
If you find this tool useful, please consider:
- ⭐ Starring the repository on [GitHub](https://github.com/stkzlv/git2one)
- 🐛 Reporting bugs or requesting features via [Issues](https://github.com/stkzlv/git2one/issues)
- 🔀 Contributing improvements via [Pull Requests](https://github.com/stkzlv/git2one/pulls)
## Author
Created and maintained by [stkzlv](https://github.com/stkzlv).