https://github.com/sandy-sp/gittxt
Gittxt is a lightweight CLI tool that extracts text from Git repositories and formats it into AI-friendly outputs (.txt, .json, .md). Whether you're using ChatGPT, Grok, Ollama, or any other LLM, Gittxt helps process repositories for insights, training, and documentation.
- Host: GitHub
- URL: https://github.com/sandy-sp/gittxt
- Owner: sandy-sp
- License: mit
- Created: 2025-02-25T01:54:39.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-04T09:20:43.000Z (over 1 year ago)
- Last Synced: 2025-03-04T09:24:01.457Z (over 1 year ago)
- Topics: ai, cli-tool, git, json, llm, markdown, nlp, repository, text, text-extraction
- Language: Python
- Homepage:
- Size: 166 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Gittxt: Get Text of Your Repo for AI, LLMs & Docs!
**Gittxt** is a **lightweight CLI tool** that extracts text from **Git repositories** and formats it into AI-friendly outputs (`.txt`, `.json`, `.md`). Whether you're using ChatGPT, Grok, Ollama, or any other LLM, Gittxt helps you process repositories for insights, training, and documentation.
---
## Why Use Gittxt?
- **Extract Readable Text:** Easily pull text from code, docs, and other repository files.
- **AI-Friendly Outputs:** Generate outputs in TXT, JSON, and Markdown for different use cases.
- **Efficient Processing:** Faster scanning with incremental caching.
- **Flexible Filtering:** Use advanced flags like `--docs-only` and `--auto-filter` to control what's extracted.
- **Multi-Repository Support:** Scan one or more repositories in a single command.
---
## Release v1.3.1
### New Features & Enhancements
- **Interactive Installation:**
Use the new `gittxt install` subcommand to set up your configuration (output directory, logging preferences, etc.) interactively.
- **Multi-Repository Scanning:**
Scan multiple repositories at once, whether they are local or remote.
- **Advanced Filtering Options:**
  - `--docs-only`: Extract only documentation files (e.g., README, `docs/` folder).
  - `--auto-filter`: Automatically skip common unwanted or binary files.
- **Multi-Format Output:**
Specify multiple output formats simultaneously (e.g., `--output-format txt,json,md`).
- **Enhanced Summary Reports:**
Outputs include summary statistics and an estimated token count for further AI processing.
- **Improved Logging & Caching:**
Faster, more accurate scanning with incremental caching and a rotating log file system.
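
The README does not say how the estimated token count is computed. Purely as an illustration, the sketch below applies a common rule of thumb of roughly four characters per token for English text. The function name `estimate_tokens` and the ratio are assumptions made for this example, not Gittxt's implementation, and a real tokenizer will report different numbers.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from a characters-per-token heuristic.

    Assumption: about 4 characters per token. This is a rule of thumb,
    not the method Gittxt uses for its summary report.
    """
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    # Round up so that any non-empty text counts as at least one token.
    return math.ceil(len(text) / chars_per_token)
```

An estimate like this is mainly useful for checking whether an extracted file is likely to fit within a model's context window before pasting it in.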
---
## Installation
### Via PIP
```bash
pip install gittxt==1.3.1
```
### First-Time Setup (Interactive)
After installing, run:
```bash
gittxt install
```
This command will prompt you to configure:
- Your default output directory (automatically set based on your OS, e.g., `~/Gittxt/` on Linux/Mac)
- Logging level and file logging preferences
---
## How to Use Gittxt
### 1. Scanning Repositories
Use the `scan` subcommand to extract text and generate outputs.
#### Scan a Local Repository
```bash
gittxt scan .
```
Extracts all readable text into the default output directories.
#### Scan a Remote GitHub Repository
```bash
gittxt scan https://github.com/sandy-sp/sandy-sp
```
Automatically clones the repository, scans it, and extracts text.
#### Scan Multiple Repositories with Advanced Options
```bash
gittxt scan /path/to/repo1 https://github.com/user/repo2 --output-format txt,json --docs-only --auto-filter --summary
```
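
When scans are driven from a script, it can help to assemble the command from its parts. The following is a minimal sketch that uses only flags documented in this README; the helper `build_scan_command` and its parameter names are hypothetical and not part of Gittxt.

```python
from typing import List, Optional, Sequence


def build_scan_command(
    repos: Sequence[str],
    output_formats: Sequence[str],
    docs_only: bool = False,
    auto_filter: bool = False,
    summary: bool = False,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `gittxt scan` (hypothetical helper)."""
    if not repos:
        raise ValueError("at least one repository path or URL is required")
    # Local paths and remote URLs are passed as positional arguments.
    cmd = ["gittxt", "scan", *repos]
    # Formats are joined into one comma-separated value, e.g. "txt,json".
    cmd += ["--output-format", ",".join(output_formats)]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if docs_only:
        cmd.append("--docs-only")
    if auto_filter:
        cmd.append("--auto-filter")
    if summary:
        cmd.append("--summary")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Using a list rather than a single shell string avoids quoting problems with paths that contain spaces.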
---
## CLI Options
| Option | Description |
|--------------------------|---------------------------------------------------------------------------|
| `--include` | Include only files matching these patterns. |
| `--exclude` | Exclude files matching these patterns. |
| `--size-limit` | Exclude files larger than the specified size (in bytes). |
| `--branch` | Specify a Git branch (for remote repositories). |
| `--output-dir` | Override the default output directory. |
| `--output-format` | Comma-separated list of output formats (e.g., `txt,json,md`). |
| `--max-lines` | Limit the number of lines per file. |
| `--summary` | Display a summary report after scanning. |
| `--debug` | Enable debug mode for detailed logging. |
| `--docs-only` | Only extract documentation files (e.g., README, docs folder). |
| `--auto-filter` | Automatically skip common unwanted or binary files. |
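
The table does not spell out the pattern syntax or how the filters combine. One plausible reading, sketched here with shell-style globs from Python's `fnmatch` module, is that a file must match an include pattern (if any are given), must match no exclude pattern, and must not exceed the size limit. The function `should_extract` and these semantics are assumptions for illustration, not Gittxt's confirmed behavior.

```python
from fnmatch import fnmatch
from typing import Optional, Sequence


def should_extract(
    path: str,
    size_bytes: int,
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
    size_limit: Optional[int] = None,
) -> bool:
    """Decide whether a file passes include/exclude/size filters.

    Assumed semantics (not confirmed by the README):
    - an empty include list means "include everything";
    - exclude patterns take precedence over include patterns;
    - files larger than size_limit (in bytes) are skipped.
    """
    if include and not any(fnmatch(path, pat) for pat in include):
        return False
    if any(fnmatch(path, pat) for pat in exclude):
        return False
    if size_limit is not None and size_bytes > size_limit:
        return False
    return True
```

Note that in `fnmatch` the `*` wildcard also matches path separators, so `*.py` matches `src/main.py` under this sketch.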
---
## Output Formats
- **TXT:** Simple text extraction for AI chat and quick analysis.
- **JSON:** Structured output ideal for LLM training and data preprocessing.
- **Markdown (MD):** Neatly formatted documentation for GitHub or project READMEs.
When specifying multiple formats (e.g., `--output-format txt,json`), Gittxt generates separate files in their respective output directories.
---
## Directory Structure
By default, outputs are stored in your configured output directory, which is organized as follows:
```
/
├── text/     # Plain text outputs (.txt)
├── json/     # JSON outputs (.json)
├── md/       # Markdown outputs (.md)
└── cache/    # Caching for incremental scans
```
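
To collect the generated files after a scan, a script can walk this layout. The sketch below assumes only the subdirectory names and extensions shown in the tree; the helper `list_outputs` is hypothetical, and the names of the individual output files are not documented here, so it matches by extension only.

```python
from pathlib import Path
from typing import Dict, List

# Subdirectory name -> file extension, taken from the documented layout.
FORMAT_DIRS = {"text": ".txt", "json": ".json", "md": ".md"}


def list_outputs(output_dir: str) -> Dict[str, List[Path]]:
    """Return the files found under each format subdirectory."""
    root = Path(output_dir).expanduser()
    found: Dict[str, List[Path]] = {}
    for subdir, ext in FORMAT_DIRS.items():
        folder = root / subdir
        # A format that was not requested may have no directory at all.
        found[subdir] = sorted(folder.glob(f"*{ext}")) if folder.is_dir() else []
    return found
```

For example, `list_outputs("~/Gittxt/")["json"]` would return the JSON files under the default Linux/Mac output directory mentioned in this README, if any exist.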
---
## Configuration
Gittxt uses a configuration file (`gittxt-config.json`) to store user preferences. You can update this configuration via the interactive install command:
```bash
gittxt install
```
Or edit the file manually. Key settings include:
- **Output Directory:** Auto-determined based on your OS (e.g., `~/Gittxt/`).
- **Logging Options:** Logging level and file logging preferences.
- **Filtering Options:** Include/exclude patterns, file size limits, etc.
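
For manual edits, changing the file through a small script keeps the JSON well-formed. This is a minimal sketch: the README documents neither the location of `gittxt-config.json` nor its key names, so the path is passed in by the caller and any key used with it (such as `logging_level` in the usage note) is hypothetical. Check your own file for the real keys.

```python
import json
from pathlib import Path
from typing import Any, Dict


def update_config(config_path: str, **changes: Any) -> Dict[str, Any]:
    """Load a JSON config file, apply changes, and write it back.

    The key names passed as keyword arguments are the caller's
    responsibility; this helper does not know Gittxt's actual schema.
    """
    path = Path(config_path).expanduser()
    config: Dict[str, Any] = json.loads(path.read_text(encoding="utf-8"))
    config.update(changes)
    # Indented output keeps the file readable for later manual edits.
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

A call such as `update_config("path/to/gittxt-config.json", logging_level="DEBUG")` would rewrite a single setting and leave the others untouched.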
---
## Contribute & Develop
1. **Run Tests:**
```bash
pytest tests/
```
2. **Format Code:**
```bash
black src/
```
3. **Submit a PR:**
   - Fork the repo.
   - Create a new branch (e.g., `feature/my-change`).
   - Push your changes.
   - Submit a PR.
For more details, see the [Contributing Guide](CONTRIBUTING.md).
---
## Future Roadmap
Our future plans include enhancements to the user interface and further AI-based features. We're working on a lightweight web-based UI and additional improvements that streamline repository analysis and documentation extraction.
---
## License
Gittxt is licensed under the **MIT License**.
---
## **Made by [Sandeep Paidipati](https://github.com/sandy-sp)**
**Gittxt: Get Text of Your Repo for AI, LLMs & Docs!**
---