An open API service indexing awesome lists of open source software.

https://github.com/tingjiainfuture/pixrep

Let LLMs see your codebase just like you do.
https://github.com/tingjiainfuture/pixrep

context-window llm multimodal pdf-generation token-optimization

Last synced: about 2 months ago
JSON representation

Let LLMs see your codebase just like you do.

Awesome Lists containing this project

README

          

# pixrep logo pixrep

# ๐Ÿ“‰ SAVE UP TO 90% TOKENS

### Turn Codebases into **Visual Context** for Multimodal LLMs

[![PyPI version](https://img.shields.io/pypi/v/pixrep?color=blue)](https://pypi.org/project/pixrep/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![GitHub Stars](https://img.shields.io/github/stars/TingjiaInFuture/pixrep?style=social)](https://github.com/TingjiaInFuture/pixrep)

---

## ๐Ÿ“– Introduction

**pixrep** is a developer tool designed to bridge the gap between large code repositories and Multimodal Large Language Models.

Instead of feeding raw text that consumes massive context windows, **pixrep** converts your repository into a **structured, hierarchical set of PDFs**. This allows you to:

* **Save 90% Tokens:** Visual encoding is far more efficient than text tokenization.
* **Test for Free:** Easily share your entire codebase with premium models (like **Claude Opus 4.6**) on platforms like **arena.ai** without hitting text limits.

## ๐Ÿš€ Why Visual Code?

Traditional text tokenization is expensive. Visual encoding compresses structure efficiently.

*Comparison in Google AI Studio (Gemini 3 Pro):*


Raw Files (Text Input)
pixrep OnePDF (Visual Input)


Raw Files Usage
OnePDF Usage


31,812 Tokens โŒ
(Cluttered context)
19,041 Tokens โœ…
(Clean, single file)

## ๐ŸŽ“ Academic Backing

The core philosophy of **pixrep** (rendering code โ†’ PDF with syntax highlighting + heatmaps) has been validated by top-tier papers from 2025โ€“2026:

* **Text or Pixels? It Takes Half** (arXiv:2510.18279): Rendering text as images saves **~50% decoder tokens** while maintaining or improving performance.
* **DeepSeek-OCR** (arXiv:2510.18234): Visual encoding achieves **10โ€“20ร— compression ratios** for dense, structured text.
* **CodeOCR** (arXiv:2602.01785, Feb 2026): A **code-specific** study showing that visual input with syntax highlighting improves performance even at **4ร— compression**. In tasks like clone detection, the visual approach outperforms plain text.

**Verdict:** In the multimodal era, the optimal way to feed code is via **"visual perception" rather than "text reading."**

## โœจ Features

* **๐Ÿ“‰ High Efficiency:** Drastically reduces context window usage for large repos.
* **โšก Faster Scanning:** Single-pass file loading (binary check + line count + optional content decode) to reduce I/O overhead.
* **๐ŸŽจ Syntax Highlighting:** Supports 50+ languages (Python, JS, Rust, Go, C++, etc.) with a "One Dark" inspired theme.
* **๐Ÿง  Semantic Minimap:** Auto-generates per-file micro UML / call graph summaries to expose structure at a glance.
* **๐Ÿ”ฅ Linter Heatmap:** Integrates `ruff` / `eslint` findings and marks risky lines with red/yellow visual overlays.
* **๐Ÿ”Ž Query Mode:** Search by text or semantic symbols, then render only matched snippets to PDF/PNG.
* **๐Ÿ—‚๏ธ Hierarchical Output:** Generates a clean `00_INDEX.pdf` summary and separate files for granular access.
* **๐ŸŒ CJK Support:** Built-in font fallback for Chinese/Japanese/Korean characters (Auto-detects OS fonts).
* **๐Ÿ›ก๏ธ Smart Filtering:** Respects `.gitignore` patterns and supports custom ignore rules.
* **๐Ÿ“Š Insightful Stats:** Calculates line counts and language distribution automatically.
* **๐Ÿงพ Scan Diagnostics:** Prints scan summary (`seen/loaded/ignored/binary/errors`) for faster troubleshooting.

## ๐Ÿ“ฆ Installation

```bash
pip install pixrep
```

For PNG output support (`--format png`), install optional extras:

```bash
pip install "pixrep[png]"
```

## ๐Ÿ› ๏ธ Usage

### Quick Start
Convert the current directory to hierarchial PDFs in `./pixrep_output/`:

```bash
pixrep .
```

**Or pack everything into a single, token-optimized PDF (Recommended for LLMs):**

```bash
pixrep onepdf .
```

### Common Commands

**Generate PDFs for a specific repo:**
```bash
pixrep generate /path/to/my-project -o ./my-project-pdfs
```

**Pack core code into a single minimized PDF (all-in-one):**
```bash
pixrep onepdf /path/to/my-project -o ./ONEPDF_CORE.pdf
```
Notes:
* Defaults to `git ls-files` (tracked files) when available.
* Defaults to "core-only" filtering (skips docs/tests); use `--no-core-only` to include them.

**Preview structure and stats (without generating PDFs):**
```bash
pixrep list /path/to/my-project
```
`list` mode now uses lightweight scanning (no file content decode), so large repos respond significantly faster.

**Show only top 5 languages in the summary:**
```bash
pixrep list . --top-languages 5
```

**Query and render only matching snippets:**
```bash
pixrep query . -q "cache" --glob "*.py" --format png
```

**Semantic query (Python symbols) with interactive terminal preview:**
```bash
pixrep query . -q "CodeInsight" --semantic --tui
```

### CLI Reference

| Argument | Description | Default |
| :--- | :--- | :--- |
| `repo` | Path to the code repository. | `.` (Current Dir) |
| `-o`, `--output` | Directory to save the generated PDFs. | `./pixrep_output/` |
| `--max-size` | Max file size to process (in KB). Files larger than this are skipped. | `512` KB |
| `--ignore` | Additional glob patterns to ignore (e.g., `*.json` `test/*`). | `[]` |
| `--index-only` | Generate only the `00_INDEX.pdf` (Directory tree & stats). | `False` |
| `--disable-semantic-minimap` | Turn off per-file semantic UML/callgraph panel. | `False` |
| `--disable-lint-heatmap` | Turn off linter-based line heatmap background. | `False` |
| `--linter-timeout` | Timeout seconds for each linter command. | `20` |
| `--list-only` | Print the directory tree and stats to console, then exit. | `False` |
| `-V`, `--version` | Show version information. | - |

## โš™๏ธ Performance Notes

`pixrep` now applies two execution paths:

1. **Light scan path** (`pixrep list`, `pixrep generate --index-only`, `--list-only`):
only metadata and line counts are collected; file content is not loaded.
2. **Full scan path** (regular `pixrep generate`):
file content is decoded only when needed for PDF rendering.

This reduces memory pressure and disk I/O for repository exploration workflows.

Lint/semantic caches are now stored in user cache directories by default:

* Windows: `%LOCALAPPDATA%/pixrep/cache/`
* Linux/macOS: `$XDG_CACHE_HOME/pixrep/` or `~/.cache/pixrep/`

You can override with `PIXREP_CACHE_DIR`.

## ๐Ÿ“‚ Output Structure

After running `pixrep .`, you will get a folder structure optimized for LLM upload:

```text
pixrep_output/pixrep/
โ”œโ”€โ”€ 00_INDEX.pdf # <--- Upload this first! Contains tree & stats
โ”œโ”€โ”€ 001_LICENSE.pdf
โ”œโ”€โ”€ 002_README.md.pdf
โ”œโ”€โ”€ 003_pixrep___init__.py.pdf
โ”œโ”€โ”€ 005_pixrep_cli.py.pdf
โ””โ”€โ”€ ...
```

## ๐Ÿงฉ Supported Languages
pixrep automatically detects and highlights syntax for:
* **Core:** Python, C, C++, Java, Rust, Go
* **Web:** HTML, CSS, JavaScript, TypeScript, Vue, Svelte
* **Config:** JSON, YAML, TOML, XML, Dockerfile, Ini
* **Scripting:** Bash, Lua, Perl, Ruby, PHP
* **And more:** Swift, Kotlin, Scala, Haskell, OCaml, etc.

## ๐Ÿค Contributing

We welcome contributions! Please feel free to submit a Pull Request.

1. Fork the repository.
2. Create your feature branch (`git checkout -b feature/AmazingFeature`).
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`).
4. Push to the branch (`git push origin feature/AmazingFeature`).
5. Open a Pull Request.

## ๐Ÿ“„ License

Distributed under the MIT License. See `LICENSE` for more information.