https://github.com/samestrin/llm-file-processor
Automate, standardize, and enrich your files at scale with LLM-powered transformations
- Host: GitHub
- URL: https://github.com/samestrin/llm-file-processor
- Owner: samestrin
- License: MIT
- Created: 2025-05-15T23:05:10.000Z
- Default Branch: main
- Last Pushed: 2025-05-16T20:37:02.000Z
- Last Synced: 2025-10-04T17:53:14.610Z
- Language: JavaScript
- Size: 79.1 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# LLM File Processor
> **Automate, standardize, and enrich your files at scale with LLM-powered transformations**
A flexible Node.js CLI that applies custom LLM prompts to files or entire directories—turn unstructured documentation, code, or data into consistent, structured, and actionable outputs with minimal effort.
## Key Features
* **Rule-Driven Workflows**: Define a single prompt file containing transformation rules, and let the CLI enforce them across every input file.
* **LLM-Agnostic**: Swap models or providers via environment variables; works with any OpenAI-compatible API endpoint.
* **Batch & Parallel Processing**: Process individual files or entire directories in configurable batch sizes, with optional delays for rate-limiting.
* **Dry-Run Mode**: Preview combined prompts without making API calls, perfect for testing and validation.
* **JSON-First Output**: Receive clean, machine-readable JSON responses for seamless integration into pipelines.
* **Prompt Validation**: Built-in LLM-based prompt sanity checks to ensure your rules translate into valid transformations.
## Use Cases
1. **Uniform Documentation**
Standardize a scattered collection of markdown files—add TOCs, enforce heading hierarchies, flag missing sections, and generate summary sections automatically.
2. **Web Content Summarization**
Crawl or aggregate dozens (or hundreds) of web pages, then compress and transform them into structured in-context learning data for your next prompt-engineering or fine-tuning project.
3. **Automated Code Review & Linting**
Feed diffs or code snippets through custom prompts to enforce style guides, detect anti-patterns, and suggest refactors at scale.
4. **Test Case Generation**
Generate unit or integration tests by providing source files and rules for expected behaviors—ideal for accelerating test coverage in legacy codebases.
5. **Changelog & Release Notes**
Scan commit messages or diff logs, then automatically produce human-friendly change summaries and release notes in your preferred format.
6. **Data Extraction & Metadata Tagging**
Transform CSVs, logs, or JSON files by extracting key fields, tagging records, or reformatting data for downstream analytics.
7. **Migration of Legacy Formats**
Batch-convert legacy documentation, configuration files, or proprietary formats into modern standards (e.g., Markdown → Markdown with frontmatter, YAML → JSON).
8. **Localization & Internationalization**
Automate translation or adaptation of text files by applying LLM-based translation prompts, with markers for review or missing strings.
9. **CI/CD Integration**
Incorporate the CLI into Git hooks or CI pipelines to enforce content and code health checks on every commit or pull request.
10. **Training Data Preparation**
Generate clean, structured training examples by defining in-context learning rules—ideal for building your own LLM benchmarks or fine-tuning datasets.
## Installation
You can install the LLM File Processor globally via npm:
```bash
npm install -g llm-file-processor
```
Alternatively, you can use `npx` to run it without installing globally:
```bash
npx llm-file-processor [options]
```
If you prefer to clone the repository and run it locally:
```bash
# Clone repository
git clone https://github.com/samestrin/llm-file-processor.git
cd llm-file-processor
# Install dependencies
npm install
# Make CLI executable (if running directly)
chmod +x llm-file-processor.js
# (Optional) Link globally for local development
npm link
```
## Configuration
Create a `.env` file in the project root:
```dotenv
OPENAI_API_KEY=your_api_key_here
OPENAI_MODEL=your-model-identifier # e.g. gpt-4.1
```
> **Tip:** Use any OpenAI-compatible endpoint by setting the `OPENAI_API_URL` environment variable.
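As a rough illustration of how these variables might be consumed, the sketch below resolves them into a single settings object. It is not taken from the tool's source: the `resolveSettings` helper is hypothetical, and falling back to `https://api.openai.com/v1` when `OPENAI_API_URL` is unset is an assumption about the default.

```javascript
// Hypothetical helper, not part of llm-file-processor's documented API.
// Resolves the documented environment variables into one settings object.
function resolveSettings(env) {
  const apiKey = env.OPENAI_API_KEY;
  const model = env.OPENAI_MODEL;
  if (!apiKey) throw new Error("OPENAI_API_KEY is not set");
  if (!model) throw new Error("OPENAI_MODEL is not set");
  return {
    apiKey,
    model,
    // Assumption: use the public OpenAI base URL when no override is given.
    apiUrl: env.OPENAI_API_URL || "https://api.openai.com/v1",
  };
}

const settings = resolveSettings({
  OPENAI_API_KEY: "your_api_key_here",
  OPENAI_MODEL: "gpt-4.1",
  OPENAI_API_URL: "http://localhost:8080/v1", // placeholder for a compatible endpoint
});
console.log(settings.apiUrl);
```

In a real script you would pass `process.env` instead of an object literal; failing early on a missing key or model keeps configuration mistakes from surfacing halfway through a batch.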
## Usage
```bash
# Process a single file
llm-file-processor --prompt-file path/to/prompt.txt --file path/to/doc.md
# Process an entire directory
llm-file-processor --prompt-file path/to/prompt.txt --directory path/to/project/docs
# Preview prompts without API calls
llm-file-processor -p prompt.txt -f file.md --dry-run
# Generate test files with modified filenames
llm-file-processor -p test-generation.txt -f userAuthentication.js --insert-before-ext ".test"
# Process log files and output as JSON
llm-file-processor -p extract-data.txt -d logs/ --output-ext json
# Process multiple files and merge results into a single output
llm-file-processor -p extract-data.txt -d logs/ -m json
# Process files and merge with custom extension
llm-file-processor -p summarize.txt -d articles/ -m md --output-ext summary.md
# Batch process: 5 files per batch, 1000 ms delay between batches
llm-file-processor -p rules.txt -d src -b 5 --delay 1000
```
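The `--insert-before-ext` and `--output-ext` examples above imply a simple naming rule. The sketch below is one way to express it, under stated assumptions: `deriveOutputName` is a hypothetical helper, not the tool's own code, and it assumes `--output-ext` appends the new extension rather than replacing the old one (as in `file.log` becoming `file.log.json`).

```javascript
// Hypothetical helper, not taken from the tool's source.
// Expects a bare file name (no directory part).
function deriveOutputName(fileName, { insertBeforeExt = "", outputExt = "" } = {}) {
  const dot = fileName.lastIndexOf(".");
  // No dot, or only a leading dot (".env"), means there is no extension to split off.
  const hasExt = dot > 0;
  const base = hasExt ? fileName.slice(0, dot) : fileName;
  const ext = hasExt ? fileName.slice(dot) : "";

  let result = base + insertBeforeExt + ext;
  if (outputExt) {
    // Assumption: the new extension is appended, as in "file.log" -> "file.log.json".
    result += "." + outputExt.replace(/^\./, "");
  }
  return result;
}

console.log(deriveOutputName("userAuthentication.js", { insertBeforeExt: ".test" }));
// prints "userAuthentication.test.js"
console.log(deriveOutputName("file.log", { outputExt: "json" }));
// prints "file.log.json"
```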
### CLI Options
| Option | Description |
| ----------------------------- | ------------------------------------------------------------------------------------- |
| `-p, --prompt-file ` | Path to the prompt file (required) |
| `-f, --file ` | Path to a single file to process |
| `-d, --directory ` | Path to a directory of files to process |
| `-o, --output ` | Specify a custom output directory (default: `./processed-`) |
| `--insert-before-ext ` | Insert text before file extension (e.g., ".test" for "file.test.js" from "file.js") |
| `--output-ext ` | Change or add file extension (e.g., "json" to save as "file.log.json") |
| `-m, --merge ` | Merge all processed files into a single output file "" |
| `--dry-run` | Combine prompts and files without sending to LLM |
| `-b, --batch-size ` | Number of files per batch (default: 1) |
| `--delay ` | Milliseconds to wait between API batches (default: 500) |
| `-h, --help` | Display help information |
| `-v, --version` | Display version information |
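To make the interaction between `--batch-size` and `--delay` concrete, here is a minimal sketch of batched processing with a pause between batches. It is an illustration only: `processInBatches` and `sleep` are hypothetical names, and running the files inside a batch concurrently is an assumption about how the tool behaves. The defaults mirror the table (1 file per batch, 500 ms).

```javascript
// Hypothetical sketch of batch processing with a pause between batches.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function processInBatches(files, worker, { batchSize = 1, delayMs = 500 } = {}) {
  const results = [];
  for (let i = 0; i < files.length; i += batchSize) {
    const batch = files.slice(i, i + batchSize);
    // Assumption: files within a batch run concurrently; batches run sequentially.
    const batchResults = await Promise.all(batch.map((file) => worker(file)));
    results.push(...batchResults);
    // Pause before the next batch, but not after the last one.
    if (i + batchSize < files.length) await sleep(delayMs);
  }
  return results;
}

// Usage: the worker stands in for "send this file to the LLM".
processInBatches(["a.md", "b.md", "c.md"], async (file) => file.toUpperCase(), {
  batchSize: 2,
  delayMs: 10,
}).then((out) => console.log(out));
```

A larger batch size raises throughput but also the chance of hitting provider rate limits; the delay is the simplest lever for backing off.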
## Writing Effective Prompts
Craft transformation rules in your prompt file to guide the LLM. Example:
```
1. Generate a table of contents.
2. Normalize all headings to Markdown `##`, `###`, etc.
3. Flag sections missing a required `Summary` header.
4. Append a `## Key Takeaways` section at the end.
```
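To see how a rule list like this might be paired with an input file, the sketch below builds a chat-style message array in the `role`/`content` shape that OpenAI-compatible chat endpoints accept. How the tool actually combines the prompt and the file is not documented here, so the `buildMessages` helper and the system/user split are assumptions for illustration.

```javascript
// Hypothetical helper: the tool's real prompt layout may differ.
function buildMessages(promptText, fileName, fileContent) {
  return [
    // Assumption: the rules from the prompt file go in the system message.
    { role: "system", content: promptText.trim() },
    // Assumption: the file to transform is sent as the user message.
    { role: "user", content: `File: ${fileName}\n\n${fileContent}` },
  ];
}

const rules = "1. Generate a table of contents.\n2. Append a `## Key Takeaways` section at the end.\n";
const messages = buildMessages(rules, "doc.md", "# Title\n\nSome text.");
console.log(JSON.stringify(messages, null, 2));
```

Printing the combined messages before sending anything is a cheap way to check that your rules read clearly next to real file content.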
## Contribute
Contributions to this project are welcome. Please fork the repository and submit a pull request with your changes or improvements.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.