https://github.com/avm-sistemas/java-llm-extractor
A pragmatic utility designed to crawl through a JSF/Java legacy application and consolidate its menu architecture into a structured, LLM-ready Markdown document.
- Host: GitHub
- URL: https://github.com/avm-sistemas/java-llm-extractor
- Owner: avm-sistemas
- Created: 2026-05-15T01:54:15.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-05-15T02:38:11.000Z (about 2 months ago)
- Last Synced: 2026-05-15T04:12:20.345Z (about 2 months ago)
- Topics: ai-tools, extractor, java, jsf, llm, llm-ready, strategy
- Language: TypeScript
- Size: 30.3 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
# Java LLM Extractor
[Build status](https://github.com/avm-sistemas/java-llm-extractor/actions/workflows/build.yml)
A pragmatic utility designed to crawl through a JSF/Java legacy application and consolidate its menu architecture into a structured, LLM-ready Markdown document.
This tool was built to solve the "context window" problem for AI agents (Copilot, Cursor, Claude, etc.) working on large-scale legacy systems, especially those built with **JSF + PrimeFaces + Java EE**. Instead of feeding thousands of raw source files into an LLM, you feed it the relevant slice of the generated Markdown map.
## Core Principles
* **Performance over contraptions:** Scans thousands of Java files in seconds, resolving labels, URLs and business methods in a single pass.
* **Operational efficiency:** Single-binary execution. No Node.js runtime required on the target machine.
* **KISS:** No complex UI. Just a CLI that does one thing: turns a JSF sidebar + Java source tree into a clean, semantic Markdown map.
## What it extracts
For each item in the application menu (`sidebar.xhtml`), the tool produces:
* **Menu hierarchy** — module > submodule > function (up to 3 levels)
* **Screen URL** — the JSF page path
* **Permission key** — the access control identifier (`trinityUtils.userHasPermission`)
* **Java classes** — the backing beans and controllers mapped to that screen
* **Business operations** — public non-getter/setter methods found in each class
* **Permission index** — a full table of all permissions cross-referenced to their function and module
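The **Business operations** filter can be pictured roughly as below. This is a minimal sketch, not the tool's actual code: it assumes a regex scan over the Java source text is good enough (no real Java parser), treats `getX`/`setX`/`isX` as accessors, and the names `PUBLIC_METHOD`, `ACCESSOR` and `extractBusinessOperations` are hypothetical.

```typescript
// Hypothetical sketch: collect "business operations" from one Java file.
// Regex-based (no Java parser), so methods inside comments or string
// literals may be picked up; constructors are skipped because they have
// no return type between "public" and the name.
const PUBLIC_METHOD =
  /\bpublic\s+(?:static\s+|final\s+|synchronized\s+)*[\w<>\[\],.?\s]+?\s+(\w+)\s*\(/g;

// Accessor naming convention assumed here: getX, setX, isX.
const ACCESSOR = /^(?:get|set|is)[A-Z]/;

function extractBusinessOperations(javaSource: string): string[] {
  // A Set keeps first-seen order and collapses overloaded methods.
  const ops = new Set<string>();
  for (const match of javaSource.matchAll(PUBLIC_METHOD)) {
    const name = match[1];
    if (!ACCESSOR.test(name)) ops.add(name);
  }
  return [...ops];
}
```

A regex pass like this keeps the single-pass, no-dependency spirit of the tool, at the cost of occasional false positives that a full parser would avoid.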
## Usage
### Development (requires Node.js 18+)
```bash
npm install
npx tsx index.ts "<project-root>" "<output-file>" [--utf8]
```
**Arguments:**
| Argument | Description | Default |
|---|---|---|
| `<project-root>` | Root path of the Java project | current directory |
| `<output-file>` | Output Markdown file name | `menu-architecture.md` |
| `--utf8` | Write output in UTF-8 instead of ISO-8859-1 | off (output in ISO-8859-1 / Latin-1) |
**Example:**
```bash
# Latin-1 output (standard)
npx tsx index.ts "C:\Projetos\Company\Company-1.0-master" "menu-architecture.md"
# UTF-8 output (for LLM APIs or web tools)
npx tsx index.ts "C:\Projetos\Company\Company-1.0-master" "menu-architecture.md" --utf8
```
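The argument handling described in the table can be sketched as follows. This is an illustration under assumptions, not the real `index.ts`: `CliOptions` and `parseCliArgs` are hypothetical names, unknown flags are simply ignored, and an empty positional argument (`""`) falls back to its default in this sketch.

```typescript
// Hypothetical sketch of the CLI contract: two optional positionals
// (project root, output file) plus an optional --utf8 flag.
interface CliOptions {
  projectRoot: string;
  outputFile: string;
  encoding: "latin1" | "utf8"; // Node's name for ISO-8859-1 is "latin1"
}

function parseCliArgs(argv: string[], cwd: string): CliOptions {
  const flags = argv.filter((a) => a.startsWith("--"));
  const positionals = argv.filter((a) => !a.startsWith("--"));
  return {
    // `||` also treats an empty string as "not given".
    projectRoot: positionals[0] || cwd,
    outputFile: positionals[1] || "menu-architecture.md",
    encoding: flags.includes("--utf8") ? "utf8" : "latin1",
  };
}
```

In a Node entry point this would typically be called as `parseCliArgs(process.argv.slice(2), process.cwd())`.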
### Binary (no Node.js required)
Download the binary for your OS from the **GitHub Actions** tab (under Artifacts) or build it locally:
```bash
npm run build
```
Then run:
```bash
# Windows
.\bin\java-llm-extractor-win.exe "C:\Projetos\Company\Company-1.0-master" menu-architecture.md
# Linux
./bin/java-llm-extractor-linux "/path/to/Company-1.0-master" menu-architecture.md
```
## Expected project structure
The tool expects a standard Maven multi-module layout:
```
<project-root>/
  web/src/main/
    webapp/layout/sidebar.xhtml       ← menu tree
    properties/messages.properties    ← PT-BR labels
    java/...                          ← backing beans
  core/src/main/java/...              ← core services
  common/src/main/java/...            ← shared utilities
```
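Menu labels come from `messages.properties`. A simplified sketch of loading that file into a lookup map might look like this; `parseProperties` and `resolveLabel` are hypothetical names, the fallback to the raw key is an assumed policy, and the parser deliberately skips parts of the Java `.properties` format (line continuations, whitespace-only separators, escaped separators in keys).

```typescript
// Hypothetical sketch: turn the text of messages.properties into a label map.
// Handles '#'/'!' comments, '=' or ':' separators and \uXXXX escapes only.
function parseProperties(text: string): Map<string, string> {
  const labels = new Map<string, string>();
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#") || line.startsWith("!")) continue;
    const sep = line.search(/[=:]/);
    if (sep < 0) continue; // no '=' or ':' separator: skipped in this sketch
    const key = line.slice(0, sep).trim();
    const value = line
      .slice(sep + 1)
      .trim()
      // Decode Java-style unicode escapes such as \u00e7.
      .replace(/\\u([0-9a-fA-F]{4})/g, (_m: string, hex: string) =>
        String.fromCharCode(parseInt(hex, 16)),
      );
    labels.set(key, value);
  }
  return labels;
}

// Falls back to the raw key when no label exists (assumed policy).
function resolveLabel(labels: Map<string, string>, key: string): string {
  return labels.get(key) ?? key;
}
```

The keys used with it would be whatever the sidebar references; any key names in a test or example here are illustrative only.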
## Output sample
```markdown
##### Pesagem de Entrada
- **URL:** `/com/arcadian/product/web/pesagemEntradaModalRodoviario/pesar-modal-rodoviario-na-entrada.jsf`
- **Permission:** `pesar-modal-rodoviario-na-entrada`
- **Java Classes:**
  - `web\src\main\java\...\PesagemEntradaModalRodoviarioController.java`
    - Operations: iniciar, confirmar, cancelar, pesarVeiculo, gerarTicket
```
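One function entry of the map could be rendered roughly as below. This is a sketch under assumptions: the `MenuFunction`/`JavaClassInfo` data model and `renderFunction` are hypothetical, and it assumes each class's operations are nested one level under that class in the Markdown list.

```typescript
// Hypothetical data model for one extracted menu function.
interface JavaClassInfo {
  path: string;         // source file path relative to the project root
  operations: string[]; // public methods that are not getters/setters
}

interface MenuFunction {
  label: string;        // resolved from messages.properties
  url: string;          // JSF page path
  permission?: string;  // absent when the item has no permission check
  classes: JavaClassInfo[];
}

// Renders one function as a level-5 heading plus a nested bullet list.
function renderFunction(fn: MenuFunction): string {
  const lines = [`##### ${fn.label}`, `- **URL:** \`${fn.url}\``];
  if (fn.permission) lines.push(`- **Permission:** \`${fn.permission}\``);
  if (fn.classes.length > 0) {
    lines.push("- **Java Classes:**");
    for (const cls of fn.classes) {
      lines.push(`  - \`${cls.path}\``);
      if (cls.operations.length > 0) {
        lines.push(`    - Operations: ${cls.operations.join(", ")}`);
      }
    }
  }
  return lines.join("\n");
}
```

Keeping each function self-contained under its own heading is what makes it possible to paste only "the relevant slice" into an agent's context.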
## Encoding
Source files are read as **ISO-8859-1** (Latin-1), the encoding standard for these legacy projects. By default, the output file is written in the same encoding. Use `--utf8` to produce UTF-8 output when sending the document to external LLM APIs or web-based tools, which generally expect UTF-8.
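In Node.js terms, the encoding behavior can be sketched with the built-in `fs` functions, where ISO-8859-1 is spelled `"latin1"`. The helper names `readLegacySource` and `writeOutput` are hypothetical; only the `fs` calls are real APIs.

```typescript
import { readFileSync, writeFileSync } from "node:fs";

type OutputEncoding = "latin1" | "utf8";

// Decodes a legacy source file as ISO-8859-1 (one byte per character).
function readLegacySource(file: string): string {
  return readFileSync(file, "latin1");
}

// Writes the Markdown map. Caveat: with "latin1", characters outside
// U+0000..U+00FF cannot be represented and get truncated to one byte,
// so "utf8" is the safe choice for LLM APIs and web tools.
function writeOutput(file: string, markdown: string, encoding: OutputEncoding): void {
  writeFileSync(file, markdown, encoding);
}
```

Reading a UTF-8 file with the Latin-1 decoder is what produces familiar mojibake such as `Balança` turning into `BalanÃ§a`, which is why the input and output encodings are worth choosing deliberately.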
## Artifacts
- [Binaries](https://github.com/avm-sistemas/java-llm-extractor/actions/runs/25896296004/artifacts/7008733037)
## Author
Andre Mesquita