{"id":35835882,"url":"https://github.com/stevenbtw/deriva","last_synced_at":"2026-03-01T12:05:35.488Z","repository":{"id":331934435,"uuid":"1125853497","full_name":"StevenBtw/deriva","owner":"StevenBtw","description":"Deriva is a research project aimed at enabling digital architects by automating the derivation of architecture models. Deriva uses a combination of knowledge graphs, heuristics and LLM's.","archived":false,"fork":false,"pushed_at":"2026-01-24T20:41:14.000Z","size":2622,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-24T21:19:21.011Z","etag":null,"topics":["archimate","archimate-models","architecture","ast","cli","deriva","derivation","enterprise-architecture","graph-algorigthms","graph-algorithms","heuristics","knowledge-graph","llm","marimo","neo4j","ocel2","pydantic","python"],"latest_commit_sha":null,"homepage":"https://deriva.dev","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/StevenBtw.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":".github/SECURITY.md","support":".github/SUPPORT.md","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"StevenBtw"}},"created_at":"2025-12-31T13:54:17.000Z","updated_at":"2026-01-18T22:57:27.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/StevenBtw/deriva","commit_stats":null,"previous_names":["stevenbtw/deriva"],"tags_count":8,"template":false,"te
mplate_full_name":null,"purl":"pkg:github/StevenBtw/deriva","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StevenBtw%2Fderiva","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StevenBtw%2Fderiva/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StevenBtw%2Fderiva/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StevenBtw%2Fderiva/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/StevenBtw","download_url":"https://codeload.github.com/StevenBtw/deriva/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StevenBtw%2Fderiva/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29969243,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-01T11:43:06.159Z","status":"ssl_error","status_checked_at":"2026-03-01T11:43:03.887Z","response_time":124,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["archimate","archimate-models","architecture","ast","cli","deriva","derivation","enterprise-architecture","graph-algorigthms","graph-algorithms","heuristics","knowledge-graph","llm","marimo","neo4j","ocel2","pydantic","python"],"created_at":"2026-01-08T00:14:53.558Z","updated_at":"2026-03-01T12:05:35.481Z","avatar_url":"https://github.com/StevenBtw.png","language":"Python","funding_links":["https://github.com/sponsors/StevenBtw"],"categories":[],"sub_categories":[],"readme":"# Deriva\n[![Research Project](https://img.shields.io/badge/Research-Project-blueviolet.svg)](#)\n[![Build Status](https://github.com/StevenBtw/Deriva/actions/workflows/ci.yml/badge.svg)](https://github.com/StevenBtw/Deriva/actions/workflows/ci.yml)\n[![License](https://img.shields.io/badge/license-AGPL--3.0-blue.svg)](LICENSE)\n[![Python 3.14+](https://img.shields.io/badge/python-3.14+-blue.svg)](https://www.python.org/downloads/)\n[![Neo4j](https://img.shields.io/badge/Neo4j-5.x-008CC1.svg?logo=neo4j)](https://neo4j.com/)\n[![Docker](https://img.shields.io/badge/Docker-required-2496ED.svg?logo=docker)](https://www.docker.com/)\n[![Marimo](https://img.shields.io/badge/Marimo-notebook-orange.svg)](https://marimo.io/)\n\n**Automatically generate ArchiMate enterprise architecture models from software repositories.**\n\nDeriva analyzes code repositories and transforms them into [ArchiMate](https://www.opengroup.org/archimate-forum) models that can be opened in the [Archi modeling 
tool](https://www.archimatetool.com/).\n\n## How It Works\n\n1. **Clone** a Git repository\n2. **Extraction** - Build a graph representation in Neo4j:\n   - **Classify phase**: Categorize files by type and subtype using the file type registry\n   - **Parse phase**: Extract semantic nodes (TypeDefinitions, Methods, BusinessConcepts, etc.)\n   - Python files use fast AST parsing; other languages use an LLM\n3. **Derivation** - Generate ArchiMate elements using a hybrid approach:\n   - **Prep phase**: Graph enrichment (PageRank, Louvain communities, k-core)\n   - **Generate phase**: LLM-based element derivation with graph metrics\n   - **Refine phase**: Relationship derivation and quality assurance\n4. **Export** to an `.xml` file (ArchiMate format)\n\n## Quick Setup\n\n### Prerequisites\n\n- **Python 3.14+**\n- **Docker** (for Neo4j)\n- **uv** (Python package manager)\n\n### 1. Install uv\n\n```bash\n# Windows (PowerShell)\npowershell -c \"irm https://astral.sh/uv/install.ps1 | iex\"\n\n# macOS/Linux\ncurl -LsSf https://astral.sh/uv/install.sh | sh\n```\n\n### 2. Clone and Configure\n\n```bash\ngit clone https://github.com/StevenBtw/Deriva.git\ncd Deriva\n\n# Create environment configuration\ncp .env.example .env\n# Edit .env with your settings (Neo4j, LLM API keys, etc.)\n```\n\n### 3. Create Python Environment\n\n```bash\nuv venv --python 3.14\n```\n\nActivate the virtual environment:\n\n```bash\n# Windows PowerShell\n.venv\\Scripts\\Activate.ps1\n\n# Windows Command Prompt\n.venv\\Scripts\\activate.bat\n\n# macOS/Linux\nsource .venv/bin/activate\n```\n\n### 4. Install Dependencies\n\n```bash\nuv sync\n```\n\n### 5. Start Neo4j\n\n```bash\ncd deriva/adapters/neo4j\ndocker-compose up -d\n```\n\nNeo4j will be available at:\n\n- **Browser UI**: http://localhost:7474 (no authentication)\n- **Bolt Protocol**: `bolt://localhost:7687`\n\nVerify Neo4j is running:\n\n```bash\ndocker ps  # Should show deriva_neo4j container\n```\n\n### 6. Launch Deriva\n\n```bash\ncd ../../..  
# Back to Deriva root\nuv run marimo edit deriva/app/app.py\n```\n\nThe marimo notebook opens in your browser at: http://127.0.0.1:2718\n\n---\n\n## First Time Setup\n\nWhen you first open Deriva, you need to seed the configuration database.\n\n### 1. Seed File Type Registry\n\nNavigate to **Column 2: Manage Extraction** → **File Type Registry**\n\n1. Click **\"Seed from JSON\"**\n2. This loads default file type mappings from `extraction_config.json`\n3. Categories include: Source, Config, Docs, Test, Build, Asset, Data, Exclude\n\n### 2. Enable Extraction Steps\n\nNavigate to **Column 2: Manage Extraction** → **Extraction Step Configuration**\n\nEnable the extraction steps you need:\n\n| Step | Purpose | Recommended |\n|------|---------|-------------|\n| Repository | Creates root node for the repo | Always |\n| Directory | Creates directory structure nodes | Always |\n| File | Creates file nodes with classification | Always |\n| TypeDefinition | Extracts classes, functions (AST for Python) | Yes |\n| Method | Extracts methods from type definitions | Optional |\n| Edge | Extracts relationships (IMPORTS, USES, CALLS, DECORATED_BY, REFERENCES) | Yes |\n| Technology | Detects frameworks and libraries | Optional |\n| ExternalDependency | Maps external dependencies | Optional |\n| Test | Extracts test definitions | Optional |\n\n### 3. Configure LLM (Optional)\n\nIf using LLM-assisted extraction, configure your provider in `.env`:\n\n```bash\n# Set default model to use\nLLM_DEFAULT_MODEL=mistral-devstral\n\n# Configure the model (naming: LLM_{NAME}_*)\nLLM_MISTRAL_DEVSTRAL_PROVIDER=mistral\nLLM_MISTRAL_DEVSTRAL_MODEL=devstral-2512\nLLM_MISTRAL_DEVSTRAL_URL=https://api.mistral.ai/v1/chat/completions\nLLM_MISTRAL_DEVSTRAL_KEY=your-key-here\nLLM_MISTRAL_DEVSTRAL_STRUCTURED_OUTPUT=true\n```\n\n---\n\n## Using Deriva\n\n### Basic Workflow\n\n#### 1. Clone a Repository\n\n**Column 1: Configuration → Repositories**\n\n1. 
Enter repository URL (e.g., `https://github.com/user/repo.git`)\n2. Optionally specify a target name\n3. Click **\"Clone\"**\n\n#### 2. Run the Pipeline\n\n**Column 0: Run Deriva**\n\n- Click **\"Run Deriva\"** to run the full pipeline (extraction → derivation)\n- Or use individual step buttons: **Extraction**, **Derivation**\n\nResults display in a status callout showing nodes/elements created and any errors.\n\n#### 3. View Results\n\n**Column 1: Configuration**\n\n- **Graph Statistics**: Node counts by type (Repository, Directory, File, etc.)\n- **ArchiMate Model**: Element and relationship counts by type\n\n#### 4. Export to Archi\n\n**Column 1: Configuration → ArchiMate Model**\n\n1. Set export path (default: `workspace/output/model.xml`)\n2. Click **\"Export Model\"**\n3. Open the file with [Archi](https://www.archimatetool.com/)\n\n**Via CLI:**\n```bash\nderiva export -o workspace/output/model.xml\n```\n\n---\n\n## Configuration\n\n### Environment Variables (.env)\n\nAll configuration lives in `.env`. 
Key settings:\n\n```bash\n# Neo4j (default docker-compose has auth disabled)\nNEO4J_URI=bolt://localhost:7687\nNEO4J_USERNAME=\nNEO4J_PASSWORD=\n\n# LLM Provider (mistral, openai, azure, anthropic, ollama, lmstudio)\nLLM_MISTRAL_DEVSTRAL_PROVIDER=mistral\nLLM_MISTRAL_DEVSTRAL_MODEL=devstral-2512\nLLM_MISTRAL_DEVSTRAL_URL=https://api.mistral.ai/v1/chat/completions\nLLM_MISTRAL_DEVSTRAL_KEY=your-mistral-api-key\nLLM_MISTRAL_DEVSTRAL_STRUCTURED_OUTPUT=true\n\n# Namespaces\nNEO4J_GRAPH_NAMESPACE=Graph\nARCHIMATE_NAMESPACE=Model\n```\n\nSee `.env.example` for all available options.\n\n### Rate Limiting\n\nThe LLM adapter includes built-in rate limiting to prevent API throttling:\n\n```bash\n# Requests per minute (0 = use the provider default listed in the table below)\nLLM_RATE_LIMIT_RPM=0\n\n# Minimum delay between requests in seconds\nLLM_RATE_LIMIT_DELAY=0.0\n\n# Max retries on rate limit (429) errors\nLLM_RATE_LIMIT_RETRIES=3\n\n# Adaptive throttling (reduces RPM when hitting rate limits)\nLLM_THROTTLE_ENABLED=true\nLLM_THROTTLE_MIN_FACTOR=0.25    # Minimum 25% of configured RPM\nLLM_THROTTLE_RECOVERY_TIME=60   # Seconds before trying to increase RPM\n\n# Circuit breaker (stops requests when provider is failing)\nLLM_CIRCUIT_BREAKER_ENABLED=true\nLLM_CIRCUIT_FAILURE_THRESHOLD=5   # Consecutive failures to open circuit\nLLM_CIRCUIT_RECOVERY_TIME=30      # Seconds before testing recovery\n```\n\nDefault rate limits by provider:\n\n| Provider | Default RPM |\n|----------|-------------|\n| OpenAI | 30 |\n| Anthropic | 30 |\n| Mistral | 24 |\n| Ollama | Unlimited |\n| LM Studio | Unlimited |\n\nThe rate limiter automatically:\n\n- Throttles requests to stay within limits\n- Applies exponential backoff on rate limit errors (HTTP 429)\n- Respects `Retry-After` headers from providers\n- Adaptively reduces RPM when hitting rate limits (recovers over time)\n- Opens the circuit breaker after consecutive failures to prevent cascading errors\n\n### Managing File Types\n\nIf 
you encounter **undefined extensions** during extraction:\n\n**Via UI (Marimo):**\n\n1. Navigate to **Column 2** → **Undefined Extensions**\n2. Add them to the registry:\n   - Extension (e.g., `.tsx`, `Dockerfile`)\n   - Type (source, config, docs, test, build, asset, data, exclude)\n   - Subtype (e.g., `typescript`, `docker`)\n\n**Via CLI:**\n\n```bash\n# List all registered file types\nderiva config filetype list\n\n# Add a new file type\nderiva config filetype add \".tsx\" source typescript\n\n# Delete a file type\nderiva config filetype delete \".tsx\"\n\n# Show file type statistics by category\nderiva config filetype stats\n```\n\n\u003e **Note:** Files with unrecognized extensions are automatically classified as `file_type=\"unknown\"` with their extension as the subtype. This ensures all files get proper classification even without explicit registry entries.\n\n### Updating Configurations (Versioning)\n\nDeriva uses a **versioning system** for configurations. When you update a config, a new version is created while preserving previous versions for rollback.\n\n**Correct ways to update configs:**\n\n1. **Via UI (Marimo)**: Navigate to the config section, edit, and click **\"Save Config\"**\n2. **Via CLI**: Use the `config update` command\n\n```bash\n# Update extraction config instruction\nderiva config update extraction BusinessConcept \\\n  -i \"New instruction text...\"\n\n# Update extraction config with batch size for multi-file LLM calls\nderiva config update extraction BusinessConcept \\\n  --batch-size 5\n\n# Update derivation config from file\nderiva config update derivation ApplicationComponent \\\n  --instruction-file prompts/app_component.txt\n\n# View all versions\nderiva config versions\n```\n\n**Do NOT use JSON import/export for config updates.** The `db_tool import` command is only for backup restoration or migration - it overwrites version history. 
See [BENCHMARKS.md](BENCHMARKS.md) for the optimization workflow.\n\n### Customizing Extraction Prompts\n\nFor LLM-assisted extraction steps:\n\n1. Navigate to **Column 2** → **Extraction Step Configuration**\n2. Expand a node type (e.g., TypeDefinition)\n3. Edit: Input File Types, Input Graph Elements, Instruction, Example\n4. Click **\"Save Config\"** (this creates a new version)\n\nAll prompts follow the **Input + Instruction + Example** pattern.\n\n---\n\n## UI Layout\n\nDeriva uses a multi-column marimo notebook layout:\n\n| Column | Purpose |\n|--------|---------|\n| **0** | **Run Deriva**: Pipeline execution buttons, status display |\n| **1** | **Configuration**: Runs, repositories, Neo4j, graph stats, ArchiMate, LLM |\n| **2** | **Extraction Settings**: File type registry, extraction step configuration |\n| **3** | **Derivation Settings**: Element type configuration (13 types across Business/Application/Technology layers), relationship derivation |\n\nThe UI is powered by `PipelineSession` from the services layer, providing a clean separation between presentation and business logic.\n\n---\n\n## Data Storage\n\n- **Neo4j Graph Database**:\n  - **Graph namespace**: Intermediate representation (Modules, Files, Dependencies)\n  - **Model namespace**: ArchiMate elements and relationships\n- **DuckDB** (`deriva/adapters/database/sql.db`): File type registry, extraction configs, settings\n\n### Clearing Data\n\n**Column 0: Run Overview**\n\n- **Clear Graph**: Removes all nodes/edges from Graph namespace\n- **Clear Model**: Removes all ArchiMate elements and relationships\n\n---\n\n## Querying Neo4j Directly\n\nAccess the Neo4j browser at http://localhost:7474 and run Cypher queries:\n\n```cypher\n// See all repositories\nMATCH (r:Graph:Repository) RETURN r\n\n// See files in a repo\nMATCH (repo:Graph:Repository)-[:Graph:CONTAINS*]-\u003e(f:Graph:File)\nWHERE repo.name = 'my-repo'\nRETURN f.name, f.file_type\n\n// See type definitions\nMATCH 
(td:Graph:TypeDefinition) RETURN td.name, td.type_category\n```\n\n---\n\n## CLI (Headless Mode)\n\nDeriva includes a full CLI for headless operation and automation:\n\n```bash\n# Help\nderiva --help\n\n# View configuration\nderiva config list extraction\nderiva config show extraction BusinessConcept\nderiva status\n\n# Manage file types\nderiva config filetype list\nderiva config filetype add \".lock\" dependency lock\nderiva config filetype stats\n\n# Run pipeline stages\nderiva run extraction --repo flask_invoice_generator -v\nderiva run derivation -v\nderiva run derivation --phase generate -v  # Run specific phase (prep, generate, refine)\nderiva run all --repo myrepo\n\n# Export ArchiMate model\nderiva export -o workspace/output/model.xml\n```\n\n**CLI Options:**\n\n| Option | Description |\n|--------|-------------|\n| `--repo NAME` | Process specific repository (default: all) |\n| `--phase PHASE` | Run specific derivation phase: prep, generate, or refine |\n| `-v, --verbose` | Print detailed progress |\n| `--no-llm` | Skip LLM-based steps (structural extraction only) |\n| `-o, --output PATH` | Output file path for export |\n\n---\n\n## Benchmarking\n\nDeriva includes a multi-model benchmarking system for comparing LLM performance across different providers and models. 
See [BENCHMARKS.md](BENCHMARKS.md) for the full guide and [OPTIMIZATION.md](OPTIMIZATION.md) for detailed case studies.\n\n### Running Benchmarks\n\n```bash\n# List available benchmark models\nderiva benchmark models\n\n# Run a benchmark with specific models\nderiva benchmark run \\\n  --repos flask_invoice_generator \\\n  --models openai-gptx,ollama-devstral \\\n  -n 3 \\\n  -d \"Comparing gptx with devstral\" \\\n  -v\n\n# List benchmark sessions\nderiva benchmark list\n\n# Analyze a benchmark session\nderiva benchmark analyze bench_20260101_150724\n```\n\n### Configuring Benchmark Models\n\nAdd models to `.env` using the pattern:\n\n```bash\n# Azure GPT-4o-mini\nLLM_AZURE_GPT4MINI_PROVIDER=azure\nLLM_AZURE_GPT4MINI_MODEL=gpt-4\nLLM_AZURE_GPT4MINI_URL=https://your-resource.openai.azure.com/...\nLLM_AZURE_GPT4MINI_KEY=your-api-key\n\n# Ollama local model\nLLM_OLLAMA_LLAMA_PROVIDER=ollama\nLLM_OLLAMA_LLAMA_MODEL=devstral\nLLM_OLLAMA_LLAMA_URL=http://localhost:11434/api/chat\n```\n\n### OCEL Event Logging\n\nBenchmark runs are logged in **OCEL 2.0** (Object-Centric Event Log) format for process mining analysis:\n\n- Events capture pipeline stages, LLM calls, and results\n- Object types: `BenchmarkSession`, `BenchmarkRun`, `Repository`, `Model`\n- Logs are saved to `workspace/benchmarks/{session_id}/events.ocel.json`\n\nOCEL files can be analyzed with process mining tools like PM4Py, Celonis, or custom analysis scripts.\n\n---\n\n## Troubleshooting\n\n### Neo4j Connection Issues\n\n```bash\n# Check if running\ndocker ps\n\n# View logs\ncd deriva/adapters/neo4j \u0026\u0026 docker-compose logs\n\n# Restart\ndocker-compose restart\n\n# Clear all data (destructive!)\ndocker-compose down -v\n```\n\n### Port Conflicts\n\nIf ports 7687/7474 are in use, edit `deriva/adapters/neo4j/docker-compose.yml`:\n\n```yaml\nports:\n  - \"7688:7687\"\n  - \"7475:7474\"\n```\n\nUpdate `.env` accordingly:\n\n```bash\nNEO4J_URI=bolt://localhost:7688\n```\n\n### Marimo Issues\n\n```bash\n# 
Check Python version\npython --version  # Should be 3.14+\n\n# Reinstall dependencies\nuv sync --reinstall\n\n# Run without watch mode\nuv run marimo edit deriva/app/app.py\n```\n\n---\n\n## Contributing\n\nFor development setup, architecture details, and contribution guidelines, see [CONTRIBUTING.md](CONTRIBUTING.md).\n\n---\n\n## License\n\nThis project is licensed under the **GNU Affero General Public License v3.0 (AGPL-3.0)**.\n\nThis means you can freely use, modify, and distribute this software, but if you run a modified version as a network service, you must make the source code available to users of that service.\n\nSee [LICENSE](LICENSE) for the full license text.\n\n## Acknowledgments\n\n- [Marimo](https://marimo.io) - Reactive Python notebooks\n- [Neo4j](https://neo4j.com) - Graph database\n- [ArchiMate](https://www.opengroup.org/archimate-forum) - Enterprise architecture standard\n- [Archi](https://www.archimatetool.com) - Open source ArchiMate modeling tool\n- [Tree-sitter](https://tree-sitter.github.io/tree-sitter/) - Multi-language AST parsing\n\n---\n\n**Status**: Active Development\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstevenbtw%2Fderiva","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstevenbtw%2Fderiva","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstevenbtw%2Fderiva/lists"}