{"id":47640561,"url":"https://github.com/eullm/eullm","last_synced_at":"2026-06-06T22:01:01.409Z","repository":{"id":345903622,"uuid":"1187529428","full_name":"eullm/eullm","owner":"eullm","description":"Open-source platform for creating, distributing and running sovereign EU-compliant LLMs. Verticalize any model for your domain, language and brand. AI Act ready.","archived":false,"fork":false,"pushed_at":"2026-06-03T15:26:22.000Z","size":2101,"stargazers_count":24,"open_issues_count":1,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-03T17:10:39.886Z","etag":null,"topics":["ai-sovereignty","data-sovereignty","eu-ai-act","europe","fine-tuning","gdpr","gguf","knowledge-distillation","llm","local-llm","mlops","model-compression","ollama","open-source","privacy","python","quantization","rust","self-hosted","sovereign-ai"],"latest_commit_sha":null,"homepage":"https://www.eullm.eu","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eullm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":".zenodo.json","notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-20T20:43:57.000Z","updated_at":"2026-06-03T12:26:50.000Z","dependencies_parsed_at":null,"dependency_job_id":"3f07fe62-a56b-4aa4-86e8-60d3c2b0b180","html_url":"https://github.com/eullm/eullm","commit_stats":null,"previous_names":["eullm/eullm"],"tags_count":31,"template":false,"template_full_name":null,"purl":"pkg:github/eullm/eullm","reposit
ory_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eullm%2Feullm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eullm%2Feullm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eullm%2Feullm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eullm%2Feullm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eullm","download_url":"https://codeload.github.com/eullm/eullm/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eullm%2Feullm/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34001197,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-06T02:00:07.033Z","response_time":107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-sovereignty","data-sovereignty","eu-ai-act","europe","fine-tuning","gdpr","gguf","knowledge-distillation","llm","local-llm","mlops","model-compression","ollama","open-source","privacy","python","quantization","rust","self-hosted","sovereign-ai"],"created_at":"2026-04-02T00:51:11.744Z","updated_at":"2026-06-06T22:01:01.344Z","avatar_url":"https://github.com/eullm.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg 
src=\"eullm-logo-github.png\" alt=\"EULLM\" width=\"560\" /\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\u003cstrong\u003eThe European Sovereign LLM Platform\u003c/strong\u003e\u003c/p\u003e\n\u003cp align=\"center\"\u003e\u003cstrong\u003eThe inference Engine is ready today.\u003c/strong\u003e Drop-in Ollama replacement, Apache 2.0, EU-sovereign, AI Act-ready audit trail, zero telemetry.\u003cbr\u003e\u003cem\u003ePlus a roadmap to verticalize, compress, and ship domain-specific models on European infrastructure.\u003c/em\u003e\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"#try-it-now\"\u003eTry it now\u003c/a\u003e ·\n  \u003ca href=\"#whats-ready-today-whats-coming\"\u003eStatus\u003c/a\u003e ·\n  \u003ca href=\"#the-solution\"\u003eEngine\u003c/a\u003e ·\n  \u003ca href=\"#benchmarks--continuous-batching-scaling\"\u003eBenchmarks\u003c/a\u003e ·\n  \u003ca href=\"#why-eullm\"\u003eWhy EULLM\u003c/a\u003e ·\n  \u003ca href=\"#planned-verticalized-models-q4-2026-roadmap\"\u003eRoadmap\u003c/a\u003e ·\n  \u003ca href=\"#research--experiments\"\u003eResearch\u003c/a\u003e ·\n  \u003ca href=\"#contributing\"\u003eContributing\u003c/a\u003e ·\n  \u003ca href=\"https://eullm.eu\"\u003eWebsite\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/license-Apache%202.0-blue\" alt=\"License\" /\u003e\n  \u003cimg src=\"https://img.shields.io/badge/EU%20AI%20Act-Designed%20for%20compliance-gold\" alt=\"EU AI Act\" /\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Engine-v0.5.8%20%E2%80%94%20usable%20today-2ea44f\" alt=\"Engine status\" /\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Forge%20%2B%20Hub-Early%20development-orange\" alt=\"Forge/Hub status\" /\u003e\n  \u003ca href=\"https://github.com/eullm/eullm/actions/workflows/ci.yml\"\u003e\u003cimg src=\"https://github.com/eullm/eullm/actions/workflows/ci.yml/badge.svg\" alt=\"CI\" /\u003e\u003c/a\u003e\n  \u003ca 
href=\"https://doi.org/10.5281/zenodo.20412979\"\u003e\u003cimg src=\"https://zenodo.org/badge/DOI/10.5281/zenodo.20412979.svg\" alt=\"DOI\" /\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  🇪🇺 European-built — focused on local-first and sovereign AI \u0026nbsp;·\u0026nbsp; 🇮🇹 Developed in Italy\n\u003c/p\u003e\n\n---\n\n## Try it now\n\n**EULLM Engine is a drop-in Ollama replacement built in Rust.** Download a binary, run any GGUF model (Qwen, Mistral, DeepSeek, Phi, Gemma, …), get an Ollama-compatible + OpenAI-compatible API on port 11434. No Python, no Docker, no telemetry.\n\n```bash\n# Linux x64 with NVIDIA GPU (RTX 3000 / 4000 / 5000 — Ampere/Ada/Blackwell)\ncurl -L https://github.com/eullm/eullm/releases/latest/download/eullm-linux-x64-cuda-12.8 -o eullm\nchmod +x eullm\n./eullm run your-model.gguf\n\n# In another terminal — same API your existing tooling already speaks:\ncurl http://localhost:11434/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\": \"qwen3\", \"messages\": [{\"role\": \"user\", \"content\": \"Ciao!\"}]}'\n```\n\n**All prebuilt binaries** — pick yours from the [latest release](https://github.com/eullm/eullm/releases/latest):\n\n| Platform | File | Status | Notes |\n|----------|------|:------:|-------|\n| 🐧 Linux x64 (CPU) | `eullm-linux-x64` | ✅ Tested | – |\n| 🐧 Linux x64 (NVIDIA) | `eullm-linux-x64-cuda-12.8` | ✅ Tested | RTX 3000/4000/5000 |\n| 🐧 Linux ARM64 | `eullm-linux-arm64` | 🧪 [Experimental — untested](#-platform-status--help-us-test) | RPi 4/5, Orange Pi 5+, Jetson, etc. 
|\n| 🍎 macOS Apple Silicon (Metal) | `eullm-macos-arm64` | 🧪 [Experimental — untested](#-platform-status--help-us-test) | M1/M2/M3/M4 |\n| 🍎 macOS Intel | `eullm-macos-x64` | 🧪 [Experimental — untested](#-platform-status--help-us-test) | Pre-Apple-Silicon Macs |\n| 🪟 Windows 11 x64 (CPU) | `eullm-windows-x64.exe` | ✅ Tested | Standalone binary, CLI/server |\n| 🪟 Windows 11 x64 (NVIDIA) | `eullm-windows-x64-cuda-12.8.zip` | ✅ Tested | ZIP bundles CUDA DLLs — extract, run |\n\n\u003e **Embedded chat UI — cross-platform.** Every `eullm` binary (Linux, macOS, Windows — CPU, CUDA, Metal) ships with a built-in browser chat. Run `eullm run model.gguf` and open **`http://localhost:11435/`** — same OpenAI/Ollama API on `:11434`, separate chat UI port `:11435` so it never collides with RAG / OpenAI-client routes on `/`. Turn it off with `--no-ui` for headless deployments.\n\u003e\n\u003e **Interactive picker.** Run `eullm` with no arguments (or `eullm run` with no model) and you get an interactive menu listing your locally installed GGUFs and the [EuLLM model catalog](catalog/v1/catalog.json) — pick one, the engine takes care of download + launch.\n\u003e\n\u003e **SmartScreen note (Windows):** the binaries are not yet code-signed, so first launch may show *\"Windows protected your PC\"*. Click **More info → Run anyway**. CUDA bundles ship the required CUDA DLLs alongside — no separate CUDA toolkit install needed (an up-to-date NVIDIA driver is enough).\n\u003e\n\u003e **One-click installer paused.** v0.5.6 shipped an Inno Setup `.exe` installer; we pulled it from v0.5.8 onwards because the SmartScreen warning, the launcher script edge cases, and the install-time PATH handling all need a redesign before re-shipping. The standalone binaries above are the supported Windows distribution.\n\n### 🧪 Platform status / help us test\n\nThe Linux x64 and Windows x64 binaries are validated end-to-end by the maintainer. 
The **macOS** (Intel + Apple Silicon) and **Linux ARM64** binaries compile in CI but the maintainer doesn't own that hardware — so they're shipped as **Experimental — untested**.\n\nIf you run local LLMs on a Mac or an ARM64 board (Raspberry Pi 4/5, Orange Pi 5+, Rock 5B, Jetson, …), **your help validating these binaries is hugely appreciated**. See the open testing call:\n\n→ **[Issue #140 — Help wanted: testing on macOS \u0026 ARM64 Linux](https://github.com/eullm/eullm/issues/140)** (`help wanted`, `testing`)\n\nPriority order: macOS Apple Silicon (Metal backend) → Linux ARM64 (Raspberry Pi 5) → macOS Intel. Reports with `eullm --version` output, model used, and what worked/broke go a long way.\n\n### Drop-in for Ollama-compatible clients\n\nSame port (11434), same Ollama API, plus OpenAI-compatible API on the same binary. Existing tooling (Open WebUI, LangChain, n8n, any OpenAI client) works without code changes:\n\n```bash\n# Was:   ollama run llama3\n# Now:   eullm run ./your-model.gguf --port 11434\n```\n\nWhat you get on top of the Ollama-compatible API:\n\n| Capability | EULLM Engine |\n|---|---|\n| **Continuous batching** scheduler — single-pass parallel decode across all active slots, shared KV pool (no per-slot KV pre-allocation) | ✅ on by default |\n| **Quantized KV cache** — Q4_0, Q5_0, Q5_1, Q8_0 KV types for up to ~4× context on the same GPU | ✅ flag `--cache-type-k q4_0` |\n| **AI Act audit trail** — local-only JSONL of every request/response, never transmitted | ✅ on by default |\n| **Zero telemetry** — no analytics, no crash reports, no usage stats | ✅ enforced |\n| **Single binary** — Rust, no Go runtime, no Python runtime, no Docker | ✅ |\n| **EU-hosted model registry** (Forge/Hub) | 🚧 in development |\n\n[→ Engine scaling](#benchmarks--continuous-batching-scaling) · [→ Why EULLM](#why-eullm)\n\n## What's ready today, what's coming\n\n| Component | Status | Use today? 
|\n|-----------|--------|------------|\n| **Engine** — Rust inference runtime, Ollama + OpenAI APIs, continuous batching, quantized KV cache (Q4_0/Q5/Q8), CUDA (RTX 3000/4000/5000), audit trail. Builds also exist for ROCm/Vulkan/Metal/ARM64 — see [platform status](#-platform-status--help-us-test) | ✅ **Ready (v0.5.8)** — Linux x64 + Windows x64 | **Yes** — drop-in for Ollama on tested platforms |\n| **Chat UI** — embedded browser chat (HTML/CSS/JS baked into every `eullm` binary, served on a separate port from the API) with Markdown + best-effort LaTeX→MathML rendering | ✅ **Ready (v0.5.5)** | **Yes** — served at `http://localhost:11435/` (disable with `--no-ui`) |\n| **Windows installer** — one-click `.exe` (Inno Setup) with Start Menu, optional PATH, browser launcher | 🚧 Paused after v0.5.6 — needs SmartScreen / launcher redesign before re-shipping | Use the standalone Windows binaries above for now |\n| **Forge** — verticalization pipeline (pruning + distillation + quantization + identity LoRA) | 🧪 Modules ready, end-to-end integration in progress | Researchers / advanced |\n| **Hub** — EU-hosted model registry with AI Act compliance cards | 🧪 Prototype API | Not yet |\n| **Demo models** — `legal-it-7b` / `medical-de-7b` / `finance-fr-7b` | 🚧 First model in training (Q4 2026) | Not yet |\n\n\u003e The Engine works **today, standalone, with any GGUF model** on Hugging Face. You don't need to wait for the Hub or Forge to use it. Star this repo to follow Forge \u0026 Hub releases.\n\n\u003e **Note on math rendering in the Chat UI:** the embedded UI ships a tiny,\n\u003e zero-dependency, best-effort LaTeX→MathML renderer covering the subset of\n\u003e LaTeX that LLMs commonly emit (`$…$` / `$$…$$`, `\\frac`, `\\sqrt`,\n\u003e superscripts/subscripts, Greek letters, common operators, spacing). It is
It is\n\u003e **not** a full LaTeX engine — anything outside that subset (complex\n\u003e environments like `align`/`matrix`/`cases`, exotic macros) falls back to the\n\u003e raw text untouched, never a broken render. It renders client-side via native\n\u003e browser MathML, so no JS/WASM dependency is added and the stream/API stay raw.\n\n## The problem\n\n95% of AI infrastructure used in Europe depends on American or Chinese companies. Hosted APIs (OpenAI, Anthropic, Google) send every prompt outside the EU. Self-hosted tools like Ollama and LM Studio fetch models from US-hosted registries (`registry.ollama.ai`, `huggingface.co`) and many ping these endpoints for update checks by default.\n\nThe **EU AI Act** (Regulation 2024/1689) takes effect August 2, 2026. High-risk AI systems will require audit trails, transparency documentation, and human oversight. Existing open-source tools were not designed with this in mind.\n\nEuropean SMEs need AI models that:\n\n- **Run locally** on their own hardware or EU servers\n- **Comply** with GDPR and the AI Act out of the box\n- **Speak their language** and understand their domain\n- **Carry their brand** — not \"Powered by Qwen\" or \"Built with Llama\"\n- **Cost nothing** in ongoing API fees\n\nEULLM is the missing infrastructure.\n\n## The solution\n\nEULLM is an open-source platform with three components:\n\n### EULLM Engine\n\nRun sovereign LLMs locally with **real llama.cpp inference**, built-in audit trail, and full API compatibility. Single Rust binary, no Python runtime, no Docker required.\n\nBuilt on llama.cpp (MIT, EU-developed) with the standard set of quantized KV cache types (Q4_0, Q5_0, Q5_1, Q8_0) for ~2-4× context length on the same hardware. 
We also evaluated TurboQuant (Walsh-Hadamard / Lloyd-Max KV compression) end-to-end during v0.5.x but pulled it from the production build path — see [Research \u0026 Experiments](#research--experiments) for the rationale and the archived numbers.\n\n```bash\n# Run any GGUF model — local file or from the EU registry\neullm run ./model.gguf                    # Local GGUF file\neullm run ./model.gguf --batch-size 16    # Continuous batching for parallel requests\neullm run ./model.gguf --web              # Transparent web browsing (URLs in messages auto-fetched)\neullm run legal-it-7b                     # From EU registry (coming soon)\n\n# CLI\neullm list                                # Show local and available models\neullm show legal-it-7b                    # Model details, metadata, compliance info\neullm serve                               # Start API server without loading a model\n\n# API endpoints (Ollama-compatible + OpenAI-compatible)\n# http://localhost:11434/api/generate\n# http://localhost:11434/api/chat\n# http://localhost:11434/v1/chat/completions\n```\n\nKey features:\n- **Real inference** powered by llama.cpp (not a mock, not a proxy)\n- **Continuous batching** — multiple requests decoded in parallel; aggregate throughput grows with concurrency (2.75× from 1 to 16 concurrent requests in the benchmark below)\n- **Token streaming** — NDJSON on Ollama endpoints, SSE on OpenAI endpoint (`\"stream\": true`)\n- **GPU acceleration** — NVIDIA CUDA *(tested)*, AMD ROCm / Vulkan / Apple Metal *(builds available, [community testing wanted](#-platform-status--help-us-test))*\n- **Ollama-compatible API** — drop-in replacement, same endpoints, same port\n- **OpenAI-compatible API** — works with Open WebUI, LangChain, n8n, any standard client\n- **Transparent web browsing** (`--web`) — put a URL in any message and the engine fetches the page, strips HTML, selects relevant content, and injects it into the prompt before inference. 
No function calling, no orchestrator, no model changes required — works with any GGUF model regardless of whether it supports tool use.\n- **Built-in audit trail** for every inference (who, when, what — AI Act ready)\n- **Quantized KV cache** — standard llama.cpp Q4_0/Q5_0/Q5_1/Q8_0 KV types reduce memory ~2-4× at small quality cost (`--cache-type-k q4_0 --cache-type-v q4_0`). We also tested the experimental TurboQuant approach but did not ship it (see [Research](#research--experiments))\n- **CORS enabled** — Open WebUI and browser-based tools work out of the box\n- **Cross-platform binaries** — Linux x64 + Windows x64 *(tested)* · Linux ARM64, macOS x64, macOS ARM64 *(builds available, [community testing wanted](#-platform-status--help-us-test))*\n- **EU-hosted model registry** on infrastructure in Germany, France, and Finland *(in development)*\n- **No network telemetry** — no analytics, no crash reports, no usage stats; audit trail is written locally to `~/.eullm/audit/audit.jsonl` and never transmitted\n\n### EULLM Forge\n\n**Verticalize** any open-source LLM: take a 14B generalist, make it a 7B domain expert that runs on your laptop.\n\n```bash\n# Take a 14B model, verticalize it for Italian law, compress to 7B\neullm-forge forge Qwen/Qwen3-14B \\\n  --profile legal-it \\\n  --target-vram 8 \\\n  --identity \"LegalAI di Studio Rossi\" \\\n  --lang it,en\n\n# Output: a 7B model (~4.5GB GGUF) that runs on any laptop\n# It says: \"Ciao, sono LegalAI di Studio Rossi. 
Come posso aiutarti?\"\n```\n\nThe verticalization pipeline:\n- **Structural pruning** — removes redundant MLP neurons (Minitron approach: 14B → 7B)\n- **Knowledge distillation** — teacher (14B) transfers domain knowledge to student (7B)\n- **Quantization** — FP16 → Q4_K_M (roughly 3× size reduction)\n- **Identity fine-tuning** — your name, your language, your personality baked into weights\n- **GGUF export** — ready for local inference\n\n```bash\n# Or just estimate the cost before running\neullm-forge estimate Qwen/Qwen3-14B --target-vram 8\n\n# See available domain profiles\neullm-forge profiles\n```\n\n### EULLM Hub\n\nPre-verticalized models for European domains and languages. Download and run immediately. Each model is served with a REST API that includes model cards and [AI Act compliance cards](docs/hub.md).\n\n\u003e **Models below are planned (Q4 2026), not yet released.** [Join the waitlist](https://eullm.eu) to be notified at launch.\n\n| Model | Domain | Languages | Size | VRAM | Runs on |\n|-------|--------|-----------|------|------|---------|\n| `eullm/legal-it-7b` | Italian law | IT, EN | ~4.5GB | 6GB | Laptop |\n| `eullm/medical-de-7b` | German medicine | DE, EN | ~4.5GB | 6GB | Laptop |\n| `eullm/finance-fr-7b` | French finance | FR, EN | ~4.5GB | 6GB | Laptop |\n| `eullm/general-eu-7b` | General purpose | 7 langs | ~4.5GB | 6GB | Laptop |\n| `eullm/general-eu-14b` | General purpose | 7 langs | ~8.5GB | 10GB | GPU workstation |\n| `eullm/legal-it-14b` | Italian law (full) | IT, EN | ~8.2GB | 10GB | GPU workstation |\n| `eullm/code-eu-14b` | Coding | 5 langs | ~8.5GB | 10GB | GPU workstation |\n\nEvery model will ship with:\n- Model card with benchmarks\n- AI Act compliance card\n- Documentation of the compression pipeline\n- Apache 2.0 license — no strings attached\n\n\u003e **Note:** Demo models are not yet available. 
The Hub API and compliance card format are implemented; the first verticalized model (`eullm/legal-it-7b`) is under development.\n\n## Quickstart\n\n\u003e **The Engine is usable today** (`eullm run`, `eullm serve` — a drop-in replacement for Ollama). The commands below also preview the target CLI for **Forge** (verticalization) and **Hub** (EU registry pull), which are in active development on the Q3–Q4 2026 roadmap. Star this repo to track progress.\n\n### Prebuilt binaries (easiest)\n\nDownload from [GitHub Releases](https://github.com/eullm/eullm/releases):\n\n```bash\n# Linux x64\ncurl -L https://github.com/eullm/eullm/releases/latest/download/eullm-linux-x64 -o eullm\nchmod +x eullm\n./eullm run ./your-model.gguf\n```\n\nAvailable for: Linux x64 (CPU, CUDA) ✅ · Windows x64 (CPU, CUDA) ✅ · Linux ARM64, macOS x64, macOS Apple Silicon (Metal) 🧪 [community testing wanted](#-platform-status--help-us-test).\n\n### Build from source\n\n**Prerequisites:** Rust 1.75+, C/C++ compiler, CMake, libclang.\n\n```bash\n# Ubuntu/Debian — install build dependencies\nsudo apt install build-essential cmake libclang-dev\n\n# macOS\nxcode-select --install \u0026\u0026 brew install cmake\n```\n\n```bash\ngit clone https://github.com/eullm/eullm.git \u0026\u0026 cd eullm\ncargo build --release\n\n# Run any GGUF model — that's it\n./target/release/eullm run ./qwen3-8b-q4_k_m.gguf\n\n# API is live:\ncurl http://localhost:11434/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\": \"qwen3\", \"messages\": [{\"role\": \"user\", \"content\": \"Ciao!\"}]}'\n```\n\nWith GPU acceleration:\n\n```bash\ncargo build --release --features cuda     # NVIDIA (CUDA)\ncargo build --release --features rocm     # AMD (ROCm)\ncargo build --release --features vulkan   # Cross-platform (NVIDIA + AMD + Intel)\ncargo build --release --features metal    # macOS Apple Silicon\n```\n\nOr pull from the EU catalog (coming soon):\n\n```bash\neullm pull legal-it-7b          # 
Downloads from EU servers (Hetzner DE, OVH FR)\neullm run legal-it-7b           # Runs locally — on your laptop, 8GB RAM\n```\n\n### Drop-in Ollama replacement\n\nIf you're a system integrator, or you already use Ollama or a llama.cpp backend, you can switch to EULLM without rewriting a single line. Same API, same port, same tools. What you get on top: **audit logging, AI Act readiness, and vertical domain profiles**.\n\n```bash\n# If you were doing this with Ollama:\n#   ollama run llama3\n# Now do this — same API, same port:\neullm run ./your-model.gguf --port 11434\n```\n\nEULLM exposes both the Ollama-compatible `/api/*` and OpenAI-compatible `/v1/*` endpoints, so tools built for either API work with EULLM:\n\n- **Open WebUI** — point it to `http://localhost:11434` and it just works\n- **LangChain / LlamaIndex** — point the OpenAI client at `http://localhost:11434/v1` (for example, LangChain's `ChatOpenAI(base_url=\"http://localhost:11434/v1\")`)\n- **n8n / Flowise** — configure the AI node to `http://localhost:11434`\n- **Any OpenAI-compatible client** — change the base URL, done\n\n### GPU support out of the box\n\nNo patching C++ projects. No hunting for CUDA versions. Feature flags at build time:\n\n| Flag | GPU | Command |\n|------|-----|---------|\n| `cuda` | NVIDIA (CUDA) | `cargo build --release --features cuda` |\n| `rocm` | AMD (ROCm) | `cargo build --release --features rocm` |\n| `vulkan` | Cross-platform | `cargo build --release --features vulkan` |\n| `metal` | Apple Silicon | `cargo build --release --features metal` |\n| *(none)* | CPU only | `cargo build --release` |\n\nAll GPU backends are compiled natively via llama.cpp — no wrappers, no Docker, no Python.\n\n## Why EULLM?\n\nIf you already use Ollama, llama.cpp, or any OpenAI-compatible backend: you know the pain. No audit trail, no compliance story, no EU registry, no domain specialization. 
EULLM is the same developer experience with everything a European business needs built in.\n\n| | Ollama / llama.cpp | EULLM |\n|---|---|---|\n| Inference engine | llama.cpp | llama.cpp (same backend, same performance) |\n| Request scheduling | Configurable parallelism (`OLLAMA_NUM_PARALLEL`, low default, one KV-cache copy per slot) | **Continuous batching** by default — single-pass parallel decode, shared KV |\n| API compatibility | Ollama API or custom | Ollama-compatible + OpenAI-compatible |\n| GPU support | Manual build flags | `--features cuda/rocm/vulkan/metal` |\n| **Transparent web browsing** | Via function calling (requires a tool-capable model) | **`--web` flag — model-agnostic, works with any GGUF, no tool-use support required** |\n| Model registry | US servers (Hugging Face) | EU servers (Hetzner DE, OVH FR) |\n| AI Act compliance | None | Built-in audit trail + compliance card templates |\n| Model verticalization | Manual, requires ML expertise | Forge CLI + pipeline modules (end-to-end integration in progress) |\n| Domain-specific EU models | None | Hub catalog (demo models in development) |\n| White-label branding | System prompt only (bypassable) | Fine-tuned into weights |\n| Telemetry | Varies | **None.** No analytics, no crash reports, no usage stats. 
Audit trail stored locally at `~/.eullm/audit/audit.jsonl`, never transmitted |\n| Migration effort | — | **Zero.** Same API, same port, same tools |\n\nEULLM aims to be the sovereign AI stack for Europe — engine, tools, and models in one platform.\n\n## Benchmarks — Continuous batching scaling\n\nEULLM Engine's continuous batching scheduler decodes all active sequences in a single GPU pass, so total throughput scales with concurrency instead of being capped by a per-slot pre-allocated KV cache.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/assets/bench-throughput.svg\" alt=\"EULLM Engine throughput scaling 1→16 concurrent\" width=\"680\" /\u003e\n\u003c/p\u003e\n\n| Concurrent requests | EULLM Engine throughput | Per-request | Wall time (concurrent requests × 150 tok) |\n|:---:|:---:|:---:|:---:|\n| 1 | 94 tok/s | 94 tok/s | 1.6 s |\n| 2 | 143 tok/s | ~71 tok/s | 2.1 s |\n| 4 | 183 tok/s | ~46 tok/s | 3.3 s |\n| 8 | 206 tok/s | ~26 tok/s | 5.8 s |\n| 16 | **259 tok/s** | ~16.5 tok/s | **9.3 s** |\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/assets/bench-latency.svg\" alt=\"EULLM wall time vs concurrency\" width=\"680\" /\u003e\n\u003c/p\u003e\n\nThroughput scales **2.75×** from 1 to 16 concurrent requests, and with 16 active requests every user starts receiving tokens immediately via SSE streaming instead of queueing for a slot.\n\n\u003e **Test setup:** Qwen3.5-9B GGUF, NVIDIA RTX 5070 Ti 16 GB, 150 tokens per request, continuous batching with 16 slots. Reproduce with `./bench.sh`. Methodology in [docs/benchmarks.md](docs/benchmarks.md).\n\n## Research \u0026 Experiments\n\nWe invest some engineering time in evaluating new techniques before deciding whether to ship them. 
The current results live here; nothing in this section is in the production build path.\n\n### TurboQuant KV cache compression — tested, on hold\n\nBetween Q1 and Q2 2026 we tested integrating TurboQuant (Google Research, ICLR 2026) — a Walsh-Hadamard rotation + Lloyd-Max codebook approach to KV cache quantization — via the [AmesianX/llama.cpp](https://github.com/AmesianX/llama.cpp) fork (v1.5.3). We shipped three experimental TurboQuant variants in v0.5.x (Linux/macOS/Windows). The reproducible benchmarks (Qwen3-8B at 264 k context on a 16 GB RTX 5070 Ti, ~77 tok/s; full quality runs on the LM Eval Harness) are archived under [`bench/results/turboquant_20260329_224511/`](bench/results/turboquant_20260329_224511/) and the engineering write-ups under [`docs/turboquant-quality-report.md`](docs/turboquant-quality-report.md) and [`docs/turboquant-kv-stress-report.md`](docs/turboquant-kv-stress-report.md).\n\n**Why it's not in v0.5.8 and later:**\n\n- The technique is **not in upstream llama.cpp** — three independent PRs ([#21089](https://github.com/ggml-org/llama.cpp/pull/21089), [#23617](https://github.com/ggml-org/llama.cpp/pull/23617), [#23962](https://github.com/ggml-org/llama.cpp/pull/23962)) are either stalled, closed, or rejected, and the project's lead maintainer has voiced skepticism about marginal quality gains over the standard Q4_0 KV cache at the same bit-width.\n- Our integration depends on a fork maintained by a single individual (`AmesianX`); depending in production on a single-maintainer fork that may diverge or be archived isn't a trade-off we want to make under a \"sovereign\" engine claim.\n- The TurboQuant variant builds were the long pole of every CI release (the Windows CUDA TurboQuant build alone took several hours), for a feature whose practical advantage over the standard quantized KV cache (`--cache-type-k q4_0 --cache-type-v q4_0`) hasn't been clearly established in our quality runs.\n\n**If TurboQuant (or a derivative like the \"rotated activations\" idea in [llama.cpp 
#21038](https://github.com/ggml-org/llama.cpp/pull/21038)) lands upstream**, we'll get it back through a standard `llama-cpp-2` version bump — no extra engineering required from us.\n\nThe R\u0026D code lives in git history at tag [`EuLLM-v0.5.7`](https://github.com/eullm/eullm/releases/tag/EuLLM-v0.5.7); the corresponding binaries remain downloadable from that release for anyone who wants to reproduce.\n\n## Planned verticalized models (Q4 2026 roadmap)\n\n\u003e **These models are not yet released.** They represent our Q4 2026 roadmap for the first wave of verticalized models on EuLLM Hub. Star this repo and join the waitlist at [eullm.eu](https://eullm.eu) to be notified when each model becomes available.\n\nOur first three demo models will showcase the verticalization pipeline. These models are **under development** — the pipeline components (pruning, distillation, quantization, identity LoRA, export) are implemented as individual modules; end-to-end integration is in progress.\n\n### `eullm/legal-it-7b` — Italian Law (first target)\n- **Source**: Qwen3-14B (Apache 2.0) → pruned + distilled → 7B\n- **Training corpus**: Italian Civil Code, Criminal Code, GDPR, Cassazione rulings\n- **Target**: Any laptop with 8GB RAM\n- **Identity**: \"Sono EULLM Legal IT, un assistente per il diritto italiano\" (\"I am EULLM Legal IT, an assistant for Italian law\")\n\n### `eullm/medical-de-7b` — German Medicine\n- **Source**: Qwen3-14B → 7B\n- **Training corpus**: German clinical guidelines, medical documentation\n- **Target**: Any laptop with 8GB RAM\n\n### `eullm/finance-fr-7b` — French Finance\n- **Source**: Qwen3-14B → 7B\n- **Training corpus**: AMF regulations, BCE directives, French banking standards\n- **Target**: Any laptop with 8GB RAM\n\n\u003e **Want us to verticalize a model for your domain?** We offer done-for-you verticalization as a service. 
[Contact us](mailto:dev@eullm.eu).\n\n## Models and licenses\n\nEULLM exclusively uses models with fully permissive licenses:\n\n| Model | License | Rebrand | Commercial use |\n|-------|---------|---------|----------------|\n| **Qwen 3** (Alibaba) | Apache 2.0 | Free | Unlimited |\n| **Mistral** (France) | Apache 2.0 | Free | Unlimited |\n| **DeepSeek** | MIT | Free | Unlimited |\n| **GPT-OSS** (OpenAI) | Apache 2.0 | Free | Unlimited |\n| **Falcon 3** (TII) | Apache 2.0 | Free | Unlimited |\n| ~~Llama (Meta)~~ | Custom | Requires \"Built with Llama\" | Restrictions |\n\nWe deliberately exclude Llama from the EULLM catalog because its license requires \"Built with Llama\" branding on derivatives — incompatible with true white-label sovereignty.\n\n## Roadmap\n\n### Phase 1: Engine Public (Q2 2026) — We are here\n\n* EuLLM Engine v0.x — Rust runtime + llama.cpp\n* OpenAI + Ollama API compatibility (drop-in replacement)\n* Single binary distribution (Linux/macOS, CUDA/ROCm/Vulkan/Metal)\n* GGUF model support, transparent web browsing, audit trail\n* **Planned — auto GPU layer fitting** (`--fit` flag): query available VRAM at startup, estimate per-layer + KV cache memory cost from the GGUF header, compute the maximum `n-gpu-layers` that fits, fall back to partial CPU offload otherwise. Targets large dense models (14B–32B at Q4) and MoE models (e.g. Qwen3-30B-A3B, Gemma-4-26B-A4B) on consumer GPUs without manual tuning. 
Cross-platform (CUDA/ROCm/Vulkan/Metal).\n* Public launch on Hacker News, [dev.to](http://dev.to), Hashnode, LinkedIn\n* GitHub repository active, contributor onboarding\n* Community feedback collection\n\n### Phase 2: Forge Beta (Q3 2026)\n\n* EuLLM Forge v0.1 — verticalization pipeline (pruning + distillation + quantization + identity)\n* First verticalization profiles: legal-it, medical-de, finance-fr\n* First Colab notebook: identity LoRA on Qwen3-14B\n* Synthetic dataset generation from European corpora\n* GGUF export pipeline\n* Documentation and tutorials\n\n### Phase 3: Hub Launch + First Verticalized Models (Q4 2026)\n\n* EuLLM Hub — EU-hosted model registry (Hetzner DE / OVH FR)\n* AI Act compliance cards per model\n* First verticalized model published: `eullm/legal-it-7b` (Italian law)\n* Followed by: `eullm/medical-de-7b`, `eullm/finance-fr-7b`\n* Deeper integration with RAG Enterprise Pro 2.0\n* EU AI Act compliance toolkit (audit trail + documentation generator)\n\n### Phase 4: Scale (2027+)\n\n* EuLLM Enterprise service (done-for-you verticalization)\n* 10+ domain-specific models on Hub\n* MCP server for Claude Code / Cursor / OpenCode integration\n* EU accelerator graduation (EIC Accelerator 2026 outcome)\n* EuLLM Champions community program\n\n## Architecture\n\n```\n┌──────────────────────────────────────────────────────┐\n│                   Your application                   │\n│         (Open WebUI, LangChain, n8n, custom)         │\n└──────────────────────┬───────────────────────────────┘\n                       │ OpenAI-compatible API\n┌──────────────────────▼───────────────────────────────┐\n│                     EULLM Engine                     │\n│  ┌──────────┐  ┌──────────┐  ┌────────────────────┐  │\n│  │ Runtime  │  │ Audit    │  │ Compliance         │  │\n│  │ (llama   │  │ Trail    │  │ Documentation      │  │\n│  │  .cpp)   │  │ Logger   │  │ Generator          │  │\n│  └──────────┘  └──────────┘  └────────────────────┘  
│\n└──────────────────────┬───────────────────────────────┘\n                       │\n        ┌──────────────┼──────────────┐\n        ▼              ▼              ▼\n┌──────────────┐ ┌──────────┐ ┌──────────────┐\n│  EULLM Hub   │ │  EULLM   │ │  Your local  │\n│  (EU registry│ │  Forge   │ │  models      │\n│  DE/FR/FI)   │ │          │ │  (GGUF)      │\n│              │ │          │ │              │\n└──────────────┘ └──────────┘ └──────────────┘\n\nEULLM Forge — Verticalization Pipeline:\n┌───────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐\n│ Structural│──▶│Knowledge │──▶│Quantize  │──▶│Identity  │──▶│  GGUF    │\n│ Pruning   │   │Distill.  │   │(Q4_K_M)  │   │LoRA      │   │  Export  │\n│ 14B → 7B  │   │Teacher→  │   │FP16→INT4 │   │Brand +   │   │  ~4.5GB  │\n│           │   │Student   │   │          │   │Language  │   │          │\n└───────────┘   └──────────┘   └──────────┘   └──────────┘   └──────────┘\n```\n\n## Tech stack\n\n| Component | Technology | Why |\n|-----------|-----------|-----|\n| Engine (CLI/Runtime) | Rust + llama.cpp | Performance, single binary, quantized KV cache |\n| Forge (verticalization) | Python + PyTorch + NVIDIA ModelOpt | ML ecosystem standard |\n| Hub (registry) | Rust API + S3-compatible storage | Fast, hostable on any EU cloud |\n| Website | Next.js | SSR, SEO-optimized |\n| CI/CD | GitHub Actions | Open source standard |\n\n## Contributing\n\nEULLM is in early development and we welcome contributions of all kinds:\n\n- **Ideas and feedback** — open an [issue](https://github.com/eullm/eullm/issues)\n- **Model requests** — tell us what domain/language combinations you need\n- **Code** — see open issues tagged `good first issue`\n- **Documentation** — help us write guides in your language\n- **Testing** — try the notebooks, report bugs, suggest improvements\n- **Spread the word** — star the repo, share on social media\n\n### Technical documentation\n\nDetailed documentation is available in the 
[`docs/`](docs/) directory:\n\n- **[Architecture](docs/architecture.md)** — system overview, data flow, pipeline diagrams\n- **[Engine](docs/engine.md)** — CLI commands, API reference (EULLM + OpenAI-compatible), audit trail\n- **[Forge](docs/forge.md)** — pipeline stages, CLI reference, profiles, demo notebook guide\n- **[Hub](docs/hub.md)** — Hub API reference, model cards, AI Act compliance cards\n- **[Benchmarks](docs/benchmarks.md)** — EULLM vs Ollama throughput and latency results\n\n### Development setup\n\n```bash\ngit clone https://github.com/eullm/eullm.git\ncd eullm\n\n# Build the engine (CPU only)\ncargo build --release\n\n# Build with GPU support\ncargo build --release --features cuda     # NVIDIA\ncargo build --release --features rocm     # AMD\ncargo build --release --features vulkan   # Cross-platform GPU\ncargo build --release --features metal    # macOS\n\n# Test it with any GGUF model\n./target/release/eullm run ./your-model.gguf\n\n# Set up the forge (Python)\ncd forge\npip install -e \".[dev]\"\npytest\n\n# Build the hub\ncd ../hub\ncargo build\n```\n\n### Docker (recommended)\n\nDon't want to install Rust, Python, or CUDA on your system? Use Docker:\n\n```bash\n# Engine only (CPU)\ndocker compose up engine\n\n# Engine with NVIDIA GPU\ndocker compose --profile gpu up engine-gpu\n\n# Engine + Hub\ndocker compose up engine hub\n\n# Forge (one-off command)\ndocker compose run --rm forge forge Qwen/Qwen3-14B --profile legal-it\n\n# Everything\ndocker compose up\n```\n\nSee [Getting Started](docs/getting-started.md) for the full Docker guide.\n\n### Code of conduct\n\nWe follow the [Contributor Covenant](https://www.contributor-covenant.org/). 
Be respectful, be constructive, be European about it.\n\n## Who's behind this\n\nEuLLM is built by **[I3K Technologies](https://i3k.eu)** — a Milan-based deep-tech studio focused on EU-sovereign AI infrastructure for regulated sectors (legal, healthcare, finance, public administration).\n\n* **[Francesco Marchetti](https://www.linkedin.com/in/francesco-marchetti-4a7b8149/)** — Founder, CEO \u0026 Lead Engineer (27+ years in EU IT/telecommunications infrastructure)\n* Building [RAG Enterprise](https://github.com/I3K-IT/RAG-Enterprise) — sovereign on-premise document intelligence (45+ stars, AGPL-3.0)\n* EIC Accelerator 2026 applicant (Proposal ID 101335975)\n\nAdjacent products operated by I3K Technologies: [CRM81](https://crm81.it) (workplace safety vertical SaaS), [LetsAI](https://letsai.it) (multi-provider generative AI platform).\n\n## How to cite\n\nIf you use EuLLM in academic research, EU grant proposals, or technical publications, please cite the **specific version** you used. The DOI in the APA, BibTeX, and plain-text entries below is version-pinned (immutable, recommended for reproducibility). To cite \"all versions\" of the project, use the **concept DOI** `10.5281/zenodo.20412979` (resolves to the latest release on Zenodo).\n\n**APA** (this version, v0.5.1):\n\u003e Marchetti, F. (2026). *EuLLM — Open-source sovereign LLM platform* (Version 0.5.1) [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.20412980\n\n**BibTeX** (this version, v0.5.1):\n\n```bibtex\n@software{marchetti2026eullm,\n  author       = {Marchetti, Francesco},\n  title        = {EuLLM: Open-source sovereign LLM platform},\n  year         = {2026},\n  publisher    = {Zenodo},\n  version      = {v0.5.1},\n  doi          = {10.5281/zenodo.20412980},\n  url          = {https://doi.org/10.5281/zenodo.20412980},\n  license      = {Apache-2.0},\n  note         = {Inference engine, verticalization pipeline, and EU-hosted model registry for sovereign EU LLM deployment}\n}\n```\n\n**Plain text** (this version, v0.5.1):\n\u003e Francesco Marchetti. (2026). EuLLM — Open-source sovereign LLM platform (v0.5.1) [Software]. https://doi.org/10.5281/zenodo.20412980\n\n**Concept DOI** (always resolves to the latest release):\n\u003e `10.5281/zenodo.20412979` — use this when you want the citation to track the most recent version automatically. https://doi.org/10.5281/zenodo.20412979\n\n## License\n\nEULLM is licensed under [Apache 2.0](LICENSE), the same license used by most of the models we build on. Use it, fork it, sell it, modify it: the only conditions are keeping the license text and attribution notices and marking any files you change.\n\n## Support the project\n\n- **Star this repo** — it helps more than you think\n- **[Join the waitlist](https://eullm.eu)** — get notified at launch\n- **Open issues** — tell us what you need\n- **Contribute** — code, docs, ideas, translations\n- **Share** — tell your network about EU AI sovereignty\n\n---\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eBuilt in Europe. For Europe. By Europeans.\u003c/strong\u003e\n  \u003cbr\u003e\u003cbr\u003e\n  \u003ca href=\"https://eullm.eu\"\u003eeullm.eu\u003c/a\u003e\n\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feullm%2Feullm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feullm%2Feullm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feullm%2Feullm/lists"}