{"id":50854337,"url":"https://github.com/mukel/lfm25.java","last_synced_at":"2026-06-14T17:04:47.102Z","repository":{"id":363302748,"uuid":"1261930074","full_name":"mukel/lfm25.java","owner":"mukel","description":"Fast LFM (Liquid AI) inference in pure Java","archived":false,"fork":false,"pushed_at":"2026-06-08T09:19:58.000Z","size":49,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-08T11:11:14.932Z","etag":null,"topics":["inference","java","jvm"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mukel.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-07T10:51:38.000Z","updated_at":"2026-06-08T09:20:18.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/mukel/lfm25.java","commit_stats":null,"previous_names":["mukel/lfm25.java"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/mukel/lfm25.java","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mukel%2Flfm25.java","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mukel%2Flfm25.java/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mukel%2Flfm25.java/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mukel%2Flfm25.java/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mukel","download_url":"https://codeload.github.com/mukel/lfm25.java/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mukel%2Flfm25.java/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34329738,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-14T02:00:07.365Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["inference","java","jvm"],"created_at":"2026-06-14T17:04:46.555Z","updated_at":"2026-06-14T17:04:47.096Z","avatar_url":"https://github.com/mukel.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LFM25.java\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/user-attachments/assets/9cf7bc77-6382-4920-9e29-7b3595047bac\"\u003e\n\u003c/p\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n![Java 21+](https://img.shields.io/badge/Java-21%2B-007396?logo=java\u0026logoColor=white)\n[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-green.svg?logo=apache)](LICENSE)\n[![GraalVM](https://img.shields.io/badge/GraalVM-Native_Image-F29111?labelColor=00758F)](https://www.graalvm.org/latest/reference-manual/native-image/)\n![Platform](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows-lightgrey)\n\nFast, zero-dependency, inference engine for [Liquid AI](https://www.liquid.ai/) [LFM2.5 models](https://www.liquid.ai/models) in pure Java.\n\n\u003c/div\u003e\n\n----\n\n## Features\n\n- Single file, **no dependencies**, based on [llama3.java](https://github.com/mukel/llama3.java)\n- Supports Liquid AI LFM2.5 GGUF models (dense and MoE)\n- Fast [GGUF format](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) parser\n- Supported dtypes/quantizations: `F16`, `BF16`, `F32`, `Q4_0`, `Q4_1`, `Q4_K`, `Q5_K`, `Q6_K`, `Q8_0`\n- Fast kernels using Java's [Vector API](https://openjdk.org/jeps/469)\n- CLI with `--chat` and `--prompt` modes\n- Thinking mode control with `--think off|on|inline`\n- GraalVM Native Image support\n- AOT model preloading for **instant time-to-first-token**\n\n## Setup\n\nDownload GGUF models from Hugging Face:\n\n| Model | Architecture | GGUF Repository |\n|-------|-------------|-----------------|\n| 350M | Dense | [LiquidAI/LFM2.5-350M-GGUF](https://huggingface.co/LiquidAI/LFM2.5-350M-GGUF) |\n| 1.2B-Thinking | Dense | [LiquidAI/LFM2.5-1.2B-Thinking-GGUF](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking-GGUF) |\n| 1.2B-Instruct | Dense | [LiquidAI/LFM2.5-1.2B-Instruct-GGUF](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF) |\n| 8B-A1B | Mixture of Experts (MoE) | [LiquidAI/LFM2.5-8B-A1B-GGUF](https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF) |\n\n## Setup\n\nDownload an [LFM2.5 model](https://www.liquid.ai/models) in GGUF format or convert one with [llama.cpp](https://github.com/ggml-org/llama.cpp).\n\n#### Optional: pure quantizations\n\n`Q4_0` files are often mixed-quant in practice. A pure quantization is not required, but can be generated from an F32/F16/BF16 GGUF source with `llama-quantize` from [llama.cpp](https://github.com/ggml-org/llama.cpp):\n\n```bash\n./llama-quantize --pure ./LFM2.5-1.2B-Instruct-BF16.gguf ./LFM2.5-1.2B-Instruct-Q4_0.gguf Q4_0\n```\n\nPick any supported target quantization, for example `Q4_0`, `Q4_1`, `Q4_K`, `Q5_K`, `Q6_K`, or `Q8_0`.\n\n## Build and run\n\nJava 21+ is required, in particular for the [`MemorySegment` mmap feature](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/channels/FileChannel.html#map(java.nio.channels.FileChannel.MapMode,long,long,java.lang.foreign.Arena)).\n\n[`jbang`](https://www.jbang.dev/) is a good fit for this use case.\n\n```bash\njbang LFM25.java --help\njbang LFM25.java --model ./LFM2.5-1.2B-Instruct-Q8_0.gguf --chat\njbang LFM25.java --model ./LFM2.5-1.2B-Instruct-Q8_0.gguf --prompt \"Tell me a joke\"\n```\n\nOr run it directly, still via [`jbang`](https://www.jbang.dev/):\n\n```bash\nchmod +x LFM25.java\n./LFM25.java --help\n```\n\n## CLI\n\n```text\nUsage:  jbang LFM25.java [options]\n\nOptions:\n  --model, -m \u003cpath\u003e            required, path to .gguf file\n  --interactive, --chat, -i     run in chat mode\n  --instruct                    run in instruct (once) mode, default mode\n  --prompt, -p \u003cstring\u003e         input prompt\n  --suffix \u003cstring\u003e             suffix for fill-in-the-middle request\n  --system-prompt, -sp \u003cstring\u003e system prompt for chat/instruct mode\n  --temperature, -temp \u003cfloat\u003e  temperature in [0,inf], default 1.0\n  --top-p \u003cfloat\u003e               p value in top-p sampling in [0,1], default 0.95\n  --seed \u003clong\u003e                 random seed, default System.nanoTime()\n  --max-tokens, -n \u003cint\u003e        number of steps to run, default 1024\n  --stream \u003cboolean\u003e            print tokens during generation, default true\n  --echo \u003cboolean\u003e              print all tokens to stderr, default false\n  --color \u003con|off|auto\u003e         colorize thinking output in terminal, default auto\n  --think \u003coff|on|inline\u003e       control thinking output\n  --keep-past-thinking \u003cbool\u003e   keep prior assistant thinking in history, default false\n  --raw-prompt                  bypass chat template and tokenize --prompt directly\n```\n\n### GraalVM Native Image\n\nCompile with `make native` to produce a `lfm25` executable, then:\n\n```bash\n./lfm25 --model ./LFM2.5-8B-A1B-Q8_0.gguf --chat\n```\n\n### AOT model preloading\n\n`LFM25.java` supports AOT model preloading to reduce parse overhead and time-to-first-token (TTFT).\n\nTo AOT pre-load a GGUF model:\n```bash\nPRELOAD_GGUF=/path/to/model.gguf make native\n```\n\nA larger specialized binary is generated with parse overhead removed for that specific model.\nIt can still run other models with the usual parsing overhead.\n\n## Benchmarks\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/user-attachments/assets/40c22e2a-2003-424c-aa27-e27737880c33\"\u003e\n\u003c/p\u003e\n\n\\*\\**Hardware specs: AMD Ryzen 9950X 16C/32T 64GB (6400) Linux 6.18.12.*\n\n[GraalVM 25+](https://www.graalvm.org/downloads) is recommended for the absolute best performance (JIT mode), it provides partial, but good support for the [Vector API](https://openjdk.org/jeps/469), also in Native Image.\n\nBy default, the \"preferred\" vector size is used, it can be force-set with `-Dllama.VectorBitSize=0|128|256|512`, `0` means disabled.\n\n## Related Repositories\n\n- [llama3.java](https://github.com/mukel/llama3.java)\n- [gemma4.java](https://github.com/mukel/gemma4.java)\n- [gptoss.java](https://github.com/mukel/gptoss.java)\n- [qwen35.java](https://github.com/mukel/qwen35.java)\n- [nemotron3.java](https://github.com/mukel/nemotron3.java)\n\n## License\n\nApache 2.0\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmukel%2Flfm25.java","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmukel%2Flfm25.java","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmukel%2Flfm25.java/lists"}