https://github.com/mukel/lfm25.java

Fast LFM (Liquid AI) inference in pure Java

Host: GitHub
URL: https://github.com/mukel/lfm25.java
Owner: mukel
License: apache-2.0
Created: 2026-06-07T10:51:38.000Z (26 days ago)
Default Branch: main
Last Pushed: 2026-06-08T09:19:58.000Z (25 days ago)
Last Synced: 2026-06-08T11:11:14.932Z (25 days ago)
Topics: inference, java, jvm
Language: Java
Homepage:
Size: 47.9 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# LFM25.java

![Java 21+](https://img.shields.io/badge/Java-21%2B-007396?logo=java&logoColor=white)

[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-green.svg?logo=apache)](LICENSE)

[![GraalVM](https://img.shields.io/badge/GraalVM-Native_Image-F29111?labelColor=00758F)](https://www.graalvm.org/latest/reference-manual/native-image/)

![Platform](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows-lightgrey)

Fast, zero-dependency inference engine for [Liquid AI](https://www.liquid.ai/) [LFM2.5 models](https://www.liquid.ai/models) in pure Java.



----

## Features

- Single file, **no dependencies**, based on [llama3.java](https://github.com/mukel/llama3.java)

- Supports Liquid AI LFM2.5 GGUF models (dense and MoE)

- Fast [GGUF format](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) parser

- Supported dtypes/quantizations: `F16`, `BF16`, `F32`, `Q4_0`, `Q4_1`, `Q4_K`, `Q5_K`, `Q6_K`, `Q8_0`

- Fast kernels using Java's [Vector API](https://openjdk.org/jeps/469)

- CLI with `--chat` and `--prompt` modes

- Thinking mode control with `--think off|on|inline`

- GraalVM Native Image support

- AOT model preloading for **instant time-to-first-token**
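
To make the GGUF parsing step more concrete, here is a minimal sketch of reading the fixed-size GGUF header. It is not taken from `LFM25.java`; the class name `GgufHeaderSketch` and the `Header` record are hypothetical names used for illustration, and the layout assumed is the one used by GGUF version 2 and later (magic, version, tensor count, metadata key/value count, all little-endian).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch only: names are hypothetical, not from LFM25.java.
public final class GgufHeaderSketch {
    // The ASCII bytes "GGUF" interpreted as a little-endian 32-bit integer.
    static final int GGUF_MAGIC = 0x46554747;

    // Fixed-size header fields (assumes GGUF v2+, where both counts are 64-bit).
    record Header(int version, long tensorCount, long metadataKvCount) {}

    static Header readHeader(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // 4 (magic) + 4 (version) + 8 (tensor count) + 8 (metadata kv count)
            ByteBuffer buf = ByteBuffer.allocate(24).order(ByteOrder.LITTLE_ENDIAN);
            while (buf.hasRemaining()) {
                if (channel.read(buf) < 0) {
                    throw new IOException("File too short for a GGUF header");
                }
            }
            buf.flip();
            if (buf.getInt() != GGUF_MAGIC) {
                throw new IOException("Not a GGUF file: bad magic");
            }
            int version = buf.getInt();
            long tensorCount = buf.getLong();
            long metadataKvCount = buf.getLong();
            return new Header(version, tensorCount, metadataKvCount);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readHeader(Path.of(args[0])));
    }
}
```

The metadata key/value pairs and tensor descriptors follow this header in the file; a full parser would continue reading from that point.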

## Setup

Download GGUF models from Hugging Face:

| Model | Architecture | GGUF Repository |
|-------|-------------|-----------------|
| 350M | Dense | [LiquidAI/LFM2.5-350M-GGUF](https://huggingface.co/LiquidAI/LFM2.5-350M-GGUF) |
| 1.2B-Thinking | Dense | [LiquidAI/LFM2.5-1.2B-Thinking-GGUF](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking-GGUF) |
| 1.2B-Instruct | Dense | [LiquidAI/LFM2.5-1.2B-Instruct-GGUF](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF) |
| 8B-A1B | Mixture of Experts (MoE) | [LiquidAI/LFM2.5-8B-A1B-GGUF](https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF) |

Alternatively, download an [LFM2.5 model](https://www.liquid.ai/models) in GGUF format or convert one with [llama.cpp](https://github.com/ggml-org/llama.cpp).

### Optional: pure quantizations

`Q4_0` files are often mixed-quant in practice. A pure quantization is not required, but can be generated from an F32/F16/BF16 GGUF source with `llama-quantize` from [llama.cpp](https://github.com/ggml-org/llama.cpp):

```bash
./llama-quantize --pure ./LFM2.5-1.2B-Instruct-BF16.gguf ./LFM2.5-1.2B-Instruct-Q4_0.gguf Q4_0
```

Pick any supported target quantization, for example `Q4_0`, `Q4_1`, `Q4_K`, `Q5_K`, `Q6_K`, or `Q8_0`.
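
For readers unfamiliar with these block formats, the following sketch shows how a single `Q8_0` or `Q4_0` block can be expanded back to floats, following the ggml block layouts (32 values per block, one FP16 scale per block). It is a scalar illustration, not the vectorized kernel used by `LFM25.java`; the class and method names are hypothetical. It relies on `Float.float16ToFloat`, which is available since Java 20.

```java
// Illustrative sketch only: names are hypothetical, not from LFM25.java.
public final class BlockDequantSketch {
    static final int BLOCK_SIZE = 32; // values per Q8_0 / Q4_0 block

    // Reads the little-endian FP16 scale stored at the start of a block.
    static float scale(byte[] data, int offset) {
        short bits = (short) ((data[offset] & 0xFF) | ((data[offset + 1] & 0xFF) << 8));
        return Float.float16ToFloat(bits);
    }

    // Q8_0 block: 2-byte FP16 scale, then 32 signed 8-bit quants (34 bytes).
    static float[] dequantizeQ8_0(byte[] data, int offset) {
        float d = scale(data, offset);
        float[] out = new float[BLOCK_SIZE];
        for (int i = 0; i < BLOCK_SIZE; i++) {
            out[i] = data[offset + 2 + i] * d;
        }
        return out;
    }

    // Q4_0 block: 2-byte FP16 scale, then 16 bytes of packed 4-bit quants (18 bytes).
    // Low nibbles hold elements 0..15, high nibbles hold elements 16..31,
    // and each quant is stored with a bias of 8.
    static float[] dequantizeQ4_0(byte[] data, int offset) {
        float d = scale(data, offset);
        float[] out = new float[BLOCK_SIZE];
        for (int i = 0; i < BLOCK_SIZE / 2; i++) {
            int packed = data[offset + 2 + i] & 0xFF;
            out[i] = ((packed & 0x0F) - 8) * d;
            out[i + BLOCK_SIZE / 2] = ((packed >>> 4) - 8) * d;
        }
        return out;
    }
}
```

A "pure" `Q4_0` file stores every quantized tensor in the same block format, whereas mixed-quant files may use different formats for different tensors.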

## Build and run

Java 21+ is required, in particular for the [`MemorySegment` mmap feature](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/channels/FileChannel.html#map(java.nio.channels.FileChannel.MapMode,long,long,java.lang.foreign.Arena)).

[`jbang`](https://www.jbang.dev/) is a good fit for this use case.

```bash
jbang LFM25.java --help
jbang LFM25.java --model ./LFM2.5-1.2B-Instruct-Q8_0.gguf --chat
jbang LFM25.java --model ./LFM2.5-1.2B-Instruct-Q8_0.gguf --prompt "Tell me a joke"
```

Or run it directly, still via [`jbang`](https://www.jbang.dev/):

```bash
chmod +x LFM25.java
./LFM25.java --help
```

## CLI

```text
Usage:  jbang LFM25.java [options]

Options:
  --model, -m                 required, path to .gguf file
  --interactive, --chat, -i   run in chat mode
  --instruct                  run in instruct (once) mode, default mode
  --prompt, -p                input prompt
  --suffix                    suffix for fill-in-the-middle request
  --system-prompt, -sp        system prompt for chat/instruct mode
  --temperature, -temp        temperature in [0,inf], default 1.0
  --top-p                     p value in top-p sampling in [0,1], default 0.95
  --seed                      random seed, default System.nanoTime()
  --max-tokens, -n            number of steps to run, default 1024
  --stream                    print tokens during generation, default true
  --echo                      print all tokens to stderr, default false
  --color                     colorize thinking output in terminal, default auto
  --think                     control thinking output (off|on|inline)
  --keep-past-thinking        keep prior assistant thinking in history, default false
  --raw-prompt                bypass chat template and tokenize --prompt directly
```
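
As background for `--temperature` and `--top-p`, here is a minimal sketch of temperature scaling followed by top-p (nucleus) sampling. It illustrates the general technique and is not the sampler from `LFM25.java`; the class name is hypothetical, and treating a temperature of `0` as greedy (argmax) decoding is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative sketch only: names and the temperature == 0 behavior are assumptions.
public final class TopPSamplingSketch {
    static int argmax(float[] logits) {
        int best = 0;
        for (int i = 1; i < logits.length; i++) {
            if (logits[i] > logits[best]) best = i;
        }
        return best;
    }

    static int sample(float[] logits, float temperature, float topP, Random rng) {
        if (temperature == 0f) {
            return argmax(logits); // assumed: temperature 0 means greedy decoding
        }
        int n = logits.length;
        // Softmax over temperature-scaled logits (max subtracted for stability).
        double max = Double.NEGATIVE_INFINITY;
        for (float l : logits) max = Math.max(max, l / temperature);
        double[] probs = new double[n];
        double sum = 0;
        for (int i = 0; i < n; i++) {
            probs[i] = Math.exp(logits[i] / temperature - max);
            sum += probs[i];
        }
        for (int i = 0; i < n; i++) probs[i] /= sum;

        // Token indices sorted by descending probability.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(probs[b], probs[a]));

        // Keep the smallest prefix whose cumulative probability reaches topP.
        double cumulative = 0;
        int last = n - 1;
        for (int k = 0; k < n; k++) {
            cumulative += probs[order[k]];
            if (cumulative >= topP) {
                last = k;
                break;
            }
        }

        // Draw from the truncated distribution, renormalized by its total mass.
        double r = rng.nextDouble() * cumulative;
        double acc = 0;
        for (int k = 0; k <= last; k++) {
            acc += probs[order[k]];
            if (r < acc) return order[k];
        }
        return order[last]; // guard against floating-point rounding
    }

    public static void main(String[] args) {
        float[] logits = {1.0f, 5.0f, 2.0f};
        System.out.println(sample(logits, 1.0f, 0.95f, new Random(42)));
    }
}
```

Lower `--top-p` values restrict sampling to fewer high-probability tokens, while higher temperatures flatten the distribution before truncation.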

### GraalVM Native Image

Compile with `make native` to produce an `lfm25` executable, then run it:

```bash
./lfm25 --model ./LFM2.5-8B-A1B-Q8_0.gguf --chat
```

### AOT model preloading

`LFM25.java` supports AOT model preloading to reduce parse overhead and time-to-first-token (TTFT).

To preload a GGUF model ahead of time:

```bash
PRELOAD_GGUF=/path/to/model.gguf make native
```

A larger specialized binary is generated with parse overhead removed for that specific model.

It can still run other models with the usual parsing overhead.

## Benchmarks



  



*Hardware specs: AMD Ryzen 9950X 16C/32T 64GB (6400) Linux 6.18.12.*

[GraalVM 25+](https://www.graalvm.org/downloads) is recommended for the absolute best performance (JIT mode). It provides partial but good support for the [Vector API](https://openjdk.org/jeps/469), including in Native Image.

By default, the "preferred" vector size is used. It can be forced with `-Dllama.VectorBitSize=0|128|256|512`, where `0` means disabled.

## Related Repositories

- [llama3.java](https://github.com/mukel/llama3.java)

- [gemma4.java](https://github.com/mukel/gemma4.java)

- [gptoss.java](https://github.com/mukel/gptoss.java)

- [qwen35.java](https://github.com/mukel/qwen35.java)

- [nemotron3.java](https://github.com/mukel/nemotron3.java)

## License

Apache 2.0
