https://github.com/inference4j/inference4j

Java Inference API for Onnx models
Host: GitHub
URL: https://github.com/inference4j/inference4j
Owner: inference4j
License: apache-2.0
Created: 2026-02-11T23:01:00.000Z (5 months ago)
Default Branch: main
Last Pushed: 2026-03-01T18:47:45.000Z (4 months ago)
Last Synced: 2026-03-01T19:46:51.226Z (4 months ago)
Language: Java
Size: 6.02 MB
Stars: 25
Watchers: 1
Forks: 0
Open Issues: 3
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
- Roadmap: ROADMAP.md

# inference4j

[![CI](https://github.com/inference4j/inference4j/actions/workflows/ci.yml/badge.svg)](https://github.com/inference4j/inference4j/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/inference4j/inference4j/graph/badge.svg)](https://codecov.io/gh/inference4j/inference4j)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
[![Docs](https://img.shields.io/badge/docs-inference4j.github.io-7c4dff.svg)](https://inference4j.github.io/inference4j)

**Run AI models in Java. Three lines of code, zero setup.**

inference4j is an inference-only AI library for Java built on ONNX Runtime. It provides ergonomic, type-safe APIs for running model inference **locally** — no API keys, no network calls, no third-party services. Pass a `String`, `BufferedImage`, or `Path`, get Java objects back.

> **Note:** inference4j is under active development (0.x). APIs may change. Check out the [documentation site](https://inference4j.github.io/inference4j) for the full user guide, or browse the [examples](inference4j-examples/README.md) to get started.

## What can you do with inference4j?

Want to see it in action? Check out [inference4j-showcase](https://github.com/inference4j/inference4j-showcase) — a local demo app you can run to explore every capability the library provides.

### Sentiment Analysis

```java
try (var classifier = DistilBertTextClassifier.builder().build()) {
    System.out.println(classifier.classify("This movie was fantastic!"));
    // [TextClassification[label=POSITIVE, confidence=0.9998]]
}
```

### Text Embeddings & Semantic Search

```java
try (var embedder = SentenceTransformerEmbedder.builder()
        .modelId("inference4j/all-MiniLM-L6-v2").build()) {
    float[] embedding = embedder.encode("Hello, world!");
}
```
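
Because `encode` returns a plain `float[]`, semantic search comes down to comparing vectors. The sketch below ranks documents by cosine similarity, which is a common choice for sentence embeddings; the README does not say which metric inference4j recommends, so treat that as an assumption. `EmbeddingSearch` and the toy three-dimensional vectors are hypothetical and not part of the library.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical helper (not part of inference4j): ranks documents by cosine similarity. */
final class EmbeddingSearch {

    /** Cosine similarity of two equal-length vectors; returns 0 if either vector has zero norm. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns document indices ordered from most to least similar to the query. */
    static List<Integer> rank(float[] query, List<float[]> documents) {
        return IntStream.range(0, documents.size())
                .boxed()
                .sorted(Comparator.comparingDouble((Integer i) -> -cosine(query, documents.get(i))))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Toy vectors standing in for real embeddings produced by an embedder.
        float[] query = {1f, 0f, 0f};
        List<float[]> docs = List.of(
                new float[] {0f, 1f, 0f},
                new float[] {0.9f, 0.1f, 0f},
                new float[] {0.5f, 0.5f, 0f});
        System.out.println(rank(query, docs));
    }
}
```

In a real application the query and document vectors would come from the same embedder instance, so that they live in the same vector space.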

### Image Classification

```java
try (var classifier = ResNetClassifier.builder().build()) {
    List<Classification> results = classifier.classify(Path.of("cat.jpg"));
    // [Classification[label=tabby cat, confidence=0.87], ...]
}
```

### Object Detection

```java
try (var detector = YoloV8Detector.builder().build()) {
    List<Detection> detections = detector.detect(Path.of("street.jpg"));
    // [Detection[label=car, confidence=0.94, box=BoundingBox[...]], ...]
}
```

### Speech-to-Text

```java
try (var recognizer = Wav2Vec2Recognizer.builder().build()) {
    System.out.println(recognizer.transcribe(Path.of("audio.wav")).text());
}
```

### Voice Activity Detection

```java
try (var vad = SileroVadDetector.builder().build()) {
    List<VoiceSegment> segments = vad.detect(Path.of("meeting.wav"));
    // [VoiceSegment[start=0.50, end=3.20], VoiceSegment[start=5.10, end=8.75]]
}
```

### Text Detection

```java
try (var detector = CraftTextDetector.builder().build()) {
    var regions = detector.detect(Path.of("document.jpg"));
}
```

### Zero-Shot Image Classification

```java
try (var classifier = ClipClassifier.builder().build()) {
    List<Classification> results = classifier.classify(
            Path.of("photo.jpg"), List.of("cat", "dog", "bird", "car"));
    // [Classification[label=cat, confidence=0.82], ...]
}
```

### Search Reranking

```java
try (var reranker = MiniLMSearchReranker.builder().build()) {
    float score = reranker.score("What is Java?", "Java is a programming language.");
}
```
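
Since `score` returns one relevance value per query/document pair, reranking a result list amounts to scoring every candidate and sorting by that score. The sketch below shows that step with a pluggable scoring function standing in for the model; `RerankSketch`, its `rerank` method, and the toy word-overlap scorer are hypothetical and not part of inference4j.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical helper (not part of inference4j): orders candidates by a relevance score. */
final class RerankSketch {

    /** Scores each candidate once, then returns the candidates from highest to lowest score. */
    static List<String> rerank(String query, List<String> candidates,
                               ToDoubleBiFunction<String, String> scorer) {
        double[] scores = new double[candidates.size()];
        for (int i = 0; i < scores.length; i++) {
            scores[i] = scorer.applyAsDouble(query, candidates.get(i));
        }
        return IntStream.range(0, scores.length)
                .boxed()
                .sorted(Comparator.comparingDouble((Integer i) -> -scores[i]))
                .map(candidates::get)
                .collect(Collectors.toList());
    }

    /** Toy scorer: number of distinct lowercase words of the query that appear in the document. */
    static double wordOverlap(String query, String document) {
        Set<String> docWords = new HashSet<>(Arrays.asList(document.toLowerCase().split("\\W+")));
        Set<String> queryWords = new HashSet<>(Arrays.asList(query.toLowerCase().split("\\W+")));
        queryWords.retainAll(docWords);
        return queryWords.size();
    }

    public static void main(String[] args) {
        List<String> candidates = List.of(
                "Python is a snake.",
                "Java is a programming language.",
                "Coffee is sometimes called java.");
        System.out.println(rerank("What is Java?", candidates, RerankSketch::wordOverlap));
    }
}
```

With a real reranker, the scorer argument would be a lambda that delegates to the model's `score` method; scoring each candidate exactly once matters there because every call runs inference.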

### Text Generation

```java
try (var gen = OnnxTextGenerator.qwen2()
        .maxNewTokens(50).temperature(0.8f).topK(50).build()) {
    gen.generate("Explain gravity", token -> System.out.print(token));
}
```
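
The `temperature` and `topK` options are named after standard sampling controls. As a rough illustration of what such settings conventionally do to a model's raw logits, here is a minimal sketch of temperature scaling followed by top-k filtering and sampling. It is not inference4j's implementation; `TopKSampler` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

/** Hypothetical sketch of temperature + top-k sampling; not inference4j's implementation. */
final class TopKSampler {

    /**
     * Builds a probability distribution over token ids in which only the k highest
     * logits keep non-zero probability, after dividing every logit by the temperature.
     */
    static double[] distribution(float[] logits, float temperature, int topK) {
        if (logits.length == 0) {
            throw new IllegalArgumentException("logits must not be empty");
        }
        if (temperature <= 0) {
            throw new IllegalArgumentException("temperature must be > 0");
        }
        int k = Math.max(1, Math.min(topK, logits.length));

        // Sort token ids by logit, highest first.
        Integer[] ids = new Integer[logits.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        Arrays.sort(ids, Comparator.comparingDouble((Integer i) -> -logits[i]));

        // Softmax over the k surviving logits; subtracting the maximum keeps exp() stable.
        double max = logits[ids[0]] / temperature;
        double[] probs = new double[logits.length];
        double sum = 0;
        for (int r = 0; r < k; r++) {
            int id = ids[r];
            probs[id] = Math.exp(logits[id] / temperature - max);
            sum += probs[id];
        }
        for (int r = 0; r < k; r++) {
            probs[ids[r]] /= sum;
        }
        return probs;
    }

    /** Draws one token id according to the given distribution. */
    static int sample(double[] probs, Random random) {
        double u = random.nextDouble();
        double cumulative = 0;
        int last = 0;
        for (int i = 0; i < probs.length; i++) {
            if (probs[i] == 0) {
                continue;
            }
            cumulative += probs[i];
            last = i;
            if (u < cumulative) {
                return i;
            }
        }
        return last; // guards against rounding when u is very close to 1
    }

    public static void main(String[] args) {
        float[] logits = {2.0f, 1.0f, 0.0f, -1.0f};
        double[] probs = distribution(logits, 0.8f, 2);
        System.out.println(Arrays.toString(probs));
        System.out.println(sample(probs, new Random()));
    }
}
```

Lower temperatures sharpen the distribution toward the top token, while a smaller `topK` cuts off the long tail of unlikely tokens.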

## Getting Started

**Requirements:** Java 17+

### Add the dependency

`inference4j-core` is the only dependency you need — it includes all model wrappers, tokenizers, and preprocessing.

**Gradle**

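```groovy
// Define inference4jVersion (for example in gradle.properties) or replace it with a release version.
// Double quotes are required for Groovy to interpolate the property.
implementation "io.github.inference4j:inference4j-core:${inference4jVersion}"
```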

**Maven**

```xml
<dependency>
    <groupId>io.github.inference4j</groupId>
    <artifactId>inference4j-core</artifactId>
    <version>${inference4jVersion}</version>
</dependency>
```

> **JVM flag:** ONNX Runtime requires native access. Add `--enable-native-access=ALL-UNNAMED` to your JVM arguments, or use `--enable-native-access=com.microsoft.onnxruntime` if you're on the module path.

### Run your first model

```java
import io.github.inference4j.nlp.DistilBertTextClassifier;

public class QuickStart {
    public static void main(String[] args) {
        try (var classifier = DistilBertTextClassifier.builder().build()) {
            System.out.println(classifier.classify("inference4j makes AI in Java easy!"));
            // [TextClassification[label=POSITIVE, confidence=0.9998]]
        }
    }
}
```

That's it. The model downloads automatically on first run (~260 MB) and is cached in `~/.cache/inference4j/`. No Python, no manual downloads, no tensor wrangling.

## What you don't have to do

- **No tokenization**: WordPiece and BPE tokenizers are built in and handled automatically
- **No tensor handling**: pass a `String`, `BufferedImage`, or `Path`; get Java objects back
- **No ONNX session setup**: `builder().build()` handles everything
- **No manual model downloads**: models are downloaded from Hugging Face and cached on first use
- **No Python sidecar**: inference runs in-process on the JVM

## vs raw ONNX Runtime

**Without inference4j**

```java
// Helpers such as resize, softmax, argmax and the MEAN, STD, LABELS constants are not shown.
OrtEnvironment env = OrtEnvironment.getEnvironment();
OrtSession session = env.createSession("resnet50.onnx");
BufferedImage img = ImageIO.read(new File("cat.jpg"));
BufferedImage resized = resize(img, 224, 224);
float[] pixels = new float[3 * 224 * 224];
for (int c = 0; c < 3; c++)
  for (int y = 0; y < 224; y++)
    for (int x = 0; x < 224; x++) {
      int rgb = resized.getRGB(x, y);
      float val = ((rgb >> (16 - c * 8)) & 0xFF) / 255f;
      pixels[c * 224 * 224 + y * 224 + x] =
        (val - MEAN[c]) / STD[c];
    }
OnnxTensor tensor = OnnxTensor.createTensor(env,
    FloatBuffer.wrap(pixels), new long[]{1, 3, 224, 224});
OrtSession.Result result = session.run(
    Map.of("data", tensor));
float[] logits = ((float[][]) result.get(0)
    .getValue())[0];
float[] probs = softmax(logits);
int bestIdx = argmax(probs);
String label = LABELS[bestIdx];
// ~30 lines, manual everything
```

**With inference4j**

```java
try (var classifier = ResNetClassifier.builder().build()) {
    var results = classifier.classify(
        Path.of("cat.jpg")
    );
    // done.
}
// 3 lines.
// Auto-downloads model.
// Handles preprocessing.
// Returns Java objects.
```
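
The raw ONNX Runtime version above calls `softmax` and `argmax` without defining them. A minimal sketch of what such helpers could look like is shown below; the class name `LogitMath` is hypothetical, and this is not the code inference4j uses internally.

```java
/** Hypothetical post-processing helpers for the raw ONNX Runtime example. */
final class LogitMath {

    /** Converts logits to probabilities; subtracting the maximum keeps exp() numerically stable. */
    static float[] softmax(float[] logits) {
        float max = Float.NEGATIVE_INFINITY;
        for (float v : logits) {
            max = Math.max(max, v);
        }
        double[] exps = new double[logits.length];
        double sum = 0;
        for (int i = 0; i < logits.length; i++) {
            exps[i] = Math.exp(logits[i] - max);
            sum += exps[i];
        }
        float[] probs = new float[logits.length];
        for (int i = 0; i < logits.length; i++) {
            probs[i] = (float) (exps[i] / sum);
        }
        return probs;
    }

    /** Returns the index of the largest value (the first one if there are ties). */
    static int argmax(float[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        float[] probs = softmax(new float[] {1f, 3f, 2f});
        System.out.println(java.util.Arrays.toString(probs));
        System.out.println(argmax(probs));
    }
}
```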

## Why inference4j?

Java has great tools for building AI-powered applications. [Spring AI](https://spring.io/projects/spring-ai) provides an excellent abstraction layer for LLM orchestration. [DJL](https://djl.ai/) offers engine-agnostic model training and inference. [LangChain4j](https://docs.langchain4j.dev/) simplifies LLM-powered workflows.

**inference4j doesn't compete with any of them.** It fills a different gap.

When you need to run a specific ONNX model (an embedding model, an object detector, a speech-to-text model), you currently face a choice: drop down to the raw ONNX Runtime Java bindings and deal with `Map<String, OnnxTensor>` manually, or pull in a heavyweight framework that does far more than you need.

inference4j sits in the sweet spot:

- **3-line integration** for popular models: `builder().build()`, call a method, get Java objects back
- **Standard Java types** in, standard Java types out: no tensor abstractions leak into your code
- **Inference only**: optimized for production serving, not training
- **Lightweight**: each wrapper is a thin layer over ONNX Runtime, not a framework
- **Complements the ecosystem**: use inference4j to run your embedding model, Spring AI to orchestrate your LLM chain, both in the same application

We believe the Java AI ecosystem is stronger when tools do one thing well. inference4j does local model inference, and tries to do it really well.

## Supported Models

### NLP

| Capability | Models | API |
|---|---|---|
| Classification | DistilBERT, BERT | `TextClassifier` |
| Embeddings | all-MiniLM, all-mpnet | `TextEmbedder` |
| Reranking | ms-marco-MiniLM | `SearchReranker` |
| Text Generation | GPT-2, SmolLM2, Qwen2.5 | `TextGenerator` |

### Vision

| Capability | Models | API |
|---|---|---|
| Classification | ResNet, EfficientNet | `ImageClassifier` |
| Object Detection | YOLOv8, YOLO11, YOLO26 | `ObjectDetector` |
| Text Detection | CRAFT | `TextDetector` |

### Multimodal

| Capability | Models | API |
|---|---|---|
| Zero-Shot Classification | CLIP | `ZeroShotClassifier` |
| Image Embeddings | CLIP | `ImageEmbedder` |
| Text Embeddings | CLIP | `TextEmbedder` |

### Audio

| Capability | Models | API |
|---|---|---|
| Recognition | Wav2Vec2 | `SpeechRecognizer` |
| Voice Activity Detection | Silero VAD | `VoiceActivityDetector` |

### Generative AI (onnxruntime-genai)

| Capability | Models | API |
|---|---|---|
| Text Generation | Phi-3, DeepSeek-R1 | `TextGenerator` |
| Vision-Language | Phi-3.5 Vision | `VisionLanguageModel` |
| Speech-to-Text | Whisper | `WhisperSpeechModel` |

> **Auto-download:** All supported models are hosted under the [`inference4j`](https://huggingface.co/inference4j) HuggingFace organization. Models are automatically downloaded and cached on first use — no manual setup required. Cache location defaults to `~/.cache/inference4j/` and can be customized via `INFERENCE4J_CACHE_DIR` or `-Dinference4j.cache.dir`.
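
To make the cache settings concrete, here is a sketch of how a cache directory could be resolved from the two overrides and the default. The README does not state which override wins when both are set, so the precedence used here (system property, then environment variable, then default) is an assumption, and `CacheDirResolver` is a hypothetical class rather than the library's own code.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch of cache directory resolution; the precedence order is an assumption. */
final class CacheDirResolver {

    static Path resolve(Properties systemProperties, Map<String, String> environment, String userHome) {
        // Assumed first choice: -Dinference4j.cache.dir=...
        String fromProperty = systemProperties.getProperty("inference4j.cache.dir");
        if (fromProperty != null && !fromProperty.isBlank()) {
            return Path.of(fromProperty);
        }
        // Assumed second choice: the INFERENCE4J_CACHE_DIR environment variable.
        String fromEnvironment = environment.get("INFERENCE4J_CACHE_DIR");
        if (fromEnvironment != null && !fromEnvironment.isBlank()) {
            return Path.of(fromEnvironment);
        }
        // Documented default: ~/.cache/inference4j/
        return Path.of(userHome, ".cache", "inference4j");
    }

    public static void main(String[] args) {
        Path cacheDir = resolve(System.getProperties(), System.getenv(), System.getProperty("user.home"));
        System.out.println(cacheDir);
    }
}
```

Passing the properties and environment in as arguments keeps the logic easy to test without touching the real process environment.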

## Hardware Acceleration

inference4j supports GPU and hardware acceleration out of the box via ONNX Runtime execution providers. On macOS, CoreML is bundled in the standard dependency — just add one line:

```java
try (var classifier = ResNetClassifier.builder()
        .sessionOptions(opts -> opts.addCoreML())
        .build()) {
    classifier.classify(Path.of("cat.jpg"));
}
```

For CUDA (Linux/Windows), swap the ONNX Runtime dependency from `onnxruntime` to `onnxruntime_gpu`, then enable the CUDA execution provider:

```java
try (var classifier = ResNetClassifier.builder()
        .sessionOptions(opts -> opts.addCUDA(0))
        .build()) {
    classifier.classify(Path.of("cat.jpg"));
}
```

The `.sessionOptions()` API is available on every model wrapper.

### Benchmarks on Apple Silicon (M-series)

| Model | Capability | CPU | CoreML | Speedup |
|-------|------------|-----|--------|---------|
| ResNet-50 | Image Classification | 37 ms | 10 ms | **3.7x** |
| CRAFT | Text Detection | 831 ms | 153 ms | **5.4x** |

> Measured with 3 warmup runs + 10 timed runs. See the benchmark examples for [ResNet](inference4j-examples/src/main/java/io/github/inference4j/examples/ResNetAccelerationBenchmarkExample.java) and [CRAFT](inference4j-examples/src/main/java/io/github/inference4j/examples/CraftAccelerationBenchmarkExample.java).
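
The measurement procedure (a few untimed warmup runs followed by timed runs) can be sketched as a small harness. This is not the code from the linked benchmark examples: `WarmupBenchmark` is a hypothetical class, and reporting the mean of the timed runs is an assumption, since the README does not say whether the figures are means or medians.

```java
/** Hypothetical timing harness: untimed warmup runs followed by averaged timed runs. */
final class WarmupBenchmark {

    /** Returns the mean wall-clock time per run in milliseconds, ignoring the warmup runs. */
    static double averageMillis(Runnable task, int warmupRuns, int timedRuns) {
        if (timedRuns <= 0) {
            throw new IllegalArgumentException("timedRuns must be positive");
        }
        // Warmup lets the JIT compile hot paths and lets any lazy initialization finish.
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long totalNanos = 0;
        for (int i = 0; i < timedRuns; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / timedRuns;
    }

    public static void main(String[] args) {
        // Stand-in workload; in practice this would be a call such as classifier.classify(...).
        Runnable workload = () -> {
            double acc = 0;
            for (int i = 1; i < 1_000_000; i++) {
                acc += Math.sqrt(i);
            }
            if (acc < 0) {
                System.out.println(acc); // keeps the loop from being optimized away
            }
        };
        System.out.printf("%.3f ms per run%n", averageMillis(workload, 3, 10));
    }
}
```

For rigorous JVM benchmarking a dedicated harness such as JMH is the usual choice; a loop like this is only meant for quick comparisons.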

## Spring Boot

Add the starter and enable the models you need:

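```groovy
// Define inference4jVersion (for example in gradle.properties) or replace it with a release version.
implementation "io.github.inference4j:inference4j-spring-boot-starter:${inference4jVersion}"
```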

```yaml
inference4j:
  nlp:
    text-classifier:
      enabled: true
```

```java
@RestController
public class SentimentController {

    private final TextClassifier classifier;

    // The TextClassifier bean is auto-configured by the starter and injected here.
    public SentimentController(TextClassifier classifier) {
        this.classifier = classifier;
    }

    @PostMapping("/analyze")
    public List<TextClassification> analyze(@RequestBody String text) {
        return classifier.classify(text);
    }
}
```

Every model is opt-in — nothing is downloaded until you set `enabled: true`. Beans are interface-typed, so you can swap implementations with `@ConditionalOnMissingBean`. An actuator health indicator is included out of the box. See the [full documentation](https://github.com/inference4j/inference4j/wiki) for all available properties.

## Roadmap

See the [Roadmap](https://inference4j.github.io/inference4j/roadmap/) for details and what's coming next.

## Project Structure

| Module | Description |
|--------|-------------|
| `inference4j-core` | Model wrappers, tokenizers, preprocessing, ONNX Runtime abstractions, native generation engine |
| `inference4j-genai` | Generative AI via [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai): Phi-3, DeepSeek-R1, Whisper, Phi-3.5 Vision |
| `inference4j-runtime` | Operational layer: model routing, A/B testing, Micrometer metrics |
| `inference4j-spring-boot-starter` | Spring Boot auto-configuration, health indicators |
| `inference4j-examples` | Runnable examples ([see README](inference4j-examples/README.md)) |

## Build

Requires **Java 17**.

```bash
./gradlew build          # Build all modules and run tests
./gradlew test           # Run tests only
```

## Built with Claude Code

This project was built collaboratively with [Claude Code](https://claude.ai/code).

The humans drove architecture, API design, interface contracts, and model selection. Claude Code wrote the implementation — the wrappers, the preprocessing pipelines, the math utilities, the tests, and the documentation you're reading right now.

Without Claude Code, this project would have taken weeks instead of hours. We want to embrace this new reality of software development, keep pushing forward, and — with community feedback and contributions — give something useful back to the Java ecosystem.

We think this is worth being transparent about. Agentic development is how software gets built now, and pretending otherwise helps no one. The design decisions are ours; the execution was a partnership.

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

## License

[Apache License 2.0](LICENSE)