https://github.com/rsatrio/llm-chatbot-springboot

LLM Chatbot using Spring Boot 3

# LLMCpp Spring Boot Chat

A robust, CLI-based LLM (Large Language Model) chat application built with **Spring Boot 3** and **Java 17**, utilizing [LlamaCpp-Java](https://github.com/kherud/java-llama.cpp) bindings for high-performance inference.

This project demonstrates how to integrate local LLM inference within a Spring Boot application, supporting GGUF model formats.

![Chatbot LLM](./demo-local-chatbot.gif)

## Features

* **Interactive CLI Chat**: Real-time chat interface via the command line.
* **Local Inference**: Runs GGUF models locally (no API keys required).
* **Customizable Prompts**: Support for external prompt templates.
* **Configurable Generation**: Fine-tune temperature, top-p, context size, and CPU threads.
* **Performance Statistics**: Detailed metrics for every response (tokens/sec, time to first token, total tokens).
* **Modular Architecture**: Decoupled I/O and business logic for better testability.
* **Comprehensive Tests**: Includes unit tests for services and components.
* **Docker Support**: Ready-to-use Dockerfile for containerized deployment.
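
The performance statistics listed above (tokens/sec, time to first token, total tokens) can be derived from three timestamps and a counter. The following is a minimal sketch under that assumption. `GenerationStats` and its methods are hypothetical names for illustration and are not taken from the project's source.

```java
import java.time.Duration;

/** Hypothetical holder for per-response metrics; not the project's actual class. */
final class GenerationStats {
    private final long startNanos;   // when the request was submitted
    private long firstTokenNanos;    // valid only once tokenCount > 0
    private long lastTokenNanos;     // valid only once tokenCount > 0
    private int tokenCount = 0;

    GenerationStats(long startNanos) {
        this.startNanos = startNanos;
    }

    /** Record one generated token at the given monotonic timestamp. */
    void onToken(long nowNanos) {
        if (tokenCount == 0) {
            firstTokenNanos = nowNanos;
        }
        lastTokenNanos = nowNanos;
        tokenCount++;
    }

    int totalTokens() {
        return tokenCount;
    }

    /** Time from request start to the first token; zero if nothing was generated. */
    Duration timeToFirstToken() {
        return tokenCount == 0
                ? Duration.ZERO
                : Duration.ofNanos(firstTokenNanos - startNanos);
    }

    /** Tokens per second over the whole response, measured from request start. */
    double tokensPerSecond() {
        if (tokenCount == 0) {
            return 0.0;
        }
        long elapsedNanos = lastTokenNanos - startNanos;
        return elapsedNanos <= 0 ? 0.0 : tokenCount / (elapsedNanos / 1_000_000_000.0);
    }
}
```

In a real generation loop, the timestamps would typically come from `System.nanoTime()`, which is monotonic and therefore safer for measuring intervals than wall-clock time.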

## Prerequisites

* **Java**: JDK 21 or higher.
* **Maven**: 3.8+ (Wrapper included).
* **RAM**: Sufficient RAM to load your chosen GGUF model (e.g., ~1GB for TinyLlama 1.1B Q4).

## Getting Started

### 1. Build from Source

Clone the repository and build the application using Maven:

```bash
git clone https://github.com/rsatrio/llm-chatbot-springboot.git
cd llm-chatbot-springboot
./mvnw clean package
```

The executable JAR will be located in the `target` directory (e.g., `target/LLMCpp-Chat-SpringBoot.jar`).

### 2. Prepare the Model

Download a GGUF model file (e.g., from [Hugging Face](https://huggingface.co/models?search=gguf)).
* Recommended for testing: [TinyLlama-1.1B-Chat-v1.0-GGUF](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF)

### 3. Run the Application

Run the JAR, pointing it to your model file:

```bash
java -jar target/LLMCpp-Chat-SpringBoot.jar --llamacpp.model=/path/to/your/model.gguf
```

Alternatively, run with the default configuration, which looks for `tinyllama-1.1b-chat-v1.0.Q6_K.gguf` in the working directory:

```bash
java -jar target/LLMCpp-Chat-SpringBoot.jar
```

### 4. Run Tests

You can run the unit tests using the Maven wrapper:

```bash
./mvnw test
```

## Configuration

You can configure the application via `application.properties`, system properties, or command-line arguments.

| Property | Description | Default Value |
| :--- | :--- | :--- |
| `llamacpp.model` | Absolute or relative path to the GGUF model file. | `tinyllama-1.1b-chat-v1.0.Q6_K.gguf` |
| `llamacpp.prompt.path` | Path to a text file containing the system prompt template. | `llamacpp_prompt.txt` |
| `llamacpp.temperature` | Controls randomness (0.0 to 1.0). Higher is more creative. | `0.2` |
| `llamacpp.topp` | Nucleus sampling probability threshold. | `10` |
| `llamacpp.thread.cpu` | Number of CPU threads to use for inference. | `1` |
| `llamacpp.number.context` | Context window size (0 uses model default). | `0` |
| `llamacpp.frequency-penalty` | Penalty for token repetition. | `0.2` |
| `llamacpp.miro-stat` | Mirostat sampling version (`V0`, `V1`, `V2`). | `V2` |
| `llamacpp.stop-strings` | List of strings that stop generation. | `, <\|im_end\|>, User:` |
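
When the same property is set in more than one place, Spring Boot gives command-line arguments priority over system properties, which in turn take priority over `application.properties`. Spring Boot performs this resolution itself; the plain-Java sketch below only illustrates the order. `SettingResolver` and its methods are hypothetical and not part of the project or of Spring.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: Spring Boot resolves property sources on its own. */
final class SettingResolver {

    /** Parse "--key=value" arguments into a map, ignoring anything else. */
    static Map<String, String> parseArgs(String... args) {
        Map<String, String> parsed = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("--") && eq > 2) {
                parsed.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return parsed;
    }

    /** Return the value from the first source that defines the key, else the fallback. */
    @SafeVarargs
    static String resolve(String key, String fallback,
                          Map<String, String>... sourcesHighestPriorityFirst) {
        for (Map<String, String> source : sourcesHighestPriorityFirst) {
            String value = source.get(key);
            if (value != null) {
                return value;
            }
        }
        return fallback;
    }
}
```

For example, passing the parsed command line first, then system properties, then file values means that `--llamacpp.temperature=0.7` overrides a `0.2` set in `application.properties`, while properties set nowhere fall back to the defaults in the table.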

### Customizing the Prompt

By default, the application uses a built-in prompt template suitable for chat-tuned models. To customize it, create a file (e.g., `my_prompt.txt`) and pass it:

```bash
java -jar target/LLMCpp-Chat-SpringBoot.jar --llamacpp.prompt.path=my_prompt.txt
```

**Template Variables:**
* `{question}`: Will be replaced by the user's input.

**Example Prompt File:**
```text
<|system|>
You are a helpful coding assistant.
<|user|>
{question}
<|assistant|>
```
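
The substitution of `{question}` can be pictured as a literal string replacement. This is a minimal sketch under that assumption: `PromptTemplate` is a hypothetical helper, and the project's `PromptComponent` may load and format the template differently.

```java
/** Hypothetical helper; the project's PromptComponent may work differently. */
final class PromptTemplate {
    private static final String PLACEHOLDER = "{question}";
    private final String template;

    PromptTemplate(String template) {
        // Fail early so a template without the placeholder is not silently accepted.
        if (!template.contains(PLACEHOLDER)) {
            throw new IllegalArgumentException("Template must contain " + PLACEHOLDER);
        }
        this.template = template;
    }

    /** Insert the trimmed user input in place of every {question} placeholder. */
    String render(String question) {
        // String.replace substitutes literally (no regex), so characters such as
        // '$' or braces in the user's input are copied through unchanged.
        return template.replace(PLACEHOLDER, question.strip());
    }
}
```

Literal replacement is chosen here over `String.replaceAll` because user input may contain characters that have special meaning in regular-expression replacement strings.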

## Docker Usage

Build the Docker image:

```bash
docker build -t chat-cli .
```

Run the container, mounting the model file:

```bash
docker run -it -v /local/path/to/model.gguf:/app/model.gguf chat-cli --llamacpp.model=/app/model.gguf
```

## Architecture

The application follows a clean Spring Boot architecture with decoupled concerns:

* **`ChatRunner`**: Implements `CommandLineRunner`, so the chat service starts only after the application context has finished initializing.
* **`ChatServicesImpl`**: Manages the high-level chat loop, using an `IOService` for interaction.
* **`ChatbotServicesImpl`**: Handles the business logic for generating responses using the LLM.
* **`IOService` / `ConsoleIOService`**: Abstracts I/O operations (CLI), enabling easy unit testing and potential future UI swaps.
* **`LlamaCppProperties`**: Centralized, type-safe configuration bean for all `llamacpp.*` properties.
* **`LlamaModelComponent`**: Manages the lifecycle of the native `LlamaModel` instance.
* **`PromptComponent`**: Loads and formats the prompt template.
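
To show why separating I/O from the chat loop helps testing, here is a minimal plain-Java sketch. All names in it (`ChatIO`, `FakeChatIO`, `ChatLoop`) and the `exit` command are assumptions made for illustration; the project's `IOService` and `ChatServicesImpl` may declare different methods and behavior.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical stand-in for the project's IOService abstraction. */
interface ChatIO {
    String readLine();        // returns null when input is exhausted
    void print(String text);
}

/** In-memory implementation for tests: scripted input, captured output. */
final class FakeChatIO implements ChatIO {
    private final Deque<String> inputs = new ArrayDeque<>();
    final List<String> outputs = new ArrayList<>();

    FakeChatIO(String... scriptedInputs) {
        inputs.addAll(List.of(scriptedInputs));
    }

    @Override
    public String readLine() {
        return inputs.poll(); // null once the script runs out
    }

    @Override
    public void print(String text) {
        outputs.add(text);
    }
}

/** Chat loop that depends only on the interface, so no console is needed to test it. */
final class ChatLoop {
    private final ChatIO io;
    private final UnaryOperator<String> responder; // stands in for the LLM-backed service

    ChatLoop(ChatIO io, UnaryOperator<String> responder) {
        this.io = io;
        this.responder = responder;
    }

    /** Answer each input line until input ends or the user types "exit" (assumed command). */
    void run() {
        String line;
        while ((line = io.readLine()) != null && !line.equalsIgnoreCase("exit")) {
            io.print(responder.apply(line));
        }
    }
}
```

A test can then drive the loop with `new ChatLoop(new FakeChatIO("hello", "exit"), q -> "echo: " + q).run()` and inspect the captured output, without loading a model or touching standard input.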

See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for more details.

## Feedback

Please raise issues in the repository for bugs or feature requests.