https://github.com/rsatrio/llm-chatbot-springboot
LLM Chatbot using Spring Boot 3
- Host: GitHub
- URL: https://github.com/rsatrio/llm-chatbot-springboot
- Owner: rsatrio
- License: mit
- Created: 2024-08-02T20:59:02.000Z
- Default Branch: main
- Last Pushed: 2025-03-29T00:19:41.000Z
- Last Synced: 2025-10-05T10:39:28.031Z
- Topics: chatbot, java, llamacpp, llm, spring-boot
- Language: Java
- Size: 128 KB
- Stars: 9
- Watchers: 1
- Forks: 11
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# LLMCpp Spring Boot Chat
A CLI-based LLM (Large Language Model) chat application built with **Spring Boot 3** and **Java 17**, using the [LlamaCpp-Java](https://github.com/kherud/java-llama.cpp) bindings to run llama.cpp inference locally.
This project demonstrates how to integrate local LLM inference into a Spring Boot application and supports models in the GGUF format.

## Features
* **Interactive CLI Chat**: Real-time chat interface via the command line.
* **Local Inference**: Runs GGUF models locally (no API keys required).
* **Customizable Prompts**: Support for external prompt templates.
* **Configurable Generation**: Fine-tune temperature, top-p, context size, and CPU threads.
* **Performance Statistics**: Detailed metrics for every response (tokens/sec, time to first token, total tokens).
* **Modular Architecture**: Decoupled I/O and business logic for better testability.
* **Comprehensive Tests**: Includes unit tests for services and components.
* **Docker Support**: Ready-to-use Dockerfile for containerized deployment.
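
The performance statistics listed above come down to simple arithmetic on a few timestamps and a token count. The following is a minimal sketch, not the project's implementation: the `GenerationStats` class is hypothetical, and it assumes the caller captures the request start, the arrival of the first token and the end of generation with `System.nanoTime()`.

```java
import java.time.Duration;

/**
 * Hypothetical helper (not from the project) that derives per-response
 * metrics from three System.nanoTime() readings and a generated-token count.
 */
final class GenerationStats {
    private final long startNanos;
    private final long firstTokenNanos;
    private final long endNanos;
    private final int totalTokens;

    GenerationStats(long startNanos, long firstTokenNanos, long endNanos, int totalTokens) {
        // nanoTime values are compared by subtraction, as its Javadoc recommends.
        if (firstTokenNanos - startNanos < 0 || endNanos - firstTokenNanos < 0 || totalTokens < 0) {
            throw new IllegalArgumentException(
                    "expected start <= firstToken <= end and a non-negative token count");
        }
        this.startNanos = startNanos;
        this.firstTokenNanos = firstTokenNanos;
        this.endNanos = endNanos;
        this.totalTokens = totalTokens;
    }

    /** Latency between submitting the prompt and receiving the first token. */
    Duration timeToFirstToken() {
        return Duration.ofNanos(firstTokenNanos - startNanos);
    }

    /** Wall-clock time for the whole response. */
    Duration totalTime() {
        return Duration.ofNanos(endNanos - startNanos);
    }

    /** Average throughput over the whole response; 0 if no time has elapsed. */
    double tokensPerSecond() {
        long elapsedNanos = endNanos - startNanos;
        return elapsedNanos == 0 ? 0.0 : totalTokens / (elapsedNanos / 1_000_000_000.0);
    }

    int totalTokens() {
        return totalTokens;
    }
}
```

Dividing by the full elapsed time folds prompt processing into the throughput figure; measuring from the first token instead would isolate raw decoding speed. Either definition is reasonable as long as it is applied consistently.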
## Prerequisites
* **Java**: JDK 21 or higher.
* **Maven**: 3.8+ (Wrapper included).
* **RAM**: Sufficient RAM to load your chosen GGUF model (e.g., ~1GB for TinyLlama 1.1B Q4).
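
As a rough check on the RAM figure above, the memory taken by a quantized model's weights is approximately parameters × bits per weight ÷ 8. The sketch below is a hypothetical helper; its bits-per-weight constants are approximations for two llama.cpp quantization types, and the estimate deliberately ignores the KV cache, context buffers and runtime overhead.

```java
/**
 * Hypothetical back-of-the-envelope estimate of the memory occupied by a
 * quantized model's weights only (no KV cache, buffers or runtime overhead).
 */
final class ModelMemoryEstimate {
    // Approximate effective bits per weight, including per-block scale factors.
    static final double Q4_0_BITS = 4.5;    // 32 x 4-bit weights + one fp16 scale per block
    static final double Q6_K_BITS = 6.5625; // 256 x 6-bit weights + scales per super-block

    private ModelMemoryEstimate() {}

    /** Bytes for the weights: parameters * bitsPerWeight / 8. */
    static double weightBytes(long parameters, double bitsPerWeight) {
        return parameters * bitsPerWeight / 8.0;
    }
}
```

For a 1.1B-parameter model this gives about 0.62 GB of weights at Q4_0 and about 0.90 GB at Q6_K (the quantization of the default model file), before context and runtime overhead are added.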
## Getting Started
### 1. Build from Source
Clone the repository and build the application using Maven:
```bash
git clone https://github.com/rsatrio/llm-chatbot-springboot
cd llm-chatbot-springboot
./mvnw clean package
```
The executable JAR will be located in the `target` directory (e.g., `target/LLMCpp-Chat-SpringBoot.jar`).
### 2. Prepare the Model
Download a GGUF model file (e.g., from [Hugging Face](https://huggingface.co/models?search=gguf)).
* Recommended for testing: [TinyLlama-1.1B-Chat-v1.0-GGUF](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF)
### 3. Run the Application
Run the JAR, pointing it to your model file:
```bash
java -jar target/LLMCpp-Chat-SpringBoot.jar --llamacpp.model=/path/to/your/model.gguf
```
Or run it with the default configuration, which looks for `tinyllama-1.1b-chat-v1.0.Q6_K.gguf` in the current working directory:
```bash
java -jar target/LLMCpp-Chat-SpringBoot.jar
```
### 4. Run Tests
You can run the unit tests using the Maven wrapper:
```bash
./mvnw test
```
## Configuration
You can configure the application via `application.properties`, system properties, or command-line arguments.
| Property | Description | Default Value |
| :--- | :--- | :--- |
| `llamacpp.model` | Absolute or relative path to the GGUF model file. | `tinyllama-1.1b-chat-v1.0.Q6_K.gguf` |
| `llamacpp.prompt.path` | Path to a text file containing the system prompt template. | `llamacpp_prompt.txt` |
| `llamacpp.temperature` | Controls randomness (0.0 to 1.0). Higher is more creative. | `0.2` |
| `llamacpp.topp` | Nucleus sampling probability threshold. | `10` |
| `llamacpp.thread.cpu` | Number of CPU threads to use for inference. | `1` |
| `llamacpp.number.context` | Context window size (0 uses model default). | `0` |
| `llamacpp.frequency-penalty` | Penalty for token repetition. | `0.2` |
| `llamacpp.miro-stat` | MiroStat sampling version (`V0`, `V1`, `V2`). | `V2` |
| `llamacpp.stop-strings` | List of strings that stop generation. | `, <\|im_end\|>, User:` |
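
To make the `llamacpp.temperature` and `llamacpp.topp` rows more concrete, here is a deliberately simplified Java illustration of both ideas on a toy array of logits. It is not llama.cpp's sampler, which chains several sampling steps internally; the `SamplingSketch` class and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical, simplified illustration of temperature scaling and top-p
 * (nucleus) filtering. Not llama.cpp's actual sampler chain.
 */
final class SamplingSketch {
    private SamplingSketch() {}

    /** Softmax of logits / temperature; lower temperature sharpens the distribution. */
    static double[] softmax(double[] logits, double temperature) {
        if (temperature <= 0) {
            throw new IllegalArgumentException("temperature must be > 0");
        }
        double max = Arrays.stream(logits).max().orElseThrow();
        double[] probs = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            // Subtracting the max avoids overflow in exp(); it cancels out when normalizing.
            probs[i] = Math.exp((logits[i] - max) / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    /**
     * Indices kept by nucleus sampling: the most probable tokens, in descending
     * order, until their cumulative probability reaches topP.
     */
    static List<Integer> nucleus(double[] probs, double topP) {
        Integer[] order = new Integer[probs.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> probs[i]).reversed());

        List<Integer> kept = new ArrayList<>();
        double cumulative = 0.0;
        for (int idx : order) {
            kept.add(idx);
            cumulative += probs[idx];
            if (cumulative >= topP) {
                break;
            }
        }
        return kept;
    }
}
```

Lower temperatures concentrate probability on the top candidates, and a smaller `topP` keeps fewer of them. Under this definition, any `topP` of 1.0 or above keeps the whole vocabulary.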
### Customizing the Prompt
By default, the application uses a built-in prompt template suitable for chat-tuned models. To customize it, create a file (e.g., `my_prompt.txt`) and pass its path via `llamacpp.prompt.path`:
```bash
java -jar target/LLMCpp-Chat-SpringBoot.jar --llamacpp.prompt.path=my_prompt.txt
```
**Template Variables:**
* `{question}`: Will be replaced by the user's input.

**Example Prompt File:**
```text
<|system|>
You are a helpful coding assistant.
<|user|>
{question}
<|assistant|>
```
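
The `{question}` substitution described above can be sketched in plain Java. This is an assumption-laden stand-in for `PromptComponent`, not its actual code: the `PromptTemplate` class, the built-in fallback text, and the rule that a missing file falls back to that built-in template are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical stand-in for PromptComponent: loads a template from disk
 * (falling back to a built-in one) and fills in {question}.
 */
final class PromptTemplate {
    static final String PLACEHOLDER = "{question}";

    // Assumed built-in fallback; the project's real default template may differ.
    static final String BUILT_IN =
            "<|system|>\nYou are a helpful assistant.\n<|user|>\n{question}\n<|assistant|>\n";

    private final String template;

    private PromptTemplate(String template) {
        if (!template.contains(PLACEHOLDER)) {
            throw new IllegalArgumentException("template must contain " + PLACEHOLDER);
        }
        this.template = template;
    }

    /** Reads the template at path, or uses the built-in one if the file does not exist. */
    static PromptTemplate load(Path path) throws IOException {
        String text = Files.exists(path) ? Files.readString(path, StandardCharsets.UTF_8) : BUILT_IN;
        return new PromptTemplate(text);
    }

    /** Replaces every literal occurrence of {question} with the user's input. */
    String format(String question) {
        return template.replace(PLACEHOLDER, question);
    }
}
```

Using `String.replace` rather than `replaceAll` treats `{question}` as a literal string, so characters such as `$` or `\` in the user's input are inserted unchanged instead of being interpreted as regular-expression replacement syntax.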
## Docker Usage
Build the Docker image:
```bash
docker build -t chat-cli .
```
Run the container, mounting the model file:
```bash
docker run -it -v /local/path/to/model.gguf:/app/model.gguf chat-cli --llamacpp.model=/app/model.gguf
```
## Architecture
The application follows a clean Spring Boot architecture with decoupled concerns:
* **`ChatRunner`**: Implements `CommandLineRunner`, so the chat loop starts only after the Spring application context has been fully initialized.
* **`ChatServicesImpl`**: Manages the high-level chat loop, using an `IOService` for interaction.
* **`ChatbotServicesImpl`**: Handles the business logic for generating responses using the LLM.
* **`IOService` / `ConsoleIOService`**: Abstracts I/O operations (CLI), enabling easy unit testing and potential future UI swaps.
* **`LlamaCppProperties`**: Centralized, type-safe configuration bean for all `llamacpp.*` properties.
* **`LlamaModelComponent`**: Manages the lifecycle of the native `LlamaModel` instance.
* **`PromptComponent`**: Loads and formats the prompt template.
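
The I/O decoupling described above can be illustrated with a small, self-contained sketch. `IOService` and `ConsoleIOService` are named in the list, but the method signatures shown here are assumptions; `ScriptedIOService`, `ChatLoop`, the `exit` command and the use of a `UnaryOperator<String>` in place of the LLM-backed `ChatbotServicesImpl` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Scanner;
import java.util.function.UnaryOperator;

/** Assumed shape of the I/O abstraction: the chat loop never touches System.in/out directly. */
interface IOService {
    /** Returns the next line of input, or null when input is exhausted. */
    String readLine(String prompt);

    void println(String text);
}

/** Console-backed implementation for the real CLI (hypothetical version). */
final class ConsoleIOService implements IOService {
    private final Scanner scanner = new Scanner(System.in);

    @Override
    public String readLine(String prompt) {
        System.out.print(prompt);
        System.out.flush();
        return scanner.hasNextLine() ? scanner.nextLine() : null;
    }

    @Override
    public void println(String text) {
        System.out.println(text);
    }
}

/** Hypothetical test double: feeds canned input and records all output. */
final class ScriptedIOService implements IOService {
    private final Deque<String> inputs;
    final List<String> outputs = new ArrayList<>();

    ScriptedIOService(List<String> inputs) {
        this.inputs = new ArrayDeque<>(inputs);
    }

    @Override
    public String readLine(String prompt) {
        return inputs.poll(); // null once the script runs out
    }

    @Override
    public void println(String text) {
        outputs.add(text);
    }
}

/** Hypothetical chat loop in the spirit of ChatServicesImpl; "exit" is an assumed keyword. */
final class ChatLoop {
    private final IOService io;
    private final UnaryOperator<String> chatbot; // stands in for the LLM-backed service

    ChatLoop(IOService io, UnaryOperator<String> chatbot) {
        this.io = io;
        this.chatbot = chatbot;
    }

    void run() {
        while (true) {
            String question = io.readLine("You: ");
            if (question == null || question.equalsIgnoreCase("exit")) {
                break;
            }
            if (question.isBlank()) {
                continue; // ignore empty input
            }
            io.println("Bot: " + chatbot.apply(question));
        }
    }
}
```

Because `ChatLoop` depends only on the interface, a unit test can drive it with `ScriptedIOService` and a stub function, with no terminal and no loaded model; a different front end would only need another `IOService` implementation.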

See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for more details.
## Feedback
Please open an issue in the repository to report bugs or request features.