https://github.com/aborroy/alfresco-knowledge-enrichment
AI Knowledge Enrichment for Alfresco Community
- Host: GitHub
- URL: https://github.com/aborroy/alfresco-knowledge-enrichment
- Owner: aborroy
- License: apache-2.0
- Created: 2025-05-12T15:10:22.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-23T13:00:56.000Z (about 1 year ago)
- Last Synced: 2025-05-23T14:38:22.865Z (about 1 year ago)
- Topics: alfresco, docker, docker-compose, ollama, spring-ai
- Language: Java
- Size: 38.1 KB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Knowledge-Enrichment · RAG micro-service for PDFs
> **v0.8.0 • Java 21 · Spring Boot 3 · Spring AI**
> Drop-in service that ingests PDFs, stores text chunks and image captions in an Elasticsearch vector store, and answers questions with retrieval-augmented generation (RAG) powered by local LLMs.
## What it does
| Step | Detail | Tech |
|------------------|--------------------------------------------------------------------|------|
| 1. Ingest | `POST /api/ingest` accepts a PDF, splits pages → 512-token chunks | `PagePdfDocumentReader` + `TokenTextSplitter` |
| 2. Caption | Every inline image is described by a vision-capable LLM | `RagImageExtractor` + LLaVA (via Ollama) |
| 3. Store vectors | Text & captions are embedded and written to an Elasticsearch index | `spring-ai-vector-store-elasticsearch` |
| 4. Chat | `POST /api/chat` runs a prompt template with the top-K matches | `ChatClient` + any chat model (default llava) |
| 5. Cite | The answer returns both the response and the supporting docs | `ChatResponse` DTO |
Everything is wrapped in a thin Spring-Boot REST API and shipped in a single Docker image.
* The container talks to Ollama on `http://host.docker.internal:11434` (chat and vision) and to the Docker Model Runner embedding service on `http://host.docker.internal:12434/engines`.
* All vectors (1024 dimensions) live in the single-node Elasticsearch 9 instance that ships in the compose file.
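
To make the "top-K matches" step concrete, here is a minimal in-memory sketch of nearest-neighbour retrieval. It is an illustration only: in the service this search is delegated to Elasticsearch, and cosine similarity, the class name `TopKSketch`, and the 3-dimensional toy vectors are all assumptions made for the example (the real index uses 1024 dimensions).

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative only: the real service delegates this search to Elasticsearch.
class TopKSketch {

    // Cosine similarity between two vectors of the same dimension.
    // Note: a zero vector yields NaN; real embeddings are assumed non-zero.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            // Mirrors the "dimensions must match your embedding model" constraint.
            throw new IllegalArgumentException(
                "dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the ids of the k stored vectors most similar to the query.
    static List<String> topK(Map<String, float[]> index, float[] query, int k) {
        return index.entrySet().stream()
            .sorted(Comparator.comparingDouble(
                (Map.Entry<String, float[]> e) -> -cosine(e.getValue(), query)))
            .limit(k)
            .map(Map.Entry::getKey)
            .toList();
    }

    public static void main(String[] args) {
        // Hypothetical ids and toy vectors, not taken from the project.
        Map<String, float[]> index = Map.of(
            "chunk-a", new float[] {1f, 0f, 0f},
            "chunk-b", new float[] {0.9f, 0.1f, 0f},
            "caption-c", new float[] {0f, 1f, 0f});
        System.out.println(topK(index, new float[] {1f, 0f, 0f}, 2));
    }
}
```

Text chunks and image captions share one index, so a caption can be returned as a match just like a text chunk.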
## Quick start
Requirements
* Docker Desktop ≥ 4.24 (20 GiB RAM)
* Docker Compose v2
* Maven 3.x
* Java 21
To use the Knowledge Enrichment service locally, you must install and run both **Ollama** (for chat and image captioning) and an **OpenAI-compatible embedding service** such as the [Docker Model Runner](https://docs.docker.com/model-runner/).
```bash
# 1. Clone
git clone https://github.com/aborroy/alfresco-knowledge-enrichment.git
cd alfresco-knowledge-enrichment
# 2. Fire up everything
docker compose up --build
```
| Service | URL | Notes |
| ---------------------------- | ------------------------------------------------------- | ------------ |
| Knowledge-Enrichment API     | [http://localhost:8080/api](http://localhost:8080/api)  | REST API     |
| Elasticsearch (vector store) | [http://localhost:9200](http://localhost:9200) | single-node |
| Kibana | [http://localhost:5601](http://localhost:5601) | optional UI |
## API reference
### `POST /api/ingest`
| Param | Type | Description |
| ------ | --------------- | --------------------------------------- |
| `uuid` | form-field      | Logical grouping key (e.g. a UUID)      |
| `file` | PDF (multipart) | The document to index (max 100 MB) |
Returns **HTTP 202** when the file has been chunked, captioned and stored.
```bash
curl -F uuid=demo \
-F file=@contract.pdf \
http://localhost:8080/api/ingest
```
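
The same upload can be made from Java. The sketch below uses only the JDK's `java.net.http` client and builds the multipart body by hand, since that client has no multipart helper. The class name `IngestClient`, the boundary format, and the `application/pdf` part content type are assumptions for illustration, not part of the project.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical client; only the endpoint and field names come from the README.
class IngestClient {

    // Encodes the `uuid` form field and the `file` part as multipart/form-data.
    // Assumes fileName contains no double quotes or line breaks.
    static byte[] multipartBody(String boundary, String uuid, String fileName, byte[] pdf) {
        String head = "--" + boundary + "\r\n"
            + "Content-Disposition: form-data; name=\"uuid\"\r\n\r\n"
            + uuid + "\r\n"
            + "--" + boundary + "\r\n"
            + "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"\r\n"
            + "Content-Type: application/pdf\r\n\r\n";
        String tail = "\r\n--" + boundary + "--\r\n";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.writeBytes(head.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(pdf);
        out.writeBytes(tail.getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    // Builds the POST /api/ingest request without sending it.
    static HttpRequest ingestRequest(String baseUrl, String uuid, String fileName, byte[] pdf) {
        String boundary = "ke-" + UUID.randomUUID();
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/ingest"))
            .header("Content-Type", "multipart/form-data; boundary=" + boundary)
            .POST(HttpRequest.BodyPublishers.ofByteArray(
                multipartBody(boundary, uuid, fileName, pdf)))
            .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            System.out.println("usage: java IngestClient.java <file.pdf>");
            return;
        }
        Path pdf = Path.of(args[0]);
        HttpRequest request = ingestRequest("http://localhost:8080", "demo",
            pdf.getFileName().toString(), Files.readAllBytes(pdf));
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
    }
}
```

With the stack running, `java IngestClient.java contract.pdf` should behave like the `curl` call above.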
### `POST /api/chat`
```jsonc
// request
{ "message": "Who was the first person to break an Enigma-like machine?" }
// response
{
"response": "Marian Rejewski, a Polish mathematician, was the first person ...",
"documents": [
{ "id":"uuid#page3-chunk2", "metadata":{ ... } },
...
]
}
```
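
A minimal Java sketch of the same call, using only the JDK. It assumes the endpoint expects an `application/json` body shaped like the request above. The class name `ChatClient` and the hand-rolled JSON escaping are illustrative choices; a real client would more likely use a JSON library.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client; only the endpoint and the "message" field come from the README.
class ChatClient {

    // Minimal JSON string escaping for quotes, backslashes and control characters.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.append('"').toString();
    }

    // Builds the request body: {"message":"..."}
    static String chatPayload(String message) {
        return "{\"message\":" + jsonString(message) + "}";
    }

    // Builds the POST /api/chat request without sending it.
    static HttpRequest chatRequest(String baseUrl, String message) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/chat"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(chatPayload(message)))
            .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            System.out.println("usage: java ChatClient.java \"<question>\"");
            return;
        }
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(chatRequest("http://localhost:8080", args[0]),
                HttpResponse.BodyHandlers.ofString());
        // The body is the JSON document with "response" and "documents".
        System.out.println(response.body());
    }
}
```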
## Configuration (excerpt of `application.yml`)
| Property | Default | Purpose |
| ------------------------------------------------ | -------------------------------- | ------------------------------- |
| `spring.ai.model.embedding`                      | `openai`                         | Embedding model provider        |
| `spring.ai.openai.base-url` | `http://localhost:12434/engines` | Embedding runner |
| `spring.ai.model.chat`                           | `ollama`                         | Chat model provider             |
| `spring.ai.ollama.base-url` | `http://localhost:11434` | Ollama daemon |
| `spring.ai.vectorstore.elasticsearch.index-name` | `alfresco` | ES index for vectors |
| `spring.ai.vectorstore.elasticsearch.dimensions` | `1024` | Must match your embedding model |
Override any of them via `SPRING_*` environment variables or a custom `application.yml`.
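
Spring Boot's documented canonical mapping from a property name to an environment variable is: replace dots with underscores, remove dashes, and convert to upper case. The helper below applies that rule to the properties in the table. The class name `EnvNames` is hypothetical and exists only to show the resulting names.

```java
import java.util.Locale;

// Hypothetical helper that prints the canonical environment variable names.
class EnvNames {

    // Spring Boot's canonical form: dots become underscores,
    // dashes are removed, and the result is upper-cased.
    static String toEnvVar(String property) {
        return property.replace(".", "_")
            .replace("-", "")
            .toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String[] properties = {
            "spring.ai.model.embedding",
            "spring.ai.openai.base-url",
            "spring.ai.model.chat",
            "spring.ai.ollama.base-url",
            "spring.ai.vectorstore.elasticsearch.index-name",
            "spring.ai.vectorstore.elasticsearch.dimensions"
        };
        for (String property : properties) {
            System.out.println(property + " -> " + toEnvVar(property));
        }
    }
}
```

For example, `spring.ai.ollama.base-url` maps to `SPRING_AI_OLLAMA_BASEURL`, which can be set in the `environment` section of the compose file.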
## Local development
```bash
# prerequisites: JDK 21, Maven 3.9, Elasticsearch 9 running locally
mvn clean package && java -jar target/knowledge-enrichment-0.8.0.jar
```
The app starts on **`localhost:8080`** and talks to the same model runners you configured above.