{"id":28075175,"url":"https://github.com/aborroy/alfresco-knowledge-enrichment","last_synced_at":"2026-04-30T22:38:15.708Z","repository":{"id":292875510,"uuid":"982232710","full_name":"aborroy/alfresco-knowledge-enrichment","owner":"aborroy","description":"AI Knowledge Enrichment for Alfresco Community","archived":false,"fork":false,"pushed_at":"2025-05-23T13:00:56.000Z","size":39,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-23T14:38:22.865Z","etag":null,"topics":["alfresco","docker","docker-compose","ollama","spring-ai"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aborroy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-12T15:10:22.000Z","updated_at":"2025-05-23T13:00:59.000Z","dependencies_parsed_at":"2025-05-12T16:31:42.458Z","dependency_job_id":"1a3b4a5c-3bf4-49c7-abfb-bf430f4a25fb","html_url":"https://github.com/aborroy/alfresco-knowledge-enrichment","commit_stats":null,"previous_names":["aborroy/alfresco-knowledge-enrichment"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/aborroy/alfresco-knowledge-enrichment","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aborroy%2Falfresco-knowledge-enrichment","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aborroy%2Falfresco-knowledge-enrichment/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aborroy%2
Falfresco-knowledge-enrichment/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aborroy%2Falfresco-knowledge-enrichment/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aborroy","download_url":"https://codeload.github.com/aborroy/alfresco-knowledge-enrichment/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aborroy%2Falfresco-knowledge-enrichment/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32479448,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-30T13:12:12.517Z","status":"ssl_error","status_checked_at":"2026-04-30T13:12:06.837Z","response_time":57,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alfresco","docker","docker-compose","ollama","spring-ai"],"created_at":"2025-05-13T00:55:07.755Z","updated_at":"2026-04-30T22:38:10.699Z","avatar_url":"https://github.com/aborroy.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Knowledge-Enrichment · RAG micro-service for PDFs  \n[![Build](https://img.shields.io/badge/build-Maven_3.9+-blue?logo=apachemaven)](pom.xml)\n[![Docker Compose](https://img.shields.io/badge/run-docker--compose-blue?logo=docker)](compose.yaml)\n[![License](https://img.shields.io/github/license/aborroy/alfresco-knowledge-enrichment)](LICENSE)\n\n\u003e **v0.8.0 
\u0026nbsp;•\u0026nbsp; Java 21 · Spring Boot 3 · Spring AI**  \n\u003e Drop-in service that ingests PDFs, stores chunks \u0026 captions in Elasticsearch vector search, and answers questions with retrieval-augmented generation (RAG) powered by local LLM(s)\n\n## What it does\n\n| Step             | Detail                                                             | Tech |\n|------------------|--------------------------------------------------------------------|------|\n| 1. Ingest        | `POST /api/ingest` accepts a PDF, splits pages → 512-token chunks  | `PagePdfDocumentReader` + `TokenTextSplitter` |\n| 2. Caption       | Every inline image is described by a vision-capable LLM            | `RagImageExtractor` + LLaVA (via Ollama) |\n| 3. Store vectors | Text \u0026 captions are embedded and written to an Elasticsearch index | `spring-ai-vector-store-elasticsearch` |\n| 4. Chat          | `POST /api/chat` runs a prompt template with the top-K matches     | `ChatClient` + any chat model (default llava) |\n| 5. Cite          | The answer returns both the response and the supporting docs       | `ChatResponse` DTO |\n\nEverything is wrapped in a thin Spring-Boot REST API and shipped in a single Docker image.\n\n* The container speaks to Ollama on `http://host.docker.internal:11434` (chat/vision) and to Docker Model Runner embedding service on `http://host.docker.internal:12434/engines` \n* All vectors (1024 dims) live in the single-node Elasticsearch 9 that ships in the compose file\n\n## Quick start\n\nRequirements\n\n* Docker Desktop ≥ 4.24 (20 GiB RAM)\n* Docker Compose v2\n* Maven 3.x\n* Java 21\n\nTo use the Knowledge Enrichment service locally, you must install and run both **Ollama** (for chat and image captioning) and an **OpenAI-compatible embedding service** such as the [Docker Model Runner](https://docs.docker.com/model-runner/). \n\n\n```bash\n# 1. 
Clone\ngit clone https://github.com/aborroy/alfresco-knowledge-enrichment.git\ncd alfresco-knowledge-enrichment\n\n# 2. Fire up everything\ndocker compose up --build\n```\n\n| Service                      | URL                                                     | Notes        |\n| ---------------------------- | ------------------------------------------------------- | ------------ |\n| Knowledge-Enrichment API     | [http://localhost:8080/api](http://localhost:8080/api)  | REST API     |\n| Elasticsearch (vector store) | [http://localhost:9200](http://localhost:9200)          | single-node  |\n| Kibana                       | [http://localhost:5601](http://localhost:5601)          | optional UI  |\n\n## API reference\n\n### `POST /api/ingest`\n\n| Param  | Type            | Description                             |\n| ------ | --------------- | --------------------------------------- |\n| `uuid` | form-field      | Logical grouping key (e.g. uuid)        |\n| `file` | PDF (multipart) | The document to index (max 100 MB)      |\n\nReturns **HTTP 202** when the file has been chunked, captioned and stored.\n\n```bash\ncurl -F uuid=demo \\\n     -F file=@contract.pdf \\\n     http://localhost:8080/api/ingest\n```\n\n### `POST /api/chat`\n\n```jsonc\n// request\n{ \"message\": \"Who was the first person to break an Enigma-like machine?\" }\n\n// response\n{\n  \"response\": \"Marian Rejewski, a Polish mathematician, was the first person ...\",\n  \"documents\": [\n    { \"id\":\"uuid#page3-chunk2\", \"metadata\":{ ... 
} },\n    ...\n  ]\n}\n```\n\n## Configuration (excerpt of `application.yml`)\n\n| Property                                         | Default                          | Purpose                         |\n| ------------------------------------------------ | -------------------------------- | ------------------------------- |\n| `spring.ai.model.embedding`                      | `openai`                         | Name used for embeddings        |\n| `spring.ai.openai.base-url`                      | `http://localhost:12434/engines` | Embedding runner                |\n| `spring.ai.model.chat`                           | `ollama`                         | Name used for chat              |\n| `spring.ai.ollama.base-url`                      | `http://localhost:11434`         | Ollama daemon                   |\n| `spring.ai.vectorstore.elasticsearch.index-name` | `alfresco`                       | ES index for vectors            |\n| `spring.ai.vectorstore.elasticsearch.dimensions` | `1024`                           | Must match your embedding model |\n\nOverride any of them via `SPRING_*` environment variables or a custom `application.yml`.\n\n## Local development\n\n```bash\n# prerequisites: JDK 21, Maven 3.9, Elasticsearch 9 running locally\nmvn clean package \u0026\u0026 java -jar target/knowledge-enrichment-0.8.0.jar\n```\n\nThe app starts on **`localhost:8080`** and will talk to the same model runners you configured above.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faborroy%2Falfresco-knowledge-enrichment","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faborroy%2Falfresco-knowledge-enrichment","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faborroy%2Falfresco-knowledge-enrichment/lists"}