{"id":31156108,"url":"https://github.com/aborroy/knowledge-enrichment-api","last_synced_at":"2026-05-15T21:08:28.858Z","repository":{"id":301322576,"uuid":"1008347481","full_name":"aborroy/knowledge-enrichment-api","owner":"aborroy","description":"Sample implementation that provides a Gateway to access the Knowledge Enrichment API in Java","archived":false,"fork":false,"pushed_at":"2025-06-26T09:06:58.000Z","size":21,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-26T09:38:09.262Z","etag":null,"topics":["knowledge-enrichment","spring-boot"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"lgpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aborroy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-25T12:02:50.000Z","updated_at":"2025-06-26T09:07:01.000Z","dependencies_parsed_at":"2025-06-26T09:38:10.400Z","dependency_job_id":"a96adef3-5b8a-4f19-9786-23026339628f","html_url":"https://github.com/aborroy/knowledge-enrichment-api","commit_stats":null,"previous_names":["aborroy/knowledge-enrichment-api"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/aborroy/knowledge-enrichment-api","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aborroy%2Fknowledge-enrichment-api","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aborroy%2Fknowledge-enrichment-api/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aborroy%2Fknowledge-
enrichment-api/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aborroy%2Fknowledge-enrichment-api/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aborroy","download_url":"https://codeload.github.com/aborroy/knowledge-enrichment-api/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aborroy%2Fknowledge-enrichment-api/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275830188,"owners_count":25536280,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-18T02:00:09.552Z","response_time":77,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["knowledge-enrichment","spring-boot"],"created_at":"2025-09-18T20:54:51.695Z","updated_at":"2025-09-18T20:54:54.476Z","avatar_url":"https://github.com/aborroy.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Knowledge Enrichment API Gateway\n\n[![Java](https://img.shields.io/badge/java-21+-blue?logo=java)](https://openjdk.org/projects/jdk/21/)\n\n\u003e **A lightweight Spring Boot proxy that lets you prototype locally against the Hyland Knowledge Enrichment SaaS APIs: no S3 juggling or OAuth plumbing required.**\n\n## Features\n\n* **Single local endpoint** – Expose both Context Enrichment and Data Curation APIs on `http://localhost:8080`.\n* **Credential 
firewall** – Keep OAuth2 secrets on the server; clients only see your gateway.\n* **Straightforward uploads** – Send ordinary `multipart/form-data`; forget about presigned URLs.\n* **One‑call polling** – Retrieve job status and results with a single request.\n* **First‑class Docker support** – Spin up a ready‑to‑use container in seconds.\n\n## Table of Contents\n\n* [Why](#why)\n* [Prerequisites](#prerequisites)\n* [Quick Start](#quick-start)\n  * [Docker](#docker)\n* [Configuration](#configuration)\n* [HTTP API](#http-api)\n  * [Context Enrichment](#context-enrichment)\n  * [Data Curation](#data-curation)\n* [Examples](#examples)\n* [Sequence Diagrams](#sequence-diagrams)\n* [Contributing](#contributing)\n* [Resources](#resources)\n\n## Why\n\nHyland Knowledge Enrichment currently offers two public SaaS endpoints:\n\n| Service                | Purpose                                                                       | Output            |\n| ---------------------- | ----------------------------------------------------------------------------- | ----------------- |\n| **Context Enrichment** | Run one‑off AI actions (summarise, translate, redact PII...) on a single binary    | JSON              |\n| **Data Curation**      | Normalise, chunk and embed large documents for retrieval‑augmented generation | Vector‑ready JSON |\n\nBoth sit behind OAuth2 and presigned S3 URLs. This gateway abstracts that complexity so you can focus on experimenting, demoing or integrating.\n\n## Prerequisites\n\n| Requirement | Version                           |\n| ----------- | --------------------------------- |\n| Java        | 21+                               |\n| Maven       | 3.9+ (wrapper provided)           |\n| Docker      | *(optional for container builds)* |\n\n## Quick Start\n\n```bash\n# 1. Build\nmvn clean package\n\n# 2. Configure credentials (once)\ncp .env.sample .env\nvi .env             # paste your SaaS creds\nsource .env\n\n# 3. 
Run locally\n./run.sh            # http://localhost:8080\n```\n\n### Docker\n\nEnsure you have a local `.env` file containing your credential values, then run:\n\n```bash\ndocker compose up --build\n```\n\nThe application will be reachable at [http://localhost:8080](http://localhost:8080).\n\n## Configuration\n\nEnvironment variables:\n\n| Variable                                                           | Description          |\n| ------------------------------------------------------------------ | -------------------- |\n| `DATA_CURATION_CLIENT_ID` / `CONTEXT_ENRICHMENT_CLIENT_ID`         | OAuth2 client ID     |\n| `DATA_CURATION_CLIENT_SECRET` / `CONTEXT_ENRICHMENT_CLIENT_SECRET` | OAuth2 client secret |\n| `DATA_CURATION_API_URL` / `CONTEXT_ENRICHMENT_API_URL`             | Base SaaS REST URL   |\n| `DATA_CURATION_OAUTH_URL` / `CONTEXT_ENRICHMENT_OAUTH_URL`         | OAuth token endpoint |\n\nSee `application.yaml` for optional port or logging tweaks.\n\n## HTTP API\n\n### Context Enrichment\n\n| Method | Endpoint                     | Body / Query                                | Description                                      |\n| ------ | ---------------------------- | ------------------------------------------- | ------------------------------------------------ |\n| `GET`  | `/context/available_actions` | –                                           | List supported actions                           |\n| `POST` | `/context/process`           | `multipart/form-data` → `file`, `actions[]` | Upload a binary, trigger actions, return results |\n\n### Data Curation\n\n| Method | Endpoint                 | Body                                             | Description                                     |\n| ------ | ------------------------ | ------------------------------------------------ | ----------------------------------------------- |\n| `POST` | `/data-curation/process` | `file`, `normalization`, `chunking`, `embedding` | Upload a PDF and run any or all pipeline stages 
|\n\n## Examples\n\n```bash\n# List available actions\ncurl -X GET http://localhost:8080/context/available_actions\n\n# Summarise a PDF\ncurl -F actions=text-summarization -F file=@document.pdf \\\n     http://localhost:8080/context/process\n\n# Run the full curation pipeline\ncurl -F file=@document.pdf -F normalization=true \\\n     -F chunking=true -F embedding=true \\\n     http://localhost:8080/data-curation/process\n```\n\n## Sequence Diagrams\n\n```mermaid\nsequenceDiagram\n    autonumber\n    participant Client      as \"Caller (browser / service)\"\n    participant Controller  as \"ContextEnrichmentController\"\n    participant CEClient    as \"ContextEnrichmentClient\"\n    participant S3          as \"Amazon S3 (pre-signed)\"\n    participant CEAPI       as \"Context-Enrichment API\"\n\n    %% 1 – initial HTTP request\n    Client      -\u003e\u003e Controller: POST /context/process (file, actions)\n\n    %% 2 – request upload URL\n    Controller  -\u003e\u003e CEClient: getPresignedUrl(contentType)\n    CEClient    -\u003e\u003e CEAPI:   GET /files/upload/presigned-url?contentType=...\n    CEAPI       --\u003e\u003e CEClient: presignedUrl, objectKey\n    CEClient    --\u003e\u003e Controller: presignedUrl, objectKey\n\n    %% 3 – upload original file to S3\n    Controller  -\u003e\u003e CEClient: uploadFileFromMemory(presignedUrl, bytes, contentType)\n    CEClient    -\u003e\u003e S3:      HTTP PUT (binary payload via presignedUrl)\n\n    %% 4 – start enrichment job\n    Controller  -\u003e\u003e CEClient: processContent(objectKey, actions)\n    CEClient    -\u003e\u003e CEAPI:   POST /content/process {objectKeys, actions}\n    CEAPI       --\u003e\u003e CEClient: jobId\n    CEClient    --\u003e\u003e Controller: jobId\n\n    %% 5 – polling loop\n    loop every 2 s (max 30 attempts)\n        Controller -\u003e\u003e CEClient: getResults(jobId)\n        CEClient   -\u003e\u003e CEAPI:   GET /content/process/{jobId}/results\n        CEAPI      
--\u003e\u003e CEClient: inProgress?, status\n        alt inProgress\n            CEClient --\u003e\u003e Controller: still running\n        else status == SUCCESS\n            CEClient --\u003e\u003e Controller: results JSON\n        else status == FAILED or ERROR\n            CEClient --\u003e\u003e Controller: error details\n        end\n    end\n\n    %% 6 – final HTTP response\n    Controller --\u003e\u003e Client: 200 OK (results) | 5xx on failure\n```\n\n```mermaid\nsequenceDiagram\n    autonumber\n    participant Client      as \"Caller (browser / service)\"\n    participant Controller  as \"DataCurationController\"\n    participant DCClient    as \"DataCurationClient\"\n    participant S3          as \"Amazon S3 (presigned)\"\n    participant DCAPI       as \"Data-Curation API\"\n\n    %% 1 – initial HTTP request\n    Client      -\u003e\u003e Controller: POST /data-curation/process (file + flags)\n\n    %% 2 – obtain presigned info \u0026 job-id\n    Controller  -\u003e\u003e DCClient: presign(fileName, options)\n    DCClient    -\u003e\u003e DCAPI:   POST /presign {fileName, options}\n    DCAPI       --\u003e\u003e DCClient: putUrl, getUrl, jobId\n    DCClient    --\u003e\u003e Controller: putUrl, getUrl, jobId\n\n    %% 3 – upload original file to S3\n    Controller  -\u003e\u003e DCClient: putToS3(putUrl, bytes, contentType)\n    DCClient    -\u003e\u003e S3:      HTTP PUT (binary payload via putUrl)\n\n    %% 4 – polling loop until job finishes\n    loop every 5 s (max 60 attempts)\n        Controller -\u003e\u003e DCClient: status(jobId)\n        DCClient   -\u003e\u003e DCAPI:   GET /status/{jobId}\n        DCAPI      --\u003e\u003e DCClient: status\n\n        alt status == DONE\n            %% 4a – try the presigned results first\n            Controller -\u003e\u003e DCClient: getPresignedResults(getUrl)\n            DCClient   -\u003e\u003e S3:      HTTP GET (results JSON)\n            alt results present\n                DCClient --\u003e\u003e 
Controller: results map\n            else results missing\n                %% 4b – fallback to authenticated API\n                Controller -\u003e\u003e DCClient: results(jobId)\n                DCClient   -\u003e\u003e DCAPI:  GET /results/{jobId}\n                DCAPI      --\u003e\u003e DCClient: results map\n                DCClient   --\u003e\u003e Controller: results map\n            end\n        else status == FAILED\n            DCClient --\u003e\u003e Controller: error details\n        else status == ERROR\n            DCClient --\u003e\u003e Controller: error details\n        end\n    end\n\n    %% 5 – final HTTP response\n    Controller --\u003e\u003e Client: 200 OK (results) | 5xx on failure\n```\n\n## Contributing\n\nPull requests are welcome! Please open an issue first to discuss your proposed change.\n\n## Resources\n\n* [Official Documentation](https://hyland.github.io/ContentIntelligence-Docs/KnowledgeEnrichment)\n* [Hyland Beta Program](https://www.hyland.com/en/learn/it/beta-program)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faborroy%2Fknowledge-enrichment-api","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faborroy%2Fknowledge-enrichment-api","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faborroy%2Fknowledge-enrichment-api/lists"}