https://github.com/aborroy/knowledge-enrichment-api
Sample implementation that provides a Gateway to access the Knowledge Enrichment API in Java
https://github.com/aborroy/knowledge-enrichment-api
knowledge-enrichment spring-boot
Last synced: about 1 month ago
JSON representation
Sample implementation that provides a Gateway to access the Knowledge Enrichment API in Java
- Host: GitHub
- URL: https://github.com/aborroy/knowledge-enrichment-api
- Owner: aborroy
- License: lgpl-3.0
- Created: 2025-06-25T12:02:50.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-06-26T09:06:58.000Z (12 months ago)
- Last Synced: 2025-06-26T09:38:09.262Z (12 months ago)
- Topics: knowledge-enrichment, spring-boot
- Language: Java
- Homepage:
- Size: 20.5 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Knowledge Enrichment API Gateway
[](https://openjdk.org/projects/jdk/21/)
> **A lightweight Spring Boot proxy that lets you prototype locally against the Hyland Knowledge Enrichment SaaS APIs: no S3 juggling or OAuth plumbing required.**
## Features
* **Single local endpoint** – Expose both Context Enrichment and Data Curation APIs on `http://localhost:8080`.
* **Credential firewall** – Keep OAuth2 secrets on the server; clients only see your gateway.
* **Straightforward uploads** – Send ordinary `multipart/form-data`; forget about presigned URLs.
* **One‑call polling** – Retrieve job status and results with a single request.
* **First‑class Docker support** – Spin up a ready‑to‑use container in seconds.
## Table of Contents
* [Why](#why)
* [Prerequisites](#prerequisites)
* [Quick Start](#quick-start)
* [Docker](#docker)
* [Configuration](#configuration)
* [HTTP API](#http-api)
* [Context Enrichment](#context-enrichment)
* [Data Curation](#data-curation)
* [Examples](#examples)
* [Sequence Diagrams](#sequence-diagrams)
* [Contributing](#contributing)
* [Resources](#resources)
## Why
Hyland Knowledge Enrichment currently offers two public SaaS endpoints:
| Service | Purpose | Output |
| ---------------------- | ----------------------------------------------------------------------------- | ----------------- |
| **Context Enrichment** | Run one‑off AI actions (summarise, translate, redact PII...) on a single binary | JSON |
| **Data Curation** | Normalise, chunk and embed large documents for retrieval‑augmented generation | Vector‑ready JSON |
Both sit behind OAuth2 and presigned S3 URLs. This gateway abstracts that complexity so you can focus on experimenting, demoing or integrating.
## Prerequisites
| Requirement | Version |
| ----------- | --------------------------------- |
| Java | 21+ |
| Maven | 3.9+ (wrapper provided) |
| Docker | *(optional for container builds)* |
## Quick Start
```bash
# 1. Build
mvn clean package
# 2. Configure credentials (once)
cp .env.sample .env
vi .env # paste your SaaS creds
source .env
# 3. Run locally
./run.sh # http://localhost:8080
```
### Docker
Ensure you have a local `.env` file containing credential values
```bash
docker compose up --build
```
The application will be reachable at [http://localhost:8080](http://localhost:8080).
## Configuration
Environment variables:
| Variable | Description |
| ------------------------------------------------------------------ | -------------------- |
| `DATA_CURATION_CLIENT_ID` / `CONTEXT_ENRICHMENT_CLIENT_ID` | OAuth2 client ID |
| `DATA_CURATION_CLIENT_SECRET` / `CONTEXT_ENRICHMENT_CLIENT_SECRET` | OAuth2 client secret |
| `DATA_CURATION_API_URL` / `CONTEXT_ENRICHMENT_API_URL` | Base SaaS REST URL |
| `DATA_CURATION_OAUTH_URL` / `CONTEXT_ENRICHMENT_OAUTH_URL` | OAuth token endpoint |
See `application.yaml` for optional port or logging tweaks.
## HTTP API
### Context Enrichment
| Method | Endpoint | Body / Query | Description |
| ------ | ---------------------------- | ------------------------------------------- | ------------------------------------------------ |
| `GET` | `/context/available_actions` | – | List supported actions |
| `POST` | `/context/process` | `multipart/form-data` → `file`, `actions[]` | Upload a binary, trigger actions, return results |
### Data Curation
| Method | Endpoint | Body | Description |
| ------ | ------------------------ | ------------------------------------------------ | ----------------------------------------------- |
| `POST` | `/data-curation/process` | `file`, `normalization`, `chunking`, `embedding` | Upload a PDF and run any or all pipeline stages |
## Examples
```bash
# List available actions
curl -X GET http://localhost:8080/context/available_actions
# Summarise a PDF
curl -F actions=text-summarization -F file=@document.pdf \
http://localhost:8080/context/process
# Run the full curation pipeline
curl -F file=@document.pdf -F normalization=true \
-F chunking=true -F embedding=true \
http://localhost:8080/data-curation/process
```
## Sequence Diagrams
```mermaid
sequenceDiagram
autonumber
participant Client as "Caller (browser / service)"
participant Controller as "ContextEnrichmentController"
participant CEClient as "ContextEnrichmentClient"
participant S3 as "Amazon S3 (pre-signed)"
participant CEAPI as "Context-Enrichment API"
%% 1 – initial HTTP request
Client ->> Controller: POST /context/process (file, actions)
%% 2 – request upload URL
Controller ->> CEClient: getPresignedUrl(contentType)
CEClient ->> CEAPI: GET /files/upload/presigned-url?contentType=...
CEAPI -->> CEClient: presignedUrl, objectKey
CEClient -->> Controller: presignedUrl, objectKey
%% 3 – upload original file to S3
Controller ->> CEClient: uploadFileFromMemory(presignedUrl, bytes, contentType)
CEClient ->> S3: HTTP PUT (binary payload via presignedUrl)
%% 4 – start enrichment job
Controller ->> CEClient: processContent(objectKey, actions)
CEClient ->> CEAPI: POST /content/process {objectKeys, actions}
CEAPI -->> CEClient: jobId
CEClient -->> Controller: jobId
%% 5 – polling loop
loop every 2 s (max 30 attempts)
Controller ->> CEClient: getResults(jobId)
CEClient ->> CEAPI: GET /content/process/{jobId}/results
CEAPI -->> CEClient: inProgress?, status
alt inProgress
CEClient -->> Controller: still running
else status == SUCCESS
CEClient -->> Controller: results JSON
else status == FAILED or ERROR
CEClient -->> Controller: error details
end
end
%% 6 – final HTTP response
Controller -->> Client: 200 OK (results) | 5xx on failure
```
```mermaid
sequenceDiagram
autonumber
participant Client as "Caller (browser / service)"
participant Controller as "DataCurationController"
participant DCClient as "DataCurationClient"
participant S3 as "Amazon S3 (presigned)"
participant DCAPI as "Data-Curation API"
%% 1 – initial HTTP request
Client ->> Controller: POST /data-curation/process (file + flags)
%% 2 – obtain presigned info & job-id
Controller ->> DCClient: presign(fileName, options)
DCClient ->> DCAPI: POST /presign {fileName, options}
DCAPI -->> DCClient: putUrl, getUrl, jobId
DCClient -->> Controller: putUrl, getUrl, jobId
%% 3 – upload original file to S3
Controller ->> DCClient: putToS3(putUrl, bytes, contentType)
DCClient ->> S3: HTTP PUT (binary payload via putUrl)
%% 4 – polling loop until job finishes
loop every 5 s (max 60 attempts)
Controller ->> DCClient: status(jobId)
DCClient ->> DCAPI: GET /status/{jobId}
DCAPI -->> DCClient: status
alt status == DONE
%% 4a – try the presigned results first
Controller ->> DCClient: getPresignedResults(getUrl)
DCClient ->> S3: HTTP GET (results JSON)
alt results present
DCClient -->> Controller: results map
else results missing
%% 4b – fallback to authenticated API
Controller ->> DCClient: results(jobId)
DCClient ->> DCAPI: GET /results/{jobId}
DCAPI -->> DCClient: results map
DCClient -->> Controller: results map
end
else status == FAILED
DCClient -->> Controller: error details
else status == ERROR
DCClient -->> Controller: error details
end
end
%% 5 – final HTTP response
Controller -->> Client: 200 OK (results) | 5xx on failure
```
## Contributing
Pull requests are welcome! Please open an issue first to discuss your proposed change.
## Resources
* [Official Documentation](https://hyland.github.io/ContentIntelligence-Docs/KnowledgeEnrichment)
* [Hyland Beta Program](https://www.hyland.com/en/learn/it/beta-program)