https://github.com/aborroy/knowledge-enrichment-api

Sample implementation that provides a Gateway to access the Knowledge Enrichment API in Java

knowledge-enrichment spring-boot


Host: GitHub
URL: https://github.com/aborroy/knowledge-enrichment-api
Owner: aborroy
License: lgpl-3.0
Created: 2025-06-25T12:02:50.000Z (12 months ago)
Default Branch: main
Last Pushed: 2025-06-26T09:06:58.000Z (12 months ago)
Last Synced: 2025-06-26T09:38:09.262Z (12 months ago)
Topics: knowledge-enrichment, spring-boot
Language: Java
Homepage:
Size: 20.5 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Knowledge Enrichment API Gateway

[![Java](https://img.shields.io/badge/java-21+-blue?logo=java)](https://openjdk.org/projects/jdk/21/)

> **A lightweight Spring Boot proxy that lets you prototype locally against the Hyland Knowledge Enrichment SaaS APIs: no S3 juggling or OAuth plumbing required.**

## Features

* **Single local endpoint** – Expose both Context Enrichment and Data Curation APIs on `http://localhost:8080`.

* **Credential firewall** – Keep OAuth2 secrets on the server; clients only see your gateway.

* **Straightforward uploads** – Send ordinary `multipart/form-data`; forget about presigned URLs.

* **One‑call polling** – Retrieve job status and results with a single request.

* **First‑class Docker support** – Spin up a ready‑to‑use container in seconds.

## Table of Contents

* [Why](#why)

* [Prerequisites](#prerequisites)

* [Quick Start](#quick-start)

  * [Docker](#docker)

* [Configuration](#configuration)

* [HTTP API](#http-api)

  * [Context Enrichment](#context-enrichment)

  * [Data Curation](#data-curation)

* [Examples](#examples)

* [Sequence Diagrams](#sequence-diagrams)

* [Contributing](#contributing)

* [Resources](#resources)

## Why

Hyland Knowledge Enrichment currently offers two public SaaS endpoints:

| Service                | Purpose                                                                         | Output            |
| ---------------------- | ------------------------------------------------------------------------------- | ----------------- |
| **Context Enrichment** | Run one‑off AI actions (summarise, translate, redact PII...) on a single binary | JSON              |
| **Data Curation**      | Normalise, chunk and embed large documents for retrieval‑augmented generation   | Vector‑ready JSON |

Both sit behind OAuth2 and presigned S3 URLs. This gateway abstracts that complexity so you can focus on experimenting, demoing or integrating.

## Prerequisites

| Requirement | Version                           |
| ----------- | --------------------------------- |
| Java        | 21+                               |
| Maven       | 3.9+ (wrapper provided)           |
| Docker      | *(optional for container builds)* |

## Quick Start

```bash
# 1. Build
mvn clean package

# 2. Configure credentials (once)
cp .env.sample .env
vi .env             # paste your SaaS creds
source .env

# 3. Run locally
./run.sh            # http://localhost:8080
```

### Docker

Make sure a local `.env` file containing your credential values exists, then run:

```bash
docker compose up --build
```

The application will be reachable at [http://localhost:8080](http://localhost:8080).

## Configuration

Environment variables:

| Variable                                                           | Description          |
| ------------------------------------------------------------------ | -------------------- |
| `DATA_CURATION_CLIENT_ID` / `CONTEXT_ENRICHMENT_CLIENT_ID`         | OAuth2 client ID     |
| `DATA_CURATION_CLIENT_SECRET` / `CONTEXT_ENRICHMENT_CLIENT_SECRET` | OAuth2 client secret |
| `DATA_CURATION_API_URL` / `CONTEXT_ENRICHMENT_API_URL`             | Base SaaS REST URL   |
| `DATA_CURATION_OAUTH_URL` / `CONTEXT_ENRICHMENT_OAUTH_URL`         | OAuth token endpoint |

See `application.yaml` for optional port or logging tweaks.
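
Before starting the gateway, it can help to confirm that all eight variables are set. The following is a minimal sketch of such a check; `EnvCheck` and its methods are hypothetical names used for illustration and are not part of the gateway code. Only the variable names are taken from the table above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical pre-flight helper (not part of the gateway). */
class EnvCheck {

    // Variable names copied from the configuration table.
    static final List<String> REQUIRED = List.of(
            "DATA_CURATION_CLIENT_ID", "CONTEXT_ENRICHMENT_CLIENT_ID",
            "DATA_CURATION_CLIENT_SECRET", "CONTEXT_ENRICHMENT_CLIENT_SECRET",
            "DATA_CURATION_API_URL", "CONTEXT_ENRICHMENT_API_URL",
            "DATA_CURATION_OAUTH_URL", "CONTEXT_ENRICHMENT_OAUTH_URL");

    /** Returns the required names that are absent or blank in the given environment. */
    static List<String> missing(Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.isBlank()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> missing = missing(System.getenv());
        if (missing.isEmpty()) {
            System.out.println("All gateway variables are set.");
        } else {
            System.out.println("Missing: " + String.join(", ", missing));
        }
    }
}
```

Passing the environment in as a `Map` keeps the check easy to exercise without touching the real process environment.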

## HTTP API

### Context Enrichment

| Method | Endpoint                     | Body / Query                                | Description                                      |
| ------ | ---------------------------- | ------------------------------------------- | ------------------------------------------------ |
| `GET`  | `/context/available_actions` | –                                           | List supported actions                           |
| `POST` | `/context/process`           | `multipart/form-data` → `file`, `actions[]` | Upload a binary, trigger actions, return results |
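
For Java callers, a request to `/context/process` might be assembled as shown below. This is a sketch under stated assumptions: the JDK `java.net.http` client has no built-in multipart support, so the body is encoded by hand; `ContextProcessRequest` is a hypothetical class name; and the form field is sent as `actions`, matching the `curl` usage in this README rather than the `actions[]` notation in the table.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Hypothetical client-side helper (not part of the gateway). */
class ContextProcessRequest {

    /** Encodes one "actions" text field and one "file" part as multipart/form-data. */
    static byte[] multipartBody(String boundary, String action, String fileName,
                                String contentType, byte[] fileBytes) {
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"actions\"\r\n\r\n"
                + action + "\r\n"
                + "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"file\"; filename=\""
                + fileName + "\"\r\n"
                + "Content-Type: " + contentType + "\r\n\r\n";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.writeBytes(head.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(fileBytes); // raw file content, not re-encoded
        out.writeBytes(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    /** Builds the POST request; baseUrl is the gateway address, e.g. http://localhost:8080. */
    static HttpRequest build(String baseUrl, String action, String fileName,
                             String contentType, byte[] fileBytes) {
        String boundary = "gateway-sketch-" + System.nanoTime();
        byte[] body = multipartBody(boundary, action, fileName, contentType, fileBytes);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/context/process"))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }
}
```

The resulting request can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. The file name must not contain quotes or line breaks, since this sketch does not escape them.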

### Data Curation

| Method | Endpoint                 | Body                                             | Description                                     |
| ------ | ------------------------ | ------------------------------------------------ | ----------------------------------------------- |
| `POST` | `/data-curation/process` | `file`, `normalization`, `chunking`, `embedding` | Upload a PDF and run any or all pipeline stages |

## Examples

```bash
# List available actions
curl -X GET http://localhost:8080/context/available_actions

# Summarise a PDF
curl -F actions=text-summarization -F file=@document.pdf \
     http://localhost:8080/context/process

# Run the full curation pipeline
curl -F file=@document.pdf -F normalization=true \
     -F chunking=true -F embedding=true \
     http://localhost:8080/data-curation/process
```
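
The first `curl` call has a straightforward equivalent with the JDK HTTP client. The sketch below uses a hypothetical `AvailableActions` class. The response format is not documented in this README, so the body is returned as a raw string rather than parsed.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical client-side helper (not part of the gateway). */
class AvailableActions {

    /** Builds GET {baseUrl}/context/available_actions. */
    static HttpRequest build(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/context/available_actions"))
                .GET()
                .build();
    }

    /** Sends the request and returns the unparsed response body. */
    static String fetch(HttpClient client, String baseUrl)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(build(baseUrl), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status " + response.statusCode());
        }
        return response.body();
    }
}
```

With the gateway running locally, `AvailableActions.fetch(HttpClient.newHttpClient(), "http://localhost:8080")` performs the call.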

## Sequence Diagrams

```mermaid

sequenceDiagram

    autonumber

    participant Client      as "Caller (browser / service)"

    participant Controller  as "ContextEnrichmentController"

    participant CEClient    as "ContextEnrichmentClient"

    participant S3          as "Amazon S3 (presigned)"

    participant CEAPI       as "Context-Enrichment API"

    %% 1 – initial HTTP request

    Client      ->> Controller: POST /context/process (file, actions)

    %% 2 – request upload URL

    Controller  ->> CEClient: getPresignedUrl(contentType)

    CEClient    ->> CEAPI:   GET /files/upload/presigned-url?contentType=...

    CEAPI       -->> CEClient: presignedUrl, objectKey

    CEClient    -->> Controller: presignedUrl, objectKey

    %% 3 – upload original file to S3

    Controller  ->> CEClient: uploadFileFromMemory(presignedUrl, bytes, contentType)

    CEClient    ->> S3:      HTTP PUT (binary payload via presignedUrl)

    %% 4 – start enrichment job

    Controller  ->> CEClient: processContent(objectKey, actions)

    CEClient    ->> CEAPI:   POST /content/process {objectKeys, actions}

    CEAPI       -->> CEClient: jobId

    CEClient    -->> Controller: jobId

    %% 5 – polling loop

    loop every 2 s (max 30 attempts)

        Controller ->> CEClient: getResults(jobId)

        CEClient   ->> CEAPI:   GET /content/process/{jobId}/results

        CEAPI      -->> CEClient: inProgress?, status

        alt inProgress

            CEClient -->> Controller: still running

        else status == SUCCESS

            CEClient -->> Controller: results JSON

        else status == FAILED or ERROR

            CEClient -->> Controller: error details

        end

    end

    %% 6 – final HTTP response

    Controller -->> Client: 200 OK (results) | 5xx on failure

```
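
The polling loop in step 5 (every 2 s, at most 30 attempts) follows a common bounded-retry pattern. The sketch below illustrates that pattern only and is not the gateway's actual implementation. `Poller` is a hypothetical name, and the status check is passed in as a `Supplier` that returns an empty `Optional` while the job is still running.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical illustration of a bounded polling loop. */
class Poller {

    /**
     * Calls check up to maxAttempts times, sleeping for interval between attempts.
     * Returns the first non-empty value, or empty if the attempts run out
     * or the thread is interrupted while waiting.
     */
    static <T> Optional<T> poll(Supplier<Optional<T>> check, int maxAttempts, Duration interval) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = check.get();
            if (result.isPresent()) {
                return result;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(interval.toMillis());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }
}
```

A call matching the diagram's timing would look like `Poller.poll(check, 30, Duration.ofSeconds(2))`. In this sketch, a failed job would surface as an exception thrown by the supplier.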

```mermaid

sequenceDiagram

    autonumber

    participant Client      as "Caller (browser / service)"

    participant Controller  as "DataCurationController"

    participant DCClient    as "DataCurationClient"

    participant S3          as "Amazon S3 (presigned)"

    participant DCAPI       as "Data-Curation API"

    %% 1 – initial HTTP request

    Client      ->> Controller: POST /data-curation/process (file + flags)

    %% 2 – obtain presigned info & job-id

    Controller  ->> DCClient: presign(fileName, options)

    DCClient    ->> DCAPI:   POST /presign {fileName, options}

    DCAPI       -->> DCClient: putUrl, getUrl, jobId

    DCClient    -->> Controller: putUrl, getUrl, jobId

    %% 3 – upload original file to S3

    Controller  ->> DCClient: putToS3(putUrl, bytes, contentType)

    DCClient    ->> S3:      HTTP PUT (binary payload via putUrl)

    %% 4 – polling loop until job finishes

    loop every 5 s (max 60 attempts)

        Controller ->> DCClient: status(jobId)

        DCClient   ->> DCAPI:   GET /status/{jobId}

        DCAPI      -->> DCClient: status

        alt status == DONE

            %% 4a – try the presigned results first

            Controller ->> DCClient: getPresignedResults(getUrl)

            DCClient   ->> S3:      HTTP GET (results JSON)

            alt results present

                DCClient -->> Controller: results map

            else results missing

                %% 4b – fallback to authenticated API

                Controller ->> DCClient: results(jobId)

                DCClient   ->> DCAPI:  GET /results/{jobId}

                DCAPI      -->> DCClient: results map

                DCClient   -->> Controller: results map

            end

        else status == FAILED

            DCClient -->> Controller: error details

        else status == ERROR

            DCClient -->> Controller: error details

        end

    end

    %% 5 – final HTTP response

    Controller -->> Client: 200 OK (results) | 5xx on failure

```
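
Steps 4a and 4b describe a fallback: read the results from the presigned URL first and, if they are missing, ask the authenticated API. A minimal sketch of that decision is shown below. `ResultsFallback` is a hypothetical name, both lookups are injected as suppliers, and treating an empty map as "missing" is an assumption of this example.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical illustration of the presigned-then-API fallback. */
class ResultsFallback {

    /**
     * Returns the presigned results when present and non-empty;
     * otherwise invokes the authenticated lookup.
     */
    static Map<String, Object> fetch(Supplier<Optional<Map<String, Object>>> presigned,
                                     Supplier<Map<String, Object>> authenticated) {
        return presigned.get()
                .filter(results -> !results.isEmpty())
                .orElseGet(authenticated); // only called when the first lookup yields nothing
    }
}
```

Because `orElseGet` is lazy, the authenticated request is made only when the presigned lookup returns nothing.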

## Contributing

Pull requests are welcome! Please open an issue first to discuss your proposed change.

## Resources

* [Official Documentation](https://hyland.github.io/ContentIntelligence-Docs/KnowledgeEnrichment)

* [Hyland Beta Program](https://www.hyland.com/en/learn/it/beta-program)
