https://github.com/kreuzberg-dev/kreuzberg-cloud

Cloud-native document extraction platform — SaaS at kreuzberg.dev or self-host on any Kubernetes cluster. 90+ formats, REST API, webhooks. Built on Kreuzberg.

# Kreuzberg Cloud

REST API and webhook service for extracting text, metadata, tables, and code intelligence from documents. Wraps the [Kreuzberg](https://github.com/kreuzberg-dev/kreuzberg) extraction core with multi-tenant project isolation, presigned uploads, signed webhook delivery, and Stripe-metered billing.

10,000 free pages on signup at [kreuzberg.dev/login](https://kreuzberg.dev/login).

## Features

- REST API with presigned uploads, polling, and bulk job submission.
- Signed webhook delivery (HMAC-SHA256) for completion and failure events.
- Multi-tenant project isolation enforced at the PostgreSQL row level.
- Code-aware extraction across 306 programming languages via tree-sitter.
- Source available under BUSL-1.1; self-host on Kubernetes via Helm or use the managed service.
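
A receiver of signed webhooks would typically recompute the HMAC-SHA256 over the request body and compare it with the signature it was sent. The sketch below shows that check in Python using only the standard library. It rests on assumptions this README does not state: that the signature is hex-encoded and that it covers the raw body only. The function names and the secret are hypothetical; check the API reference for the actual header name and signing scheme.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Return the hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Check a received signature against one computed locally.

    Assumption: the signature is hex-encoded and covers the raw body only.
    """
    expected = sign_payload(secret, body)
    # compare_digest runs in constant time, so the comparison does not
    # leak how many leading characters matched.
    return hmac.compare_digest(expected.encode(), received_signature.encode())
```

Verify against the raw bytes of the request, before any JSON parsing, since re-serializing the payload can change whitespace or key order and invalidate the signature.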

## Quick Start

```bash
curl -X POST https://api.kreuzberg.dev/v1/extract \
  -H "Authorization: Bearer kz_..." \
  -F "file=@document.pdf" \
  -F 'webhook={"url":""}'
```
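
The request above asks for webhook delivery; the feature list also mentions polling. A client that polls might use a loop with exponential backoff, roughly as sketched below. The `status` field, the `completed` and `failed` values, and the `fetch_job` callable are all hypothetical placeholders, not the documented API. The OpenAPI spec defines the real response shape.

```python
import time
from typing import Callable, Dict, Optional

# Hypothetical terminal status values; the real ones come from the OpenAPI spec.
TERMINAL_STATUSES = {"completed", "failed"}


def poll_until_done(
    fetch_job: Callable[[], Dict],
    max_attempts: int = 10,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Dict]:
    """Call fetch_job until it reports a terminal status or attempts run out.

    Returns the final job payload, or None if no terminal status was seen.
    """
    delay = initial_delay
    for attempt in range(max_attempts):
        job = fetch_job()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        # Wait between attempts only; double the delay up to max_delay.
        if attempt < max_attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return None
```

Here `fetch_job` stands in for whatever call retrieves the job, for example an SDK method or an authenticated GET request. Passing `sleep` as a parameter keeps the loop testable without real waiting.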

### SDKs

```bash
pip install kreuzberg-cloud-sdk                          # Python
pnpm add @kreuzberg/cloud                                # TypeScript
go get github.com/kreuzberg-dev/kreuzberg-cloud-sdk/go   # Go
dart pub add kreuzberg_cloud_sdk                         # Dart
```

API reference and language guides: [docs.kreuzberg.cloud](https://docs.kreuzberg.cloud).

## Self-hosting

Kreuzberg Cloud is available as a managed service at [kreuzberg.dev/login](https://kreuzberg.dev/login), and as source-available software under the BUSL-1.1 license for on-premise and bring-your-own-cloud deployments. This repository contains everything needed to run Kreuzberg Cloud in your own infrastructure: Helm charts, microservices, database schema, and observability dashboards. See [`docs/self-hosting.md`](docs/self-hosting.md) for prerequisites, quick-start deployment instructions, authentication options, and customization checkpoints.

## Architecture

| Service | Purpose |
|---------|---------|
| **api** | Public REST API — job submission, polling, results, usage |
| **backend** | Management API — projects, members, API keys, webhooks |
| **billing** | Stripe integration — metered billing and quota enforcement |
| **worker** | Document processing via Kreuzberg; scales to zero with KEDA |
| **webhook** | Signed HTTP delivery of completion/failure events |

Details: [`docs/concepts/architecture.md`](docs/concepts/architecture.md). OpenAPI spec: [`services/api/spec/openapi.json`](services/api/spec/openapi.json). Security policy: [SECURITY.md](SECURITY.md).

## Part of Kreuzberg.dev

- [Kreuzberg](https://github.com/kreuzberg-dev/kreuzberg) — document intelligence: text, tables, metadata from 91+ formats with optional OCR. Self-host counterpart to Kreuzberg Cloud.
- [kreuzcrawl](https://github.com/kreuzberg-dev/kreuzcrawl) — web crawling and scraping with HTML→Markdown and headless-Chrome fallback.
- [html-to-markdown](https://github.com/kreuzberg-dev/html-to-markdown) — fast, lossless HTML→Markdown engine.
- [liter-llm](https://github.com/kreuzberg-dev/liter-llm) — universal LLM API client with native bindings for 14 languages and 143 providers.
- [tree-sitter-language-pack](https://github.com/kreuzberg-dev/tree-sitter-language-pack) — tree-sitter grammars and code-intelligence primitives.
- [Discord](https://discord.gg/xt9WY3GnKR) — community, roadmap, announcements.

## License

[Business Source License 1.1](LICENSE). The source is available for review and non-production use. Production use requires a commercial license or use of the Kreuzberg-operated managed service.