https://github.com/kreuzberg-dev/kreuzberg-cloud
Cloud-native document extraction platform — SaaS at kreuzberg.dev or self-host on any Kubernetes cluster. 90+ formats, REST API, webhooks. Built on Kreuzberg.
https://github.com/kreuzberg-dev/kreuzberg-cloud
api axum busl cloud-native document-extraction document-processing helm kreuzberg kubernetes microservices nats nextjs ocr pdf postgresql rust saas self-hosted text-extraction
Last synced: 8 days ago
JSON representation
Cloud-native document extraction platform — SaaS at kreuzberg.dev or self-host on any Kubernetes cluster. 90+ formats, REST API, webhooks. Built on Kreuzberg.
- Host: GitHub
- URL: https://github.com/kreuzberg-dev/kreuzberg-cloud
- Owner: kreuzberg-dev
- License: other
- Created: 2025-11-24T20:52:51.000Z (7 months ago)
- Default Branch: development
- Last Pushed: 2026-06-01T17:45:20.000Z (11 days ago)
- Last Synced: 2026-06-01T18:23:55.415Z (11 days ago)
- Topics: api, axum, busl, cloud-native, document-extraction, document-processing, helm, kreuzberg, kubernetes, microservices, nats, nextjs, ocr, pdf, postgresql, rust, saas, self-hosted, text-extraction
- Language: Rust
- Homepage: https://kreuzberg.dev
- Size: 17.1 MB
- Stars: 14
- Watchers: 0
- Forks: 0
- Open Issues: 5
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
- Security: SECURITY.md
Awesome Lists containing this project
README
# Kreuzberg Cloud

REST API and webhook service for extracting text, metadata, tables, and code intelligence from documents. Wraps the [Kreuzberg](https://github.com/kreuzberg-dev/kreuzberg) extraction core with multi-tenant project isolation, presigned uploads, signed webhook delivery, and Stripe-metered billing.
10,000 free pages on signup at [kreuzberg.dev/login](https://kreuzberg.dev/login).
## Features
- REST API with presigned uploads, polling, and bulk job submission.
- Signed webhook delivery (HMAC-SHA256) for completion and failure events.
- Multi-tenant project isolation enforced at the PostgreSQL row level.
- Code-aware extraction across 306 programming languages via tree-sitter.
- BUSL-1.1 source available; self-host on Kubernetes via Helm or use the managed service.
## Quick Start
```bash
curl -X POST https://api.kreuzberg.dev/v1/extract \
-H "Authorization: Bearer kz_..." \
-F "file=@document.pdf" \
-F 'webhook={"url":""}'
```
### SDKs
```bash
pip install kreuzberg-cloud-sdk # Python
pnpm add @kreuzberg/cloud # TypeScript
go get github.com/kreuzberg-dev/kreuzberg-cloud-sdk/go # Go
pub add kreuzberg_cloud_sdk # Dart
```
API reference and language guides: [docs.kreuzberg.cloud](https://docs.kreuzberg.cloud).
## Self-hosting
Kreuzberg Cloud is available as a managed service at [kreuzberg.dev/login](https://kreuzberg.dev/login), and as open-source software under the BUSL-1.1 license for on-premise and bring-your-own-cloud deployments. This repository contains everything needed to run Kreuzberg Cloud in your infrastructure: Helm charts, microservices, database schema, and observability dashboards. See [`docs/self-hosting.md`](docs/self-hosting.md) for prerequisites, quick-start deployment instructions, authentication options, and customization checkpoints.
## Architecture
| Service | Purpose |
|---------|---------|
| **api** | Public REST API — job submission, polling, results, usage |
| **backend** | Management API — projects, members, API keys, webhooks |
| **billing** | Stripe integration — metered billing and quota enforcement |
| **worker** | Document processing via Kreuzberg; scales to zero with KEDA |
| **webhook** | Signed HTTP delivery of completion/failure events |
Details: [`docs/concepts/architecture.md`](docs/concepts/architecture.md). OpenAPI spec: [`services/api/spec/openapi.json`](services/api/spec/openapi.json). Security policy: [SECURITY.md](SECURITY.md).
## Part of Kreuzberg.dev
- [Kreuzberg](https://github.com/kreuzberg-dev/kreuzberg) — document intelligence: text, tables, metadata from 91+ formats with optional OCR. Self-host counterpart to Kreuzberg Cloud.
- [kreuzcrawl](https://github.com/kreuzberg-dev/kreuzcrawl) — web crawling and scraping with HTML→Markdown and headless-Chrome fallback.
- [html-to-markdown](https://github.com/kreuzberg-dev/html-to-markdown) — fast, lossless HTML→Markdown engine.
- [liter-llm](https://github.com/kreuzberg-dev/liter-llm) — universal LLM API client with native bindings for 14 languages and 143 providers.
- [tree-sitter-language-pack](https://github.com/kreuzberg-dev/tree-sitter-language-pack) — tree-sitter grammars and code-intelligence primitives.
- [Discord](https://discord.gg/xt9WY3GnKR) — community, roadmap, announcements.
## License
[Business Source License 1.1](LICENSE). The source is available for review and non-production use. Production use requires a commercial license or use of the Kreuzberg-operated managed service.