https://github.com/officialasishkumar/streamforge
Self-hosted real-time event analytics platform — Kafka, Postgres, S3 archive, full chaos and perf regression suite
- Host: GitHub
- URL: https://github.com/officialasishkumar/streamforge
- Owner: officialasishkumar
- License: apache-2.0
- Created: 2026-04-30T20:52:58.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-04-30T22:12:30.000Z (about 2 months ago)
- Last Synced: 2026-04-30T22:22:40.329Z (about 2 months ago)
- Topics: analytics, chaos-engineering, distributed-systems, event-driven, event-streaming, golang, kafka, kubernetes, observability, postgresql
- Language: Go
- Homepage: https://github.com/officialasishkumar/streamforge
- Size: 117 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 5
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
# StreamForge
A self-hosted, multi-tenant event ingestion service. Accepts batched events over HTTP, archives them to object storage, and streams them through Kafka into Postgres with a transactional outbox for downstream notifications.
The goal is straightforward: never lose an accepted event, even when Postgres, Kafka, or downstream consumers misbehave.
## How it works
```mermaid
flowchart LR
Client --> Ingest
Ingest --> S3[(S3 archive)]
Ingest --> Kafka
Kafka --> Worker
Worker --> Postgres[(Postgres events)]
Worker --> Outbox[(Outbox table)]
Outbox --> Publisher
Publisher --> SQS
Ingest --> Prom[Prometheus]
Worker --> Prom
Prom --> Grafana
```
1. **Ingest** validates the batch, writes the raw payload to S3, then publishes to Kafka keyed by tenant. Clients receive a 2xx only after the archive write succeeds.
2. **Worker** consumes from Kafka, writes events and outbox rows in a single Postgres transaction, and only commits Kafka offsets after the transaction succeeds.
3. **Outbox publisher** drains the outbox table to SQS, so downstream notifications fire only when the event is durably stored.
4. **Replay CLI** can re-publish archived S3 payloads (filtered by tenant and time window) when something downstream needs to be rebuilt.
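The ordering in step 1 is what makes the 2xx meaningful: nothing is acknowledged until the payload is in the archive. Below is a minimal Go sketch of that rule. It assumes hypothetical `Archiver` and `Publisher` interfaces in place of the real S3 and Kafka clients. The validation check, the 202/503 status codes, and treating a failed Kafka publish as an error are also assumptions, not confirmed StreamForge behavior.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Event and Batch mirror the fields used in the Quickstart request.
type Event struct {
	EventType       string
	Body            map[string]any
	ClientTimestamp string
}

type Batch struct {
	TenantID string
	Events   []Event
}

// Archiver and Publisher are hypothetical stand-ins for the S3 client and Kafka producer.
type Archiver interface{ Archive(b Batch) error }
type Publisher interface{ Publish(key string, b Batch) error }

// handleBatch returns the HTTP status a client would see.
// A 2xx is only reachable after the archive write has succeeded.
func handleBatch(b Batch, a Archiver, p Publisher) int {
	if b.TenantID == "" || len(b.Events) == 0 {
		return http.StatusBadRequest // minimal stand-in for real validation; no side effects yet
	}
	if err := a.Archive(b); err != nil {
		return http.StatusServiceUnavailable // nothing durable yet, so no 2xx
	}
	// Key by tenant so all of a tenant's batches land on the same partition.
	if err := p.Publish(b.TenantID, b); err != nil {
		return http.StatusServiceUnavailable // assumption: publish failure also withholds the 2xx
	}
	return http.StatusAccepted
}

// In-memory fakes used for demonstration.
type fakeArchive struct {
	fail   bool
	stored []Batch
}

func (f *fakeArchive) Archive(b Batch) error {
	if f.fail {
		return errors.New("s3 unavailable")
	}
	f.stored = append(f.stored, b)
	return nil
}

type fakeProducer struct{ published []string }

func (f *fakeProducer) Publish(key string, b Batch) error {
	f.published = append(f.published, key)
	return nil
}

func main() {
	b := Batch{TenantID: "tenant-a", Events: []Event{{EventType: "user.signup"}}}
	fmt.Println(handleBatch(b, &fakeArchive{}, &fakeProducer{}))           // prints 202
	fmt.Println(handleBatch(b, &fakeArchive{fail: true}, &fakeProducer{})) // prints 503
}
```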
Per-tenant Kafka partitioning preserves ordering within a tenant. Idempotency keys prevent duplicate writes on redelivery.
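Kafka guarantees order only within a partition, so per-tenant ordering comes from sending every event for a tenant to the same partition. The sketch below shows that mapping, assuming a fixed partition count. `partitionFor` is a hypothetical helper, and FNV-1a is a stand-in hash (the Java client's default partitioner uses murmur2 on the key bytes); any deterministic hash gives the same ordering property.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor deterministically maps a tenant ID to a partition index.
// numPartitions must be greater than zero.
func partitionFor(tenantID string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(tenantID)) // hash.Hash writes never return an error
	return int(h.Sum32() % uint32(numPartitions))
}

func main() {
	// "tenant-a" maps to the same partition both times.
	for _, t := range []string{"tenant-a", "tenant-b", "tenant-a"} {
		fmt.Println(t, "->", partitionFor(t, 12))
	}
}
```

Because the mapping is `hash mod N`, changing the partition count moves tenants to different partitions, so the ordering argument holds only while the count stays fixed.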
## Quickstart
```bash
docker compose up -d

# Send one batch for tenant-a (the stack may need a few seconds to become ready)
curl -sS -X POST http://localhost:8080/v1/events \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "tenant-a",
    "events": [
      {"event_type": "user.signup", "body": {"source": "web"}, "client_timestamp": "2026-05-01T00:00:00Z"}
    ]
  }'
```
Grafana is at `http://localhost:3000` with a provisioned StreamForge dashboard.
## Configuration
`streamforge.yaml` holds the defaults. Any field can be overridden with a `STREAMFORGE_*` environment variable (for example `STREAMFORGE_POSTGRES_DSN`).
## Operational notes
| Failure mode | What happens |
|---|---|
| Postgres down | Worker writes fail, Kafka offsets are not committed, backlog is replayed on recovery |
| Worker crashes mid-batch | Uncommitted offsets are redelivered; idempotency keys block duplicate writes |
| Kafka rebalance | Tenant-keyed partitioning preserves per-tenant ordering when partitions move to a new consumer |
| Malformed event | Ingest returns 400 with details; worker-side parse failures land in the DLQ table |
| Need to rebuild a sink | `cmd/replay --tenant=... --from=... --to=... --rps=...` re-publishes from S3 |
To inspect dead-lettered events, query the `dlq_events` Postgres table and correlate rows by `correlation_id` and tenant.
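The first two rows of the failure-mode table follow from one rule: the worker advances its Kafka offset only after the Postgres transaction commits, and idempotency keys make the resulting redeliveries harmless. Below is a minimal in-memory sketch of that loop. It assumes a hypothetical `idempotency_key` field in each record (the source does not say where the key comes from). A map stands in for the events table and skips existing keys the way `INSERT ... ON CONFLICT DO NOTHING` would. DLQ rows are assumed to be written in the same transaction, and outbox rows are omitted for brevity.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Message is a hypothetical view of one Kafka record.
type Message struct {
	Offset        int64
	TenantID      string
	CorrelationID string
	Value         []byte
}

// parsedEvent assumes each record carries an idempotency_key field.
type parsedEvent struct {
	IdempotencyKey string `json:"idempotency_key"`
	EventType      string `json:"event_type"`
}

type row struct {
	Tenant string
	Event  parsedEvent
}

// store simulates Postgres: events keyed by tenant and idempotency key,
// plus a slice standing in for the dlq_events table.
type store struct {
	events map[string]parsedEvent
	dlq    []Message
	down   bool // simulates Postgres being unavailable
}

// commitTx applies rows and DLQ entries all-or-nothing, like one transaction.
// Existing keys are skipped, mirroring INSERT ... ON CONFLICT DO NOTHING.
func (s *store) commitTx(rows []row, dead []Message) error {
	if s.down {
		return errors.New("postgres unavailable")
	}
	for _, r := range rows {
		key := r.Tenant + "/" + r.Event.IdempotencyKey
		if _, exists := s.events[key]; !exists {
			s.events[key] = r.Event
		}
	}
	s.dlq = append(s.dlq, dead...)
	return nil
}

// processBatch returns the offset to commit. If the transaction fails it
// returns the previous offset unchanged, so Kafka redelivers the batch.
func processBatch(s *store, committed int64, msgs []Message) int64 {
	var rows []row
	var dead []Message
	for _, m := range msgs {
		var e parsedEvent
		if err := json.Unmarshal(m.Value, &e); err != nil || e.IdempotencyKey == "" {
			dead = append(dead, m) // parse failure: keep correlation ID and tenant for the DLQ
			continue
		}
		rows = append(rows, row{Tenant: m.TenantID, Event: e})
	}
	if err := s.commitTx(rows, dead); err != nil || len(msgs) == 0 {
		return committed
	}
	return msgs[len(msgs)-1].Offset + 1 // Kafka commits the next offset to read
}

func main() {
	s := &store{events: map[string]parsedEvent{}}
	batch := []Message{
		{Offset: 10, TenantID: "tenant-a", CorrelationID: "c1", Value: []byte(`{"idempotency_key":"k1","event_type":"user.signup"}`)},
		{Offset: 11, TenantID: "tenant-a", CorrelationID: "c2", Value: []byte(`not json`)},
	}
	s.down = true
	off := processBatch(s, 10, batch) // Postgres down: offset stays at 10
	s.down = false
	off = processBatch(s, off, batch)           // redelivered batch now commits
	fmt.Println(off, len(s.events), len(s.dlq)) // prints 12 1 1
}
```

Following Kafka's convention, the returned value is the offset of the next record to consume (last processed offset + 1), so a crash before the commit simply replays the batch.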
## Deployment
- Local: `docker compose up`
- Kubernetes: manifests in `deploy/k8s/` (apply order documented in that folder's README)
## Limitations
- Single-region; cross-region failover is out of scope.
- Postgres is the analytics sink; tenants pushing 50k+ events/sec should consider a ClickHouse sink instead.
- Replay throughput is bounded by S3 list/fetch latency.
- Schema cache across ingest replicas is eventually consistent (60s refresh).
- Ships dashboards but no opinionated alert policies.
## License
Apache 2.0.