https://github.com/kot-behemoth/kitsuna-data
Self-hosted one-person data platform
- Host: GitHub
- URL: https://github.com/kot-behemoth/kitsuna-data
- Owner: kot-behemoth
- License: apache-2.0
- Created: 2025-05-08T20:14:24.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-08-12T22:18:36.000Z (6 months ago)
- Last Synced: 2025-08-13T00:23:13.509Z (6 months ago)
- Topics: data, data-visualization, dlt, dokku, duckdb, metabase, python, sqlmesh
- Language: Makefile
- Homepage: https://kitsunadata.com/
- Size: 303 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Kitsuna Data
Self-hosted one-person data platform
## About The Project
This is a concept for what a Rails-inspired small data platform for startups and SMEs could look like. After using a variety of end-to-end solutions like DOMO, Keboola, Mozart Data and others, I keep wishing there was something that would do 80% of ELT + BI out of the box, without the pricing surprises.
This project is an attempt to stitch together a set of solid, reliable open-source tools into a lean platform where one data engineer can own the entire lifecycle: from ELT, to data modelling, to deploying and scaling in production.

### Main Features
1. **From laptop to production in minutes** - Develop locally with DuckDB, deploy with the same code (see the sketch after this list). No more "it works on my machine" problems.
1. **Lightning-fast analytics on any data size** - DuckDB's column-oriented design handles gigabytes of data on modest hardware. Query billions of rows in seconds.
1. **Beautiful dashboards** - Drag-and-drop dataviz with Metabase. Perfect for everyone - tech and non-tech alike.
1. **Scale without breaking the bank** - Enterprise-grade data stack for as little as $30/month. DuckDB + SQLMesh's efficiency means lower compute costs than Snowflake or BigQuery.
1. **30+ ready-to-use integrations** - Instant integrations with dlt for Stripe, GitHub, Salesforce, and more. Connect your SaaS tools with minimal code.
1. **Just ask your DB** - Ask questions in plain English via an MCP server for DuckDB. Get immediate answers without writing complex queries.
1. **End-to-end data lineage** - SQLMesh tracks transformations from raw to gold data. Understand exactly where metrics come from and debug easily.
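As a rough, hypothetical sketch of that local-first workflow (the file, table, and column names below are illustrative, not part of this repo), querying a single-file DuckDB database from Python could look like this:

```python
# Minimal local-analytics sketch: one DuckDB file on disk, plain SQL on top.
# "kitsuna.duckdb", the events table and its columns are assumptions.
import duckdb

con = duckdb.connect("kitsuna.duckdb")  # single-file database, no server

# DuckDB can scan Parquet/CSV directly, so raw files need no separate load step
con.execute("""
    CREATE TABLE IF NOT EXISTS events AS
    SELECT * FROM read_parquet('data/events/*.parquet')
""")

# Columnar execution keeps aggregates like this fast even on large tables
daily = con.execute("""
    SELECT date_trunc('day', created_at) AS day, count(*) AS n_events
    FROM events
    GROUP BY 1
    ORDER BY 1
""").fetchall()

for day, n_events in daily[-7:]:
    print(day, n_events)

con.close()
```

The same script runs unchanged against the DuckDB file on a production volume, which is the point of the "laptop to production" claim.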
> [!CAUTION]
> The project is very much in the pre-alpha stage. This is more of an experiment and is not meant for production workloads.
### Goals
- Local-first development for the entire stack.
- Support companies that can't afford heavy, expensive data tools or large teams.
- No "SSO tax" - all tools should be either fully free, or affordable once deployed in serious prod use case.
- No k8s, so a small data team can be self-sufficient .
- Cheap path to production and scaling.
### Tech Stack
- _Extract_ (planned): [dlt](https://dlthub.com/)
- Transform: [SQLMesh](https://sqlmesh.readthedocs.io/en/stable/)
- Data Storage: [DuckDB](https://duckdb.org/)
- BI / data viz: [Metabase](https://www.metabase.com/)
- Deployment: [Dokploy](https://dokploy.com/)
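As a sketch of how the pieces above could fit together once dlt lands (it is still marked as planned), here is a minimal, hypothetical dlt-to-DuckDB load; the pipeline, dataset, resource, and table names are all assumptions, not project code:

```python
# Extract + load sketch: dlt writes raw tables into a local DuckDB file,
# which SQLMesh then transforms and Metabase reads.
import dlt


@dlt.resource(name="repo_events", write_disposition="append")
def repo_events():
    # Stand-in for a real dlt verified source (GitHub, Stripe, Salesforce, ...)
    yield [
        {"id": 1, "type": "push"},
        {"id": 2, "type": "issue_opened"},
    ]


pipeline = dlt.pipeline(
    pipeline_name="kitsuna_raw",
    destination="duckdb",   # lands in a .duckdb file next to the pipeline
    dataset_name="raw",     # becomes a schema inside the DuckDB database
)

load_info = pipeline.run(repo_events())
print(load_info)
```

SQLMesh models would then select from the `raw` schema and build the transformed layers that Metabase dashboards point at.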
## Getting Started
### Prerequisites
You will need the following tools installed:
* `uv`
* `mise` (recommended)
* `claude` (recommended)
### Installation
1. Clone this repository
2. Download the DuckDB driver for Metabase:
```bash
make download-duckdb-driver
```
3. Start the services:
```bash
docker-compose up -d
```
4. Access Metabase at http://localhost:3000
## Usage
TODO
## Deployment
This project can be deployed to DigitalOcean/Hetzner/EC2 using Dokploy with the following architecture:
1. **Metabase Container**:
- Dedicated hostname (e.g., metabase.yourdomain.com)
- Access to mounted DuckDB volume
2. **dlt + SQLMesh Container**:
- Combined container for data processing
- Access to the same DuckDB volume
3. **Shared Storage**:
- Used for persistent DuckDB storage
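To illustrate the shared-storage arrangement above, here is a minimal sketch from the processing container's point of view. The `/data` mount point and the `DUCKDB_PATH` environment variable are assumptions for illustration, not project configuration:

```python
# Both services point at the same DuckDB file on the mounted volume:
# the dlt + SQLMesh container writes it, Metabase's DuckDB driver reads it.
import os

import duckdb

# Hypothetical convention: the deployment config exports DUCKDB_PATH
DB_PATH = os.environ.get("DUCKDB_PATH", "/data/kitsuna.duckdb")

con = duckdb.connect(DB_PATH)
print(con.execute("SHOW TABLES").fetchall())
con.close()

# Caveat: DuckDB allows a single read-write process at a time, so pipeline
# writes and dashboard reads against this file generally need to be scheduled
# so they don't overlap.
```

Metabase would be configured with the same file path, so dashboards see whatever tables the pipeline container has written.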
## Roadmap
- [x] Add SQLMesh
- [x] Add MCP for DuckDB
- [ ] Add dlt
- [ ] Implement as an example: [Exploring StarCraft 2 data with Airflow, DuckDB and Streamlit \| by Volker Janz \| Data Engineer Things](https://blog.det.life/exploring-starcraft-2-data-with-airflow-duckdb-and-streamlit-7c0ad79f9ca6)
- [x] Add Dokku deployment configuration
- [ ] Create a DigitalOcean box for a public demo
- [ ] Add installation docs
- [ ] Add usage docs
- [ ] Add Aider docs
## Contact
Greg Goltsov - [@gregoltsov](https://x.com/gregoltsov), [gregoltsov.bsky.social](https://bsky.app/profile/gregoltsov.bsky.social).
## Inspiration
Here are some projects which inspired my thinking and this project:
* [Modern Data Stack in a Box with DuckDB](https://duckdb.org/2022/10/12/modern-data-stack-in-a-box.html)
* [MDS-in-a-box: Monte Carlo simulation of the NBA season](https://github.com/matsonj/nba-monte-carlo)
* [Exploring StarCraft 2 data with Airflow, DuckDB and Streamlit](https://blog.det.life/exploring-starcraft-2-data-with-airflow-duckdb-and-streamlit-7c0ad79f9ca6)