https://github.com/jr200/polars-hist-db
(jetstream | file) --> polars dataframe <--> mariadb (bitemporal)
- Host: GitHub
- URL: https://github.com/jr200/polars-hist-db
- Owner: jr200
- License: MIT
- Created: 2025-03-28T08:53:31.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2026-03-26T14:35:09.000Z (3 months ago)
- Last Synced: 2026-03-26T15:36:45.311Z (3 months ago)
- Topics: bitemporal, dsv, mariadb, nats, nats-jetstream, polars, scraper
- Language: Python
- Homepage: https://jr200.github.io/polars-hist-db/
- Size: 6.3 MB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# polars-hist-db
A Python library for building bitemporal data pipelines with [Polars](https://pola.rs/) and [MariaDB](https://mariadb.com/).
It ingests data from DSV files or [NATS JetStream](https://docs.nats.io/nats-concepts/jetstream) subjects, tracks history using MariaDB's system-versioned tables, and exposes everything as strongly-typed Polars DataFrames.
### Features
- **Typed uploads** — push Polars DataFrames into MariaDB with automatic type mapping between Polars, SQL, and SQLAlchemy.
- **Typed queries** — read tables back into DataFrames with column types inferred from the database schema. Temporal query hints (`asof`, `span`, `all`) let you slice history without writing SQL.
- **YAML-driven pipelines** — define scrape specifications that handle column typing, enrichment via custom transform functions, normalization across tables, and foreign-key deduction.
- **Deduplication** — an audit log tracks what has already been ingested so re-runs skip previously processed files or messages.
- **Transactional ingestion** — each file or message is processed in its own transaction; failures roll back cleanly without affecting other items.
- **Dual input sources** — crawl directories for DSV files, or consume messages from NATS JetStream with configurable fetch/subscription strategies.
- **Delta upserts** — staging tables detect changed rows, handle duplicates (`take_first`, `take_last`, `error`), and optionally mark disappeared rows as dropped.
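The delta-upsert idea from the feature list can be pictured with a small, library-independent sketch. It is plain Python and is not the polars-hist-db API: the function names `dedupe` and `compute_delta`, the `mark_dropped` flag, and the sample rows are hypothetical. Only the strategy names `take_first`, `take_last`, and `error` come from the list above.

```python
from typing import Any

Row = dict[str, Any]
STRATEGIES = ("take_first", "take_last", "error")


def dedupe(rows: list[Row], key: str, strategy: str = "take_last") -> dict[Any, Row]:
    """Collapse staged rows that share a key (hypothetical helper)."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    result: dict[Any, Row] = {}
    for row in rows:
        k = row[key]
        if k not in result:
            result[k] = row          # first time this key is seen
        elif strategy == "take_last":
            result[k] = row          # a later duplicate wins
        elif strategy == "error":
            raise ValueError(f"duplicate key: {k!r}")
        # take_first: keep the row that is already stored
    return result


def compute_delta(
    current: list[Row],
    staged: list[Row],
    key: str,
    strategy: str = "take_last",
    mark_dropped: bool = False,
) -> dict[str, list[Row]]:
    """Compare staged rows with current rows and classify the differences."""
    incoming = dedupe(staged, key, strategy)
    existing = {row[key]: row for row in current}
    delta: dict[str, list[Row]] = {
        # keys that are new to the target table
        "insert": [r for k, r in incoming.items() if k not in existing],
        # keys that exist already but whose values changed
        "update": [
            r for k, r in incoming.items() if k in existing and existing[k] != r
        ],
        "dropped": [],
    }
    if mark_dropped:
        # keys present in the target but missing from the staged batch
        delta["dropped"] = [r for k, r in existing.items() if k not in incoming]
    return delta


current = [{"id": 1, "price": 10}, {"id": 2, "price": 20}, {"id": 3, "price": 30}]
staged = [
    {"id": 1, "price": 10},  # unchanged, so it is skipped
    {"id": 2, "price": 21},  # duplicate key, first occurrence
    {"id": 2, "price": 22},  # duplicate key, last occurrence
    {"id": 4, "price": 40},  # new row
]
delta = compute_delta(current, staged, key="id", strategy="take_last", mark_dropped=True)
```

In a system-versioned table, only the rows classified as changed need to be written, so unchanged rows do not produce new history entries.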
Full documentation: [jr200.github.io/polars-hist-db](https://jr200.github.io/polars-hist-db)
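For orientation, the temporal hints named under Features (`asof`, `span`, `all`) correspond naturally to MariaDB's `FOR SYSTEM_TIME` clauses on system-versioned tables. The sketch below is an assumption about that correspondence, not the library's implementation: `system_time_clause` is a hypothetical helper, the `prices` table is invented, and mapping `span` to the half-open `FROM ... TO` form (rather than the inclusive `BETWEEN ... AND`) is a guess.

```python
from datetime import datetime

_FMT = "%Y-%m-%d %H:%M:%S.%f"


def _literal(ts: datetime) -> str:
    """Render a datetime as a SQL TIMESTAMP literal."""
    return f"TIMESTAMP '{ts.strftime(_FMT)}'"


def system_time_clause(hint: str, *times: datetime) -> str:
    """Build a MariaDB FOR SYSTEM_TIME clause for a temporal hint (hypothetical)."""
    if hint == "all":
        if times:
            raise ValueError("'all' takes no timestamps")
        return "FOR SYSTEM_TIME ALL"  # every row version, current and historical
    if hint == "asof":
        if len(times) != 1:
            raise ValueError("'asof' takes exactly one timestamp")
        return f"FOR SYSTEM_TIME AS OF {_literal(times[0])}"
    if hint == "span":
        if len(times) != 2:
            raise ValueError("'span' takes a start and an end timestamp")
        start, end = times
        # FROM ... TO excludes the end point; BETWEEN ... AND would include it
        return f"FOR SYSTEM_TIME FROM {_literal(start)} TO {_literal(end)}"
    raise ValueError(f"unknown hint: {hint!r}")


# Example: the state of a hypothetical `prices` table at the start of 2026.
clause = system_time_clause("asof", datetime(2026, 1, 1))
query = f"SELECT * FROM prices {clause}"
```

The timestamps are formatted by `strftime` rather than spliced in from user strings, so the generated clause contains only digits and punctuation from the fixed format.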
## Quick Start
```bash
uv sync    # install dependencies into the project's virtual environment
make test  # run the test suite
```
## Development
```bash
make check # ruff + mypy
make docs # render and preview quarto docs
make bump PART=patch # bump the patch component of the version
make release # cut a release
```
## License
This project is licensed under the [MIT](LICENSE) license.