Projects in Awesome Lists tagged with data-infrastructure
A curated list of projects in awesome lists tagged with data-infrastructure.
https://github.com/zalando/postgres-operator
Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
cluster data-infrastructure database-as-a-service golang kubernetes managed-services operator postgres postgres-operator postgresql
Last synced: 13 May 2025
https://github.com/zalando-incubator/postgres-operator
Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
cluster data-infrastructure database-as-a-service golang kubernetes managed-services operator postgres postgres-operator postgresql
Last synced: 05 Mar 2025
https://github.com/crunchydata/postgres-operator
Production PostgreSQL for Kubernetes, from high availability Postgres clusters to full-scale database-as-a-service.
data-infrastructure database database-as-a-service database-management disaster-recovery high-availability kubernetes kubernetes-operator operator pgo postgres postgres-operator postgresql postgresql-clusters postgresql-metrics postgresql-monitoring
Last synced: 12 May 2025
https://github.com/CrunchyData/postgres-operator
Production PostgreSQL for Kubernetes, from high availability Postgres clusters to full-scale database-as-a-service.
data-infrastructure database database-as-a-service database-management disaster-recovery high-availability kubernetes kubernetes-operator operator pgo postgres postgres-operator postgresql postgresql-clusters postgresql-metrics postgresql-monitoring
Last synced: 14 Mar 2025
https://github.com/StructuredLabs/preswald
Preswald is a WASM packager for Python-based interactive data apps: bundle full, complex data workflows, particularly visualizations, into single files that run completely in-browser, using Pyodide, DuckDB, Pandas, Plotly, Matplotlib, etc. Build dashboards, reports, and notebooks that run offline, load fast, and share like a document.
ai analytics analytics-engineering copilot data data-applications data-infrastructure data-pipelines data-sdk data-visualization gpt llm open-source python schema-management vscode
Last synced: 11 May 2025
https://github.com/structuredlabs/preswald
Preswald is a framework for building and deploying interactive data apps, internal tools, and dashboards with Python. With one command, you can launch, share, and deploy locally or in the cloud, turning Python scripts into powerful shareable apps.
ai analytics analytics-engineering copilot data data-applications data-infrastructure data-pipelines data-sdk data-visualization gpt llm open-source python schema-management vscode
Last synced: 13 May 2025
https://github.com/zalando/spilo
Highly available elephant herd: HA PostgreSQL cluster using Docker
data-infrastructure docker docker-image high-availability patroni postgresql python
Last synced: 14 May 2025
https://github.com/tensorbase/tensorbase
TensorBase is a new big data warehouse built with modern efforts.
analytics bigdata data data-infrastructure data-warehouse database engineering high-performance infrastructure modern rust rust-lang warehouse
Last synced: 06 Apr 2025
https://github.com/zalando/nakadi
A distributed event bus that implements a RESTful API abstraction on top of Kafka-like queues
apis data-infrastructure event-bus java java-8 kafka microservices postgresql restful
Last synced: 04 Oct 2025
https://github.com/zalando/PGObserver
A battle-tested, flexible & comprehensive monitoring solution for your PostgreSQL databases
data-infrastructure monitoring
Last synced: 29 Jul 2025
https://github.com/uktrade/sqlite-s3vfs
Python writable virtual filesystem for SQLite on S3
Last synced: 16 Apr 2025
https://github.com/uktrade/stream-zip
Python function to construct a ZIP archive on the fly
Last synced: 04 Feb 2026
https://github.com/zalando-incubator/spark-json-schema
JSON schema parser for Apache Spark
Last synced: 14 Apr 2025
https://github.com/abhishek-ch/data-machinelearning-the-boring-way
Build & Learn Data Engineering, Machine Learning over Kubernetes. No Shortcut approach.
data-infrastructure dataengineering datascience kubernetes machine-learning mlops
Last synced: 21 Mar 2025
https://github.com/uktrade/fargatespawner
Spawns JupyterHub single user servers in Docker containers running in AWS Fargate
Last synced: 12 Apr 2025
https://github.com/zalando-nakadi/kanadi
Kanadi is a Nakadi client for Scala
akka-stream akka-streams circe data-infrastructure hacktoberfest kafka nakadi scala
Last synced: 12 Jan 2026
https://github.com/uktrade/stream-sqlite
Python function to extract rows from a SQLite file while iterating over its bytes
Last synced: 13 Jul 2025
https://github.com/bizzabo/elasticsearch_to_bigquery_data_pipeline
A generic data pipeline which will map Elasticsearch documents to Bigquery table rows
Last synced: 14 Apr 2025
https://github.com/alphagov/consent-api
Service for sharing user consent to cookies across multiple domains
cookie-consent data-infrastructure data-infrastructure-team data-services sde
Last synced: 08 May 2025
https://github.com/uktrade/stream-unzip
Python function to stream unzip all the files in a ZIP archive on the fly
Last synced: 13 Jan 2026
https://github.com/anna-geller/kestra-terraform-examples
Bring Infrastructure as Code best practices to your data workflows with Kestra and Terraform
automation aws data-engineering data-infrastructure data-orchestration infrastructure-as-code kestra platform-engineering terraform workflow-as-code workflow-orchestration
Last synced: 15 Aug 2025
https://github.com/alphagov/sde-prototype-govuk
A fake GOV.UK homepage and start pages for SDE prototype services
cpto data-infrastructure data-infrastructure-team data-services sde
Last synced: 25 Jan 2026
https://github.com/ilssaf/data-platform-deployer
CLI tool for automatic data platform deployment
cdc clickhouse data-engineering data-infrastructure data-platform devops etl infrastructure-as-code kafka kafka-connect postgresql s3
Last synced: 29 Apr 2026
https://github.com/mjdevaccount/market-data-store
Production market data infrastructure: TimescaleDB + FastAPI control-plane, async sinks, Python client. Handles OHLCV bars, fundamentals, news, options. Features: RLS isolation, backpressure mgmt, Prometheus metrics, cross-repo testing. Built for scale.
async data-infrastructure fastapi financial-data market-data postgresql prometheus python time-series timescaledb
Last synced: 31 Oct 2025
https://github.com/itrauco/streaming-data-platform
Skeleton streaming data platform on GCP...
big-data data data-engineering data-infrastructure data-science engineering google-cloud platform-engineering python streaming-data
Last synced: 13 Jun 2026
https://github.com/uktrade/stream-read-xbrl
Python package to parse Companies House accounts data in a streaming way
Last synced: 24 Feb 2026
https://github.com/apelullo/yelp_health_data_curation_ops
An AWS-based data pipeline to extract, process, store, and monitor Yelp "health-related" facility data in support of ongoing health system initiatives.
academic-research automation aws data-access data-curation data-infrastructure data-pipelines health-data operations operations-research python yelp-dataset
Last synced: 08 Apr 2026
https://github.com/jaehyeon-kim/open-dataml-stack
A curated collection of open source technologies and an accompanying CLI for experimenting with modern data architecture and MLOps.
apache-airflow apache-flink apache-iceberg apache-kafka apache-spark cli clickhouse data-engineering data-infrastructure data-lakehouse docker-compose mlflow mlops modern-data-stack openlineage openmetadata prometheus python stream-processing trino
Last synced: 05 Jun 2026
https://github.com/seedcase-project/template-data-package
An opinionated template for Data Packages built with Seedcase packages.
copier copier-template data-engineering data-infrastructure data-package fair-data frictionless-data template template-data-package template-project template-repository
Last synced: 06 Mar 2026