DataOps

DataOps is an automated, process-oriented methodology, used by analytic and data teams, to improve the quality and reduce the cycle time of data analytics. While DataOps began as a set of best practices, it has now matured to become a new and independent approach to data analytics. DataOps applies to the entire data lifecycle from data preparation to reporting, and recognizes the interconnected nature of the data analytics team and information technology operations.

https://github.com/flyteorg/flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.

data data-analysis data-science dataops declarative fine-tuning flyte golang grpc hacktoberfest kubernetes kubernetes-operator llm machine-learning mlops orchestration-engine production python scale workflow

Last synced: 19 Apr 2025

https://github.com/redpanda-data/console

Redpanda Console is a developer-friendly UI for managing your Kafka/Redpanda workloads. Console gives you a simple, interactive approach for gaining visibility into your topics, masking data, managing consumer groups, and exploring real-time data with time-travel debugging.

apache-kafka dataops go kafka kafka-gui kafka-ui react typescript web-ui

Last synced: 08 Apr 2025

https://github.com/lancedb/lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.

apache-arrow computer-vision data-analysis data-analytics data-centric data-format data-science dataops deep-learning duckdb embeddings llms machine-learning mlops python rust

Last synced: 08 Apr 2025

https://github.com/whylabs/whylogs

An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈

ai-pipelines analytics approximate-statistics calculate-statistics constraints data-constraints data-pipeline data-quality data-science dataops dataset logging machine-learning ml-pipelines mlops model-performance python statistical-properties

Last synced: 10 Apr 2025
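
whylogs describes its output as statistical profiles of data rather than copies of the data. The sketch below illustrates that general idea in plain Python; it is not the whylogs API, and the `ColumnProfile` class and its fields are hypothetical. Each batch keeps a small, mergeable summary (count, nulls, min, max, and a running mean and variance via Welford's algorithm), so profiles logged from separate batches or workers can be combined later without revisiting the raw rows. Production profilers typically add approximate sketches for quantiles and distinct counts; this one keeps only exact moments.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnProfile:
    """Hypothetical mergeable summary of one numeric column (not the whylogs API)."""
    count: int = 0
    nulls: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def track(self, value: Optional[float]) -> None:
        """Update the summary with one value using Welford's online algorithm."""
        if value is None:
            self.nulls += 1
            return
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other: "ColumnProfile") -> "ColumnProfile":
        """Combine two summaries with the parallel (Chan et al.) update."""
        n = self.count + other.count
        nulls = self.nulls + other.nulls
        if n == 0:
            return ColumnProfile(nulls=nulls)
        delta = other.mean - self.mean
        return ColumnProfile(
            count=n,
            nulls=nulls,
            mean=self.mean + delta * other.count / n,
            m2=self.m2 + other.m2 + delta * delta * self.count * other.count / n,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    @property
    def variance(self) -> float:
        """Sample variance; 0.0 when fewer than two values were seen."""
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


# Profile two batches independently, then merge the summaries.
batch_a, batch_b = ColumnProfile(), ColumnProfile()
for v in [1.0, 2.0, None]:
    batch_a.track(v)
for v in [3.0, 4.0]:
    batch_b.track(v)
total = batch_a.merge(batch_b)
print(total.count, total.nulls, total.mean, total.variance)  # 4 1 2.5 1.6666666666666667
```

Keeping `m2` instead of a raw sum of squares avoids the cancellation error of the naive variance formula, and it is what makes the merge step exact rather than approximate.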

https://github.com/TobikoData/sqlmesh

Efficient data transformation and modeling framework that is backwards compatible with dbt.

dataengineering dataops dbt elt etl python sql transformation

Last synced: 26 Mar 2025

https://github.com/elementary-data/elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

analytics-engineer bigquery data-analysis data-governance data-lineage data-observability data-pipeline data-pipelines data-reliability data-warehouse dataops dbt dbt-artifacts dbt-packages lineage redshift snowflake

Last synced: 08 Apr 2025

https://github.com/lensesio/fast-data-dev

Kafka Docker for development. Kafka, Zookeeper, Schema Registry, Kafka-Connect, Landoop Tools, 20+ connectors

dataops docker kafka kafka-rest-proxy schema-registry

Last synced: 23 Mar 2025

https://github.com/Landoop/fast-data-dev

Kafka Docker for development. Kafka, Zookeeper, Schema Registry, Kafka-Connect, Landoop Tools, 20+ connectors

dataops docker kafka kafka-rest-proxy schema-registry

Last synced: 27 Dec 2024

https://github.com/alibaba/sreworks

Cloud Native DataOps & AIOps Platform | Cloud-native intelligent data operations and maintenance platform

aiops application cloudnative dataops devops engineering flink k8s kubernetes maintenance oam operation ops saas sre

Last synced: 13 Apr 2025

https://github.com/meltano/meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

connectors data data-engineering data-pipelines dataops dataops-platform elt extract-data integration loaders meltano meltano-sdk open-source opensource pipelines singer tap taps target targets

Last synced: 08 Apr 2025

https://github.com/datavane/tis

Supports agile DataOps based on Flink, DataX, Flink-CDC, and Chunjun, with a web UI.

cdc chunjun dataops datax etl flink flink-streaming java

Last synced: 13 Apr 2025

https://github.com/raystack/optimus

Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.

airflow analytics analytics-engineering automation bigquery business-intelligence data-modelling data-pipelines data-transformation data-warehouse dataops elt etl golang workflows

Last synced: 08 Apr 2025

https://github.com/azure-samples/modern-data-warehouse-dataops

DataOps for Microsoft Data Platform technologies. https://aka.ms/dataops-repo

automatedtesting azure cicd data databricks datafactory dataops devops fabric

Last synced: 13 Apr 2025

https://github.com/flowerfine/scaleph

Open data platform based on Kubernetes. Scaleph supports SeaTunnel, Flink, and Doris, backed by SeaTunnel on the Flink engine, the Flink Kubernetes Operator, and the Doris operator.

dag data-platform dataops doris doris-manager doris-operator flink flink-kubernetes flink-kubernetes-operator flink-sql flink-sql-gateway seatunnel

Last synced: 04 Apr 2025

https://github.com/pbi-tools/pbi-tools

Power BI DevOps & Source Control Tool

dataops devops pbix power-bi source-control

Last synced: 13 Nov 2024

https://github.com/raystack/firehose

Firehose is an extensible, no-code, and cloud-native service to load real-time streaming data from Kafka to data stores, data lakes, and analytical storage systems.

apache-kafka bigquery dataops firehose influxdb kafka postgresql prometheus sink streaming

Last synced: 05 Apr 2025

https://github.com/merantix-momentum/squirrel-core

A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰

ai cloud-computing collaboration computer-vision cv data-ingestion data-mesh data-science dataops datasets deep-learning distributed internal machine-learning ml natural-language-processing nlp python pytorch tensorflow

Last synced: 01 Apr 2025

https://github.com/raystack/frontier

Frontier is an all-in-one user management platform that provides identity, access and billing management to help organizations secure their systems and data. (Open source alternative to Clerk)

authentication authorization billing clerkauth dataops golang rbac spicedb user-management

Last synced: 08 Apr 2025

https://github.com/raystack/dagger

Dagger is an easy-to-use, configuration over code, cloud-native framework built on top of Apache Flink for stateful processing of real-time streaming data.

apache-flink apache-kafka dataops framework influxdb prometheus real-time-analytics real-time-processing stream-processing

Last synced: 06 Apr 2025

https://github.com/awslabs/aws-ddk

An open source development framework to help you build data workflows and modern data architecture on AWS.

aws dataengineering dataops python

Last synced: 24 Nov 2024

https://github.com/raystack/stencil

Stencil is a schema registry that provides schema management and validation dynamically, efficiently, and reliably to ensure data compatibility across applications.

cli clojure clojure-library dataops descriptor golang javascript javascript-library js protobuf protocol-buffers protocol-buffers-library protocol-buffers-parsing schema-registry schema-validation

Last synced: 05 Apr 2025
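
A schema registry such as Stencil validates new schema versions before consumers see them. As a rough, hypothetical illustration (not Stencil's API or its actual rule set), the check below treats a Protocol Buffers-like message as a map from field number to `(name, type)` and flags three classic problems: changing the type of an existing field number, dropping a field without reserving its number, and reusing a number that was previously reserved. Real protobuf compatibility is more nuanced; for example, some numeric type changes are wire-compatible.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Schema:
    """Hypothetical, simplified protobuf-like message: field number -> (name, type)."""
    fields: Dict[int, Tuple[str, str]]
    reserved: FrozenSet[int] = frozenset()


def compatibility_errors(old: Schema, new: Schema) -> List[str]:
    """List changes in `new` that this simplified rule set treats as incompatible with `old`."""
    errors: List[str] = []
    for number, (name, old_type) in sorted(old.fields.items()):
        if number in new.fields:
            new_type = new.fields[number][1]
            if new_type != old_type:
                errors.append(f"field {number} ({name}): type changed {old_type} -> {new_type}")
        elif number not in new.reserved:
            errors.append(f"field {number} ({name}): removed without reserving its number")
    # A number reserved in the old version must never be reused for a new field.
    for number in sorted(old.reserved.intersection(new.fields)):
        errors.append(f"field {number}: reuses a reserved number")
    return errors


v1 = Schema({1: ("id", "string"), 2: ("amount", "int64")})
v2 = Schema({1: ("id", "string"), 3: ("currency", "string")}, reserved=frozenset({2}))
v3 = Schema({1: ("id", "int64")})
v4 = Schema({1: ("id", "string"), 2: ("note", "string"), 3: ("currency", "string")})

print(compatibility_errors(v1, v2))  # []
print(compatibility_errors(v1, v3))  # type change on field 1, unreserved removal of field 2
print(compatibility_errors(v2, v4))  # ['field 2: reuses a reserved number']
```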

https://github.com/raystack/raccoon

Raccoon is a high-throughput, low-latency service to collect events in real-time from your web, mobile apps, and services using multiple network protocols.

clickstream dataops eventsourcing kafka

Last synced: 07 Apr 2025

https://github.com/raystack/meteor

Meteor is an easy-to-use, plugin-driven metadata collection framework to extract data from different sources and sink to any data catalog.

bigdata collector data-catalog data-management dataops extractors metadata scraper sinks

Last synced: 09 Apr 2025

https://github.com/lensesio/lenses-docker

❀ for real-time DataOps - where the application and data fabric blend - Lenses

dataops docker enterprise governance kafka kubernetes openshift security

Last synced: 03 Apr 2025

https://github.com/garystafford/tickit-data-lake-demo

Resources for video demonstrations and blog posts related to DataOps on AWS

airflow aws data-lake dataops devops redshift

Last synced: 10 Jan 2025

https://github.com/gojekfarm/beast

[Deprecated] Load data from Kafka to any data warehouse. The BigQuery (BQ) sink is now supported in Firehose: https://github.com/odpf/firehose

beast bigquery dataops kafka warehouse

Last synced: 23 Jan 2025

https://github.com/raystack/guardian

Guardian is a universal data access management tool with automated access workflows and security controls across data stores, analytical systems, and cloud products.

access compliance control data dataops

Last synced: 24 Jan 2025

https://github.com/c-3lab/dim

πŸ“¦ dim: Manage the open data in your project like a package manager.

cli commads command-line-tool data dataops dim gpt gpt-3 llm opendata package-manager public-data public-dataset

Last synced: 09 Dec 2024

https://github.com/datakitchen/data-observability-installer

Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.

data data-engineering data-observability data-profiling data-quality data-reliability data-science datachecker datacleaner datacleaning dataops dataquality datatesting datavalidation mssql pipeline-tests postgresql redshift self-hosted snowflake

Last synced: 04 Apr 2025

https://github.com/raystack/siren

Siren provides an easy-to-use, universal framework for managing alerts, notifications, and channels across the entire observability infrastructure.

alerting dataops influx monitoring prometheus

Last synced: 13 Apr 2025

https://github.com/raystack/compass

Compass is an enterprise data catalog that makes it easy to find, understand, and govern data.

data dataops discovery lineage metadata

Last synced: 20 Nov 2024

https://github.com/datakitchen/dataops-testgen

DataOps Data Quality TestGen is part of DataKitchen's Open Source Data Observability. DataOps TestGen delivers simple, fast data quality test generation and execution through data profiling, new dataset hygiene review, AI generation of data quality validation tests, ongoing testing of data refreshes, and continuous anomaly monitoring.

data data-engineering data-observability data-quality data-science data-testing datachecker dataops dataprofiling dataquality datavalidation mssql postgresql python redshift self-hosted snowflake

Last synced: 06 Apr 2025
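
TestGen's description pairs data profiling with test generation. A minimal sketch of that general pattern, assuming a single numeric column and hypothetical check names (this is not TestGen's implementation): profile a trusted reference sample once, turn the observed null rate and value range into checks, then run those checks against each new data refresh.

```python
from typing import Callable, Dict, List, Optional, Sequence

Batch = Sequence[Optional[float]]
Check = Callable[[Batch], bool]


def generate_tests(reference: Batch) -> Dict[str, Check]:
    """Profile a reference sample and return named checks derived from it.

    Assumes the reference contains at least one non-null value.
    """
    present = [v for v in reference if v is not None]
    null_rate = 1 - len(present) / len(reference)  # observed share of nulls
    low, high = min(present), max(present)        # observed value range

    def not_empty(batch: Batch) -> bool:
        return len(batch) > 0

    def null_rate_ok(batch: Batch) -> bool:
        # Compare counts rather than ratios so an empty batch cannot divide by zero.
        nulls = sum(1 for v in batch if v is None)
        return nulls <= null_rate * len(batch)

    def range_ok(batch: Batch) -> bool:
        return all(low <= v <= high for v in batch if v is not None)

    return {"not_empty": not_empty, "null_rate": null_rate_ok, "range": range_ok}


def run_tests(tests: Dict[str, Check], batch: Batch) -> List[str]:
    """Return the names of the checks that fail for a new batch."""
    return [name for name, check in tests.items() if not check(batch)]


tests = generate_tests([10.0, 12.5, None, 11.0])  # null rate 0.25, range 10.0..12.5
print(run_tests(tests, [10.5, 11.0, 12.0, None]))  # []
print(run_tests(tests, [9.0, None, None, 11.0]))   # ['null_rate', 'range']
```

Deriving bounds from the exact observed minimum and maximum is deliberately strict; a practical generator would usually widen them with a tolerance before flagging a refresh.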

https://github.com/raystack/cosmos

Cosmos is an operational analytics server to build custom apps with embedded analytics that deliver data experiences as unique as your business.

analysis cubejs dataops framework metrics

Last synced: 20 Nov 2024

https://github.com/VulknData/vulkn

Love your Data. Love the Environment. Love VULKИ.

bigdata clickhouse dataops pandas python vulkn vulkndata

Last synced: 11 Nov 2024

https://github.com/datakitchen/dataops-observability

DataOps Observability is part of DataKitchen's Open Source Data Observability. DataOps Observability monitors every data journey from data source to customer value, from any team development environment into production, across every tool, team, environment, and customer so that problems are detected, localized, and understood immediately.

data data-engineering data-observability data-science dataops pipleine-monitoring

Last synced: 09 Apr 2025

https://github.com/raystack/charts

This repository is home to the original helm charts for products throughout the open data platform ecosystem.

charts dataops helm k8s kubernetes registry

Last synced: 20 Nov 2024

https://github.com/raystack/homebrew-tap

This repository is home to the original homebrew taps for products throughout the Raystack ecosystem.

brew dataops homebrew odpf taps

Last synced: 01 Apr 2025

https://github.com/raystack/entropy

Entropy is a framework to safely and predictably create, change, and improve modern cloud applications and infrastructure using familiar languages, tools, and engineering practices.

dataops

Last synced: 20 Nov 2024

https://github.com/gni/bonjour-python

🐍 Complete Python Course 🌟: Dive into Python from the basics to advanced projects. Clarity, practice, and innovation guaranteed. Start your Python adventure now!

dataops debutant devops formation formation-python francais mlops python3

Last synced: 14 Apr 2025

https://github.com/duyet/charts

Collection of useful Helm charts, well tested with KinD and Kubeconform.

chart charts dataops devops hacktoberfest helm helm-chart helm-charts k8s kind kubernetes

Last synced: 14 Apr 2025

https://github.com/marco-roy/DDO

A dbt package to perform DataOps and administrative CI/CD on your data warehouse.

data dataops datawarehouse datawarehouseautomation dbt snowflake

Last synced: 13 Nov 2024

https://github.com/raystack/handbook

Handbook is the central repository for how we build products within the ODPF community.

cookies dataops documentation templates

Last synced: 20 Nov 2024

https://github.com/meltanolabs/singer-working-group

Working group for ongoing development and iteration of the Singer Spec, the de-facto protocol for open source data connectors. Please use "Issues" to create discussion items - or use "Discussions" for general questions.

data-integration dataops elt etl etl-pipeline singer

Last synced: 19 Feb 2025
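
The Singer spec defines taps as programs that write JSON messages, one per line, to standard output; the core message types are `SCHEMA`, `RECORD`, and `STATE`. The sketch below emits those three types using only the standard library rather than the `singer-python` helpers. The `users` stream, its JSON Schema, and the shape of the state value are hypothetical examples; the spec leaves the contents of `STATE` to each tap.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Mapping


def singer_messages(
    stream: str,
    schema: Mapping[str, Any],
    key_properties: List[str],
    rows: Iterable[Mapping[str, Any]],
    state: Mapping[str, Any],
) -> Iterator[Dict[str, Any]]:
    """Yield one SCHEMA message, a RECORD per row, then a final STATE message."""
    # SCHEMA describes the stream's records with JSON Schema and names its primary key.
    yield {"type": "SCHEMA", "stream": stream, "schema": dict(schema),
           "key_properties": key_properties}
    # RECORD carries one row of data for the named stream.
    for row in rows:
        yield {"type": "RECORD", "stream": stream, "record": dict(row)}
    # STATE lets the next run resume where this one stopped; its shape is tap-defined.
    yield {"type": "STATE", "value": dict(state)}


def to_json_lines(messages: Iterable[Mapping[str, Any]]) -> Iterator[str]:
    """Serialize each message as a single line of JSON."""
    for message in messages:
        yield json.dumps(message)


if __name__ == "__main__":
    # Hypothetical example stream and bookmark.
    users_schema = {
        "type": "object",
        "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
    }
    users = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
    bookmark = {"bookmarks": {"users": {"last_id": 2}}}
    for line in to_json_lines(singer_messages("users", users_schema, ["id"], users, bookmark)):
        print(line)
```

A Singer target consumes the same line-delimited messages from standard input, which is what lets any tap be piped into any target.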

https://github.com/ahmadalibagheri/terraform-aws-glue

Terraform module for creating AWS Glue resources

aws aws-glue dataops glue terraform

Last synced: 11 Apr 2025

https://github.com/ricardolsmendes/aws-glue-ci-cd-blueprint

Companion repository for the "Streamlining AWS Glue CI/CD - A Comprehensive Blueprint" blog post

aws aws-glue ci-cd dataops devops iac-terraform infrastructure-as-code terraform

Last synced: 13 Mar 2025

https://github.com/aabouzaid/modern-data-platform-poc

My M.Sc. dissertation: a Modern Data Platform using DataOps, Kubernetes, and the Cloud-Native ecosystem to build a resilient Big Data platform based on the Data Lakehouse architecture, which is the foundation for Machine Learning (MLOps) and Artificial Intelligence (AIOps).

big-data cloud-agnostic cloud-native data-engineering data-lakehouse data-platform dataops edinburgh-napier kubernetes msc msc-project

Last synced: 19 Apr 2025

https://github.com/axsaucedo/scalable-data-science

Scalable Data Science: The state of DataOps / MLOps in 2018

data dataops learning machine ml mlops scalable science

Last synced: 08 Apr 2025

https://github.com/korniichuk/workflow

Workflow management platforms comparison

airflow apache-airflow aws aws-step-functions dataops luigi step-functions

Last synced: 15 Apr 2025

https://github.com/WeR-stats/workshop-setup_cloud_machine_data_science

Step-by-step instructions on how to set up a virtual machine for Data Science using Cloud Infrastructure

cloud data-science dataops digitalocean jupyterlab python r r-shiny r-stats rstudio rstudio-server shiny-server

Last synced: 04 Dec 2024

https://github.com/kharigardner/pyfivetran

Simple Python interface for the Fivetran API, powered by HTTPX.

api-wrapper data-engineering dataops etl fivetran httpx iaac ingestion integration python yaml-configuration

Last synced: 03 Dec 2024

https://github.com/ixpantia/ixplorer

Friendly DataOps with RStudio

dataops gitea hacktoberfest

Last synced: 12 Apr 2025

https://github.com/zncdatadev/kubedoop

A modular, open-source big data platform built on Kubernetes and the cloud-native ecosystem, serving as the foundation for DataOps/MLOps (LLMOps).

bigdata cloud-native data-platform dataops hadoop kubernetes llmops mlops

Last synced: 19 Nov 2024

https://github.com/abroniewski/tpc-di-ms-sql-benchmark

Using TPC-DI to benchmark Microsoft SQL Server with SQL scripts for extract, transform, and load (ETL).

bdma benchmark data-engineering database-management dataops ms-sql-server mssql sql tpc-di tpc-ds tpc-ds-benchmark

Last synced: 12 Mar 2025

https://github.com/glueops/glueops-dev

This repository contains the GlueOps documentation website built using Docusaurus 2. It provides comprehensive guides and tutorials for deploying and managing applications using the GlueOps platform. The site includes setup instructions, configuration details, and best practices for GitOps workflows.

dataops devops documentation docusarus gitops javascript static-site static-site-generator

Last synced: 17 Dec 2024

https://github.com/ekgf/ekglib

A Python library for EKG DataOps operations

data-science dataops ekg knowledge-graph ldap semantic-tech xlsx

Last synced: 13 Nov 2024

https://github.com/raystack/.github

This repository contains the community health files for the @raystack organization

community dataops

Last synced: 14 Mar 2025

https://github.com/ahmednasef3/software-task

This task explains OOP system types, GUI frameworks, and methods for connecting a database to a Python application. It also covers DevOps, MLOps, and DataOps tools.

database dataops devops-tools gui mlops oop software

Last synced: 20 Mar 2025

https://github.com/abroniewski/idlecompute-data-management-architecture

Implementation of a big data management and analysis backbone architecture using PySpark for distributed and scalable data ingestion and MLlib for machine learning analysis. Part of Big Data Management and Analytics (BDMA) program.

bdma big-data big-data-analytics bigdata dataops hadoop-hdfs machine-learning parquet pipeline pyspark-mllib

Last synced: 01 Mar 2025

https://github.com/stephlocke/dataops

A site showcasing DataOps resources!

dataops hugo

Last synced: 05 Mar 2025

https://github.com/davidkhala/data

Just data: an index of indexes.

dataops

Last synced: 06 Apr 2025

https://github.com/mcleber/mlops_cardiotocography

Practical exercise developed in the course "Culture and Practices of DataOps and MLOps".

anaconda-environment cardiotocography dataops mlops python

Last synced: 02 Apr 2025

https://github.com/ronnyhdez/blog

A site for my blog posts

data-productos data-science dataops nvim python r

Last synced: 13 Apr 2025

https://github.com/gokcan61/console

A console in computing refers to a text-based interface where users can interact with a computer system through commands. It typically provides a direct way to execute tasks, access system resources, and troubleshoot issues without the need for a graphical user interface.

dashboard dataops debugger go halo-admin halo-console kafka kubesphere openshift-origin powershell rust symfony terminal web-ui

Last synced: 19 Feb 2025