DataOps
DataOps is an automated, process-oriented methodology, used by analytic and data teams, to improve the quality and reduce the cycle time of data analytics. While DataOps began as a set of best practices, it has now matured to become a new and independent approach to data analytics. DataOps applies to the entire data lifecycle from data preparation to reporting, and recognizes the interconnected nature of the data analytics team and information technology operations.
- GitHub: https://github.com/topics/dataops
- Wikipedia: https://en.wikipedia.org/wiki/DataOps
- Related Topics: open-data
- Aliases: data-ops
- Last updated: 2025-03-21 00:07:15 UTC
https://github.com/cleanlab/cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
active-learning annotation data-centric-ai data-cleaning data-curation data-labeling data-profiling data-quality data-science data-validation dataops dataquality datasets exploratory-data-analysis labeling llms noisy-labels out-of-distribution-detection outlier-detection weak-supervision
Last synced: 09 Apr 2025
https://github.com/flyteorg/flyte
Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
data data-analysis data-science dataops declarative fine-tuning flyte golang grpc hacktoberfest kubernetes kubernetes-operator llm machine-learning mlops orchestration-engine production python scale workflow
Last synced: 19 Apr 2025
https://github.com/redpanda-data/console
Redpanda Console is a developer-friendly UI for managing your Kafka/Redpanda workloads. Console gives you a simple, interactive approach for gaining visibility into your topics, masking data, managing consumer groups, and exploring real-time data with time-travel debugging.
apache-kafka dataops go kafka kafka-gui kafka-ui react typescript web-ui
Last synced: 08 Apr 2025
https://github.com/lancedb/lance
Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.
apache-arrow computer-vision data-analysis data-analytics data-centric data-format data-science dataops deep-learning duckdb embeddings llms machine-learning mlops python rust
Last synced: 08 Apr 2025
https://github.com/whylabs/whylogs
An open-source data logging library for machine learning models and data pipelines. Provides visibility into data quality & model performance over time. Supports privacy-preserving data collection, ensuring safety & robustness.
ai-pipelines analytics approximate-statistics calculate-statistics constraints data-constraints data-pipeline data-quality data-science dataops dataset logging machine-learning ml-pipelines mlops model-performance python statistical-properties
Last synced: 10 Apr 2025
https://github.com/TobikoData/sqlmesh
Efficient data transformation and modeling framework that is backwards compatible with dbt.
dataengineering dataops dbt elt etl python sql transformation
Last synced: 26 Mar 2025
https://github.com/elementary-data/elementary
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
analytics-engineer bigquery data-analysis data-governance data-lineage data-observability data-pipeline data-pipelines data-reliability data-warehouse dataops dbt dbt-artifacts dbt-packages lineage redshift snowflake
Last synced: 08 Apr 2025
https://github.com/lensesio/fast-data-dev
Kafka Docker for development. Kafka, Zookeeper, Schema Registry, Kafka-Connect, Landoop Tools, 20+ connectors
dataops docker kafka kafka-rest-proxy schema-registry
Last synced: 23 Mar 2025
https://github.com/Landoop/fast-data-dev
Kafka Docker for development. Kafka, Zookeeper, Schema Registry, Kafka-Connect, Landoop Tools, 20+ connectors
dataops docker kafka kafka-rest-proxy schema-registry
Last synced: 27 Dec 2024
https://github.com/alibaba/sreworks
Cloud Native DataOps & AIOps Platform
aiops application cloudnative dataops devops engineering flink k8s kubernetes maintenance oam operation ops saas sre
Last synced: 13 Apr 2025
https://github.com/alibaba/SREWorks
Cloud Native DataOps & AIOps Platform
aiops application cloudnative dataops devops engineering flink k8s kubernetes maintenance oam operation ops saas sre
Last synced: 26 Mar 2025
https://github.com/meltano/meltano
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
connectors data data-engineering data-pipelines dataops dataops-platform elt extract-data integration loaders meltano meltano-sdk open-source opensource pipelines singer tap taps target targets
Last synced: 08 Apr 2025
https://github.com/datavane/tis
Support agile DataOps Based on Flink, DataX and Flink-CDC, Chunjun with Web-UI
cdc chunjun dataops datax etl flink flink-streaming java
Last synced: 13 Apr 2025
https://github.com/raystack/optimus
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
airflow analytics analytics-engineering automation bigquery business-intelligence data-modelling data-pipelines data-transformation data-warehouse dataops elt etl golang workflows
Last synced: 08 Apr 2025
https://github.com/tenzir/tenzir
Tenzir is the data pipeline engine for security teams.
dataops hacktoberfest incident-response investigation netflow pcap pipelines secdataops security siem sigma soc suricata threathunting zeek
Last synced: 13 Apr 2025
https://github.com/tenzir/vast
Tenzir is the data pipeline engine for security teams.
dataops hacktoberfest incident-response investigation netflow pcap pipelines secdataops security siem sigma soc suricata threathunting zeek
Last synced: 01 Mar 2025
https://github.com/azure-samples/modern-data-warehouse-dataops
DataOps for Microsoft Data Platform technologies. https://aka.ms/dataops-repo
automatedtesting azure cicd data databricks datafactory dataops devops fabric
Last synced: 13 Apr 2025
https://github.com/Azure-Samples/modern-data-warehouse-dataops
DataOps for Microsoft Data Platform technologies. https://aka.ms/dataops-repo
automatedtesting azure cicd data databricks datafactory dataops devops fabric
Last synced: 04 Dec 2024
https://github.com/polyaxon/datatile
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
dask data-exploration data-profiling data-quality data-quality-checks data-science data-visualization dataframes dataops explainable-ai matplotlib mlops pandas pandas-summary plotly pytorch spark statistics tensorflow tracking
Last synced: 30 Mar 2025
https://github.com/polyaxon/traceml
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
dask data-exploration data-profiling data-quality data-quality-checks data-science data-visualization dataframes dataops explainable-ai matplotlib mlops pandas pandas-summary plotly pytorch spark statistics tensorflow tracking
Last synced: 11 Apr 2025
https://github.com/vmware/versatile-data-kit
One framework to develop, deploy and operate data workflows with Python and SQL.
analytics data data-engineer data-engineering data-engineering-pipeline data-lineage data-pipelines data-science data-structures data-warehouse database dataops elt etl pipeline python snowflake sql trino warehouse
Last synced: 08 Apr 2025
https://github.com/flowerfine/scaleph
Open data platform based on Kubernetes. Scaleph supports SeaTunnel, Flink, and Doris, backed by the SeaTunnel-on-Flink engine, the Flink Kubernetes Operator, and the Doris operator.
dag data-platform dataops doris doris-manager doris-operator flink flink-kubernetes flink-kubernetes-operator flink-sql flink-sql-gateway seatunnel
Last synced: 04 Apr 2025
https://github.com/pbi-tools/pbi-tools
Power BI DevOps & Source Control Tool
dataops devops pbix power-bi source-control
Last synced: 13 Nov 2024
https://github.com/raystack/firehose
Firehose is an extensible, no-code, and cloud-native service to load real-time streaming data from Kafka to data stores, data lakes, and analytical storage systems.
apache-kafka bigquery dataops firehose influxdb kafka postgresql prometheus sink streaming
Last synced: 05 Apr 2025
https://github.com/merantix-momentum/squirrel-core
A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way.
ai cloud-computing collaboration computer-vision cv data-ingestion data-mesh data-science dataops datasets deep-learning distributed internal machine-learning ml natural-language-processing nlp python pytorch tensorflow
Last synced: 01 Apr 2025
https://github.com/raystack/frontier
Frontier is an all-in-one user management platform that provides identity, access and billing management to help organizations secure their systems and data. (Open source alternative to Clerk)
authentication authorization billing clerkauth dataops golang rbac spicedb user-management
Last synced: 08 Apr 2025
https://github.com/raystack/dagger
Dagger is an easy-to-use, configuration over code, cloud-native framework built on top of Apache Flink for stateful processing of real-time streaming data.
apache-flink apache-kafka dataops framework influxdb prometheus real-time-analytics real-time-processing stream-processing
Last synced: 06 Apr 2025
https://github.com/awslabs/aws-ddk
An open source development framework to help you build data workflows and modern data architecture on AWS.
aws dataengineering dataops python
Last synced: 24 Nov 2024
https://github.com/raystack/stencil
Stencil is a schema registry that provides schema management and validation dynamically, efficiently, and reliably to ensure data compatibility across applications.
cli clojure clojure-library dataops descriptor golang javascript javascript-library js protobuf protocol-buffers protocol-buffers-library protocol-buffers-parsing schema-registry schema-validation
Last synced: 05 Apr 2025
https://github.com/raystack/raccoon
Raccoon is a high-throughput, low-latency service to collect events in real-time from your web, mobile apps, and services using multiple network protocols.
clickstream dataops eventsourcing kafka
Last synced: 07 Apr 2025
https://github.com/raystack/meteor
Meteor is an easy-to-use, plugin-driven metadata collection framework to extract data from different sources and sink to any data catalog.
bigdata collector data-catalog data-management dataops extractors metadata scraper sinks
Last synced: 09 Apr 2025
https://github.com/lensesio/lenses-docker
For real-time DataOps - where the application and data fabric blend - Lenses
dataops docker enterprise governance kafka kubernetes openshift security
Last synced: 03 Apr 2025
https://github.com/raptor-ml/raptor
Transform your pythonic research to an artifact that engineers can deploy easily.
ai-infra data-engineering data-science dataops feature-engineering feature-extraction feature-platform featurestore kubeflow kubernetes machine-learning ml mlops model-deployment production raptor raptor-ml reactive-ml
Last synced: 31 Mar 2025
https://github.com/google/space
Unified storage framework for the entire machine learning lifecycle
apache-arrow apache-parquet data-warehouse dataops dataset dml lakehouse machine-learning mlops multimodal multimodal-data olap ray tensorflow tensorflow-dataset
Last synced: 17 Nov 2024
https://github.com/raystack/guardian
Guardian is a universal data access management tool with automated access workflows and security controls across data stores, analytical systems, and cloud products.
access compliance control data dataops
Last synced: 24 Jan 2025
https://github.com/c-3lab/dim
dim: Manage the open data in your project like a package manager.
cli commads command-line-tool data dataops dim gpt gpt-3 llm opendata package-manager public-data public-dataset
Last synced: 09 Dec 2024
https://github.com/datakitchen/data-observability-installer
Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.
data data-engineering data-observability data-profiling data-quality data-reliability data-science datachecker datacleaner datacleaning dataops dataquality datatesting datavalidation mssql pipeline-tests postgresql redshift self-hosted snowflake
Last synced: 04 Apr 2025
https://github.com/GitDataAI/jiaozifs
A Git-like Version Control File System for AI & Data Product Management.
aiops data-collaboration data-lake data-lineage data-product data-version-control data-versioning dataops digital-twins federated-learning git git-filesystem git-for-data git-interface jiaozifs jzfs mlops version-controlled-filesystem
Last synced: 03 Mar 2025
https://github.com/GitDataAI/jzfs
A Git-like Version Control File System for AI & Data Product Management.
aiops data-collaboration data-lake data-lineage data-product data-version-control data-versioning dataops digital-twins federated-learning git git-filesystem git-for-data git-interface jiaozifs jzfs mlops version-controlled-filesystem
Last synced: 04 Apr 2025
https://github.com/DataKitchen/data-observability-installer
Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.
data data-engineering data-observability data-profiling data-quality data-reliability data-science datachecker datacleaner datacleaning dataops dataquality datatesting datavalidation mssql pipeline-tests postgresql redshift self-hosted snowflake
Last synced: 13 Nov 2024
https://github.com/beneath-hq/beneath
Beneath is a serverless real-time data platform
analytics beneath data-engineering data-pipelines data-science data-warehouse dataops developer-tools etl go kubernetes mlops python sql streaming
Last synced: 03 Apr 2025
https://github.com/raystack/siren
Siren provides an easy-to-use universal alert, notification, channels management framework for the entire observability infrastructure.
alerting dataops influx monitoring prometheus
Last synced: 13 Apr 2025
https://github.com/noahgift/data-engineering-and-dataops
Duke MIDS: Data Engineering and DataOps Course
book cloud course data data-science dataengineering dataops duke mlops software-engineering
Last synced: 02 Mar 2025
https://github.com/google/grizzly
End-to-end DataOps platform deployed by Terraform.
airflow bigquery cloud-sql cloud-storage composer data-catalog data-lineage data-loss-prevention dataflow dataops dataops-platform gcp git google-cloud google-cloud-platform pubsub spanner terraform
Last synced: 11 Nov 2024
https://github.com/datakitchen/dataops-testgen
DataOps Data Quality TestGen is part of DataKitchen's Open Source Data Observability. DataOps TestGen delivers simple, fast data quality test generation and execution by data profiling, new dataset hygiene review, AI generation of data quality validation tests, ongoing testing of data refreshes, and continuous anomaly monitoring.
data data-engineering data-observability data-quality data-science data-testing datachecker dataops dataprofiling dataquality datavalidation mssql postgresql python redshift self-hosted snowflake
Last synced: 06 Apr 2025
https://github.com/VulknData/vulkn
Love your Data. Love the Environment. Love VULKΠ.
bigdata clickhouse dataops pandas python vulkn vulkndata
Last synced: 11 Nov 2024
https://github.com/datakitchen/dataops-observability
DataOps Observability is part of DataKitchen's Open Source Data Observability. DataOps Observability monitors every data journey from data source to customer value, from any team development environment into production, across every tool, team, environment, and customer so that problems are detected, localized, and understood immediately.
data data-engineering data-observability data-science dataops pipleine-monitoring
Last synced: 09 Apr 2025
https://github.com/merantix-momentum/squirrel-datasets-core
Squirrel dataset hub
ai cloud-computing collaboration computer-vision cv data-ingestion data-mesh data-science dataops datasets deep-learning distributed machine-learning ml natural-language-processing npl python pytorch tensorflow
Last synced: 05 Apr 2025
https://github.com/raystack/charts
This repository is home to the original helm charts for products throughout the open data platform ecosystem.
charts dataops helm k8s kubernetes registry
Last synced: 20 Nov 2024
https://github.com/giacbrd/smartpipeline
A framework for rapid development of robust data pipelines following a simple design pattern
data-analysis data-analytics data-mining data-pipelines data-processing data-science dataops design-patterns etl machine-learning mlops pipeline pipeline-framework pipelines reproducibility task-queue workflow
Last synced: 21 Mar 2025
https://github.com/raystack/entropy
Entropy is a framework to safely and predictably create, change, and improve modern cloud applications and infrastructure using familiar languages, tools, and engineering practices.
Last synced: 20 Nov 2024
https://github.com/polyaxon/cli
Polyaxon Core Client & CLI to streamline MLOps
data-science dataops deep-learning hyperparameter-optimization kubernetes machine-learning ml mlops pytorch scikit-learn tensorflow workflows
Last synced: 15 Apr 2025
https://github.com/gni/bonjour-python
Complete Python training: dive into Python from the basics through to advanced projects. Clarity, practice, and innovation throughout. Start your Python adventure now!
dataops debutant devops formation formation-python francais mlops python3
Last synced: 14 Apr 2025
https://github.com/duyet/charts
Collection of useful Helm Charts. Well tested with KinD and Kubeconform.
chart charts dataops devops hacktoberfest helm helm-chart helm-charts k8s kind kubernetes
Last synced: 14 Apr 2025
https://github.com/riveryio/rivery_cli
Rivery CLI
data-pipeline data-pipelines data-science database database-management dataops dataops-platform dwh dwh-team elt etl rivery
Last synced: 21 Nov 2024
https://github.com/ahmadalibagheri/dataops-roadmap
DataOps Roadmap for Learning
dataops dataops-platform dataops-principles
Last synced: 20 Mar 2025
https://github.com/marco-roy/DDO
A dbt package to perform DataOps & administrative CI/CD on your data warehouse.
data dataops datawarehouse datawarehouseautomation dbt snowflake
Last synced: 13 Nov 2024
https://github.com/raystack/handbook
Handbook is the central repository for how we build products within ODPF community.
cookies dataops documentation templates
Last synced: 20 Nov 2024
https://github.com/meltanolabs/singer-working-group
Working group for ongoing development and iteration of the Singer Spec, the de-facto protocol for open source data connectors. Please use "Issues" to create discussion items - or use "Discussions" for general questions.
data-integration dataops elt etl etl-pipeline singer
Last synced: 19 Feb 2025
https://github.com/brunocampos01/data-engineering
algorithms-techniques big-data big-o-notation bigdata cookbook data-engineering data-pipelines data-processing data-sctructures database-fundamentals dataops design-patterns design-systems java mysql paradigms python spark sql storage
Last synced: 15 Apr 2025
https://github.com/ricardolsmendes/aws-glue-ci-cd-blueprint
Companion repository for the "Streamlining AWS Glue CI/CD - A Comprehensive Blueprint" blog post
aws aws-glue ci-cd dataops devops iac-terraform infrastructure-as-code terraform
Last synced: 13 Mar 2025
https://github.com/gibbsbravo/datadelta
The best Python package for comparing two dataframes
analytics comparison data data-analytics database database-management databases dataops dataops-platform devops pandas pandas-dataframe testing testing-tools version-control
Last synced: 18 Dec 2024
https://github.com/aabouzaid/modern-data-platform-poc
My M.Sc. dissertation: Modern Data Platform using DataOps, Kubernetes, and Cloud-Native ecosystem to build a resilient Big Data platform based on Data Lakehouse architecture which is the base for Machine Learning (MLOps) and Artificial Intelligence (AIOps).
big-data cloud-agnostic cloud-native data-engineering data-lakehouse data-platform dataops edinburgh-napier kubernetes msc msc-project
Last synced: 19 Apr 2025
https://github.com/korniichuk/workflow
Workflow management platforms comparison
airflow apache-airflow aws aws-step-functions dataops luigi step-functions
Last synced: 15 Apr 2025
https://github.com/WeR-stats/workshop-setup_cloud_machine_data_science
Step-by-step instructions on how to set up a virtual machine for Data Science using Cloud Infrastructures
cloud data-science dataops digitalocean jupyterlab python r r-shiny r-stats rstudio rstudio-server shiny-server
Last synced: 04 Dec 2024
https://github.com/kharigardner/pyfivetran
Simple python interface for the Fivetran API. Powered by HTTPx.
api-wrapper data-engineering dataops etl fivetran httpx iaac ingestion integration python yaml-configuration
Last synced: 03 Dec 2024
https://github.com/zncdatadev/kubedoop
A modular open source big data platform built on Kubernetes and the cloud-native ecosystem, serving as the base for DataOps/MLOps (LLMOps)
bigdata cloud-native data-platform dataops hadoop kubernetes llmops mlops
Last synced: 19 Nov 2024
https://github.com/abroniewski/tpc-di-ms-sql-benchmark
Using TPC-DI to benchmark MS SQL Server with SQL scripts for extract, transform, and load (ETL).
bdma benchmark data-engineering database-management dataops ms-sql-server mssql sql tpc-di tpc-ds tpc-ds-benchmark
Last synced: 12 Mar 2025
https://github.com/glueops/glueops-dev
This repository contains the GlueOps documentation website built using Docusaurus 2. It provides comprehensive guides and tutorials for deploying and managing applications using the GlueOps platform. The site includes setup instructions, configuration details, and best practices for GitOps workflows.
dataops devops documentation docusarus gitops javascript static-site static-site-generator
Last synced: 17 Dec 2024
https://github.com/ekgf/ekglib
A Python library for EKG DataOps operations
data-science dataops ekg knowledge-graph ldap semantic-tech xlsx
Last synced: 13 Nov 2024
https://github.com/raystack/.github
This repository contains the community health files for the @raystack organization
Last synced: 14 Mar 2025
https://github.com/ahmednasef3/software-task
This task explains OOP system types, GUI frameworks, and methods for connecting a database to a Python application. It also explains DevOps, MLOps, and DataOps tools.
database dataops devops-tools gui mlops oop software
Last synced: 20 Mar 2025
https://github.com/abroniewski/idlecompute-data-management-architecture
Implementation of a big data management and analysis backbone architecture using PySpark for distributed and scalable data ingestion and MLlib for machine learning analysis. Part of Big Data Management and Analytics (BDMA) program.
bdma big-data big-data-analytics bigdata dataops hadoop-hdfs machine-learning parquet pipeline pyspark-mllib
Last synced: 01 Mar 2025
https://github.com/mcleber/mlops_cardiotocography
Practical exercise developed in the course "Culture and Practices of DataOps and MLOps".
anaconda-environment cardiotocography dataops mlops python
Last synced: 02 Apr 2025
https://github.com/ronnyhdez/blog
A site for my blog posts
data-productos data-science dataops nvim python r
Last synced: 13 Apr 2025
https://github.com/pirocheto/data-engineering-projects
Website about my data engineering projects
aws cloud data dataops devops french machine-learning modeling python sql vizualisation
Last synced: 06 Apr 2025
https://github.com/abeltavares/versioned-data-lakehouse
Git-like Version Control for Data with Nessie, Iceberg, and Spark
apache-iceberg apache-nessie apache-spark atomic-etl block-storage branch-based-development data-engineering data-lakehouse data-pipelines data-versioning dataops distributed-systems etl etl-pipeline git-for-data minio s3 spark-etl table-format time-travel
Last synced: 17 Mar 2025
https://github.com/gokcan61/console
A console in computing refers to a text-based interface where users can interact with a computer system through commands. It typically provides a direct way to execute tasks, access system resources, and troubleshoot issues without the need for a graphical user interface.
dashboard dataops debugger go halo-admin halo-console kafka kubesphere openshift-origin powershell rust symfony terminal web-ui
Last synced: 19 Feb 2025
https://github.com/pirocheto/data-engineering-knowledge
Website about data engineering knowledge
aws cloud data dataops devops french machine-learning modeling python sql vizualisation
Last synced: 04 Feb 2025