Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Projects in Awesome Lists tagged with data-pipelines

A curated list of projects in awesome lists tagged with data-pipelines.

https://github.com/apache/dolphinscheduler

Apache DolphinScheduler is a modern data orchestration platform for agile, low-code creation of high-performance workflows.

airflow azkaban cloud-native data-pipelines job-scheduler orchestration powerful-data-pipelines task-scheduler workflow workflow-orchestration workflow-schedule

Last synced: 17 Dec 2024

https://github.com/elementary-data/elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

analytics-engineer bigquery data-analysis data-governance data-lineage data-observability data-pipeline data-pipelines data-reliability data-warehouse dataops dbt dbt-artifacts dbt-packages lineage redshift snowflake

Last synced: 17 Dec 2024

https://github.com/meltano/meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

connectors data data-engineering data-pipelines dataops dataops-platform elt extract-data integration loaders meltano meltano-sdk open-source opensource pipelines singer tap taps target targets

Last synced: 17 Dec 2024

https://github.com/combust/mleap

MLeap: Deploy ML Pipelines to Production

data-pipelines python scala scikit-learn spark tensorflow transformers

Last synced: 17 Dec 2024

https://github.com/data-engineering-community/data-engineering-wiki

The best place to learn data engineering. Built and maintained by the data engineering community.

data data-engineer data-engineering data-modeling data-pipelines database etl sql

Last synced: 19 Dec 2024

https://github.com/dataform-co/dataform

Dataform is a framework for managing SQL based data operations in BigQuery

analytics business-intelligence data-engineering data-pipelines elt etl hacktoberfest

Last synced: 17 Dec 2024

https://github.com/fmind/mlops-python-package

Kickstart your MLOps initiative with a flexible, robust, and productive Python package.

automation data-pipelines data-science machine-learning mlflow mlops pandera pydantic python

Last synced: 20 Dec 2024

https://github.com/raystack/optimus

Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.

airflow analytics analytics-engineering automation bigquery business-intelligence data-modelling data-pipelines data-transformation data-warehouse dataops elt etl golang workflows

Last synced: 20 Dec 2024

https://github.com/artie-labs/transfer

Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.

apache-kafka bigquery cdc change-data-capture data-integration data-pipelines database debezium elt golang kafka redshift snowflake

Last synced: 20 Dec 2024

https://github.com/elementary-data/dbt-data-reliability

dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

analytics analytics-engineering data data-lineage data-observability data-pipeline-monitoring data-pipelines data-reliability dbt dbt-artifacts dbt-packages dbt-tests

Last synced: 21 Dec 2024

https://github.com/recap-build/recap

Work with your web service, database, and streaming schemas in a single format.

data-catalog data-discovery data-engineering data-integration data-pipelines etl metadata recap

Last synced: 13 Dec 2024

https://github.com/gabledata/recap

Work with your web service, database, and streaming schemas in a single format.

data-catalog data-discovery data-engineering data-integration data-pipelines etl metadata recap

Last synced: 11 Nov 2024

https://github.com/dataplane-app/dataplane

Dataplane is an Airflow inspired unified data platform with additional data mesh and RPA capability to automate, schedule and design data pipelines and workflows. Dataplane is written in Golang with a React front end.

airflow data data-analysis data-engineering data-integration data-pipelines data-science dataplane datawarehouse etl finance golang kubernetes pipelines robotics-process-automation rpa scheduler workflow workflow-automation workflows

Last synced: 12 Nov 2024

https://github.com/kevin-hanselman/dud

A lightweight CLI tool for versioning data alongside source code and building data pipelines.

data-engineering data-pipelines data-science dataset dvcs machine-learning mlops

Last synced: 26 Oct 2024

https://github.com/koolreport/core

An open source PHP reporting framework that helps you write data reports and build dashboards in PHP. It works with all PHP versions from 5.6 to the latest 8.0 and is fully compatible with MVC frameworks such as Laravel, CodeIgniter, and Symfony.

data-analysis data-pipelines data-pivot data-summarization data-visualization data-viz framework mysql-reporting-tools php php-reporting-tools php-reports report-generator reporting reporting-engine reporting-tool

Last synced: 20 Dec 2024

https://github.com/smart-data-lake/smart-data-lake

Smart Automation Tool for building modern Data Lakes and Data Pipelines

data-lake data-pipelines deltalake hadoop hive scala smart-data-lake spark transform-data

Last synced: 20 Dec 2024

https://github.com/mycelial/mycelial

Move your data with ease.

data-pipelines edge-computing etl etl-pipeline rust

Last synced: 14 Nov 2024

https://github.com/bruin-data/bruin

Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.

analytics bigquery data-analysis data-modeling data-pipelines data-transformation python snowflake sql

Last synced: 18 Nov 2024

https://github.com/flipkart-incubator/spark-transformers

Spark-Transformers: Library for exporting Apache Spark MLLIB models to use them in any Java application with no other dependencies.

apache-spark data-pipelines export java machine-learning machine-learning-algorithms machine-learning-library mllib scala spark transformers

Last synced: 11 Oct 2024

https://github.com/mdh266/airflowdatapipeline

Example of an ETL Pipeline using Airflow

airflow data-engineering data-pipelines etl postgresql python

Last synced: 04 Dec 2024

https://github.com/iesahin/xvc

A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)

command-line-tool data data-engineering data-pipelines data-science devops machine-learning machine-learning-engineering mlops rust

Last synced: 11 Nov 2024

https://github.com/arakat-community/arakat

ARAKAT - Big Data Analysis and Business Intelligence Application Development Platform

big-data-analytics business-intelligence cloud-native-applications data-pipelines distributed-systems docker docker-swarm predictive-maintenance

Last synced: 14 Nov 2024

https://github.com/kestra-io/examples

Best practices for data workflows, integrations with the Modern Data Stack (MDS), Infrastructure as Code (IaC), Cloud Provider Services

analytics-engineering automation data-engineering data-orchestration data-pipelines data-workflows orchestration

Last synced: 09 Nov 2024

https://github.com/larribas/dagger

Define sophisticated data pipelines with Python and run them on different distributed systems (such as Argo Workflows).

argo-workflows data-engineering data-pipelines data-science distributed-systems pipelines-as-code workflows

Last synced: 03 Dec 2024

https://github.com/anna-geller/kestra-ci-cd

CI/CD repository template to automate deployments of your production flows

automation data-engineering data-orchestration data-pipelines data-workflows orchestration

Last synced: 16 Dec 2024

https://github.com/unicef/magasin

Cloud native open-source end-to-end data / AI / ML platform

cloud dagster data data-pipelines data-science data-visualization helm-charts kubernetes magasin

Last synced: 09 Nov 2024

https://github.com/DataDrivenGit/Music-Streaming-App-using-AWS-ETL

Implemented a data warehouse and data lake on AWS and data modeling with Postgres and Apache Cassandra; also used Apache Airflow to create the data pipeline.

airflow-operators cassandra data-lake data-pipelines datawarehouse postgres python3 sql

Last synced: 27 Nov 2024

https://github.com/zkan/introduction-to-data-pipelines-and-apache-airflow

Introduction to Data Pipelines and Apache Airflow

apache-airflow data-pipelines

Last synced: 19 Dec 2024

https://github.com/zkan/building-data-pipelines-with-apache-airflow

Building Data Pipelines with Apache Airflow

apache-airflow data-pipelines docker

Last synced: 19 Dec 2024

https://github.com/snehil-shah/seismic-alerts-streamer

A Realtime Seismic Logging & Alerts Service with Live Monitoring & Email Alerts made using Kafka Data Pipelines, all Dockerized & Deployment Ready!

containerized-build data-pipelines docker flask kafka websocket

Last synced: 12 Oct 2024

https://github.com/jmoussa/go-sentitweet

CLI application containing a sentiment analysis data pipeline for Twitter tweets, with its own web API to query results in the database. Written entirely in Go.

api channels cli cli-app cobra data-pipeline data-pipelines gin gin-framework gin-gonic go go-twitter golang gorilla-mux mongodb nlp sentiment-analysis twitter-api

Last synced: 10 Nov 2024

https://github.com/dataforgeopenaihub/mlops-credit-card-fraud-detection-end-to-end

End to End Machine Learning MLOps Project for Credit Card Fraud Detection using Ensemble Models, Data and Model Versioning through DVC, Github Actions, and Deployment

credit-risk data-pipelines dvc-pipeline github-actions google-drive-api machine-learning mlops-project mlops-workflow python

Last synced: 06 Dec 2024

https://github.com/vanderschaarlab/temporai-mivdp

TemporAI-MIVDP: Adaptation of MIMIC-IV-Data-Pipeline for TemporAI

data-pipelines mimic-iv

Last synced: 11 Nov 2024

https://github.com/the-swarm-corporation/custom-swarms-spec-template

Build your dream AI agent swarm with enterprise-grade reliability and scalability. This repository contains our official specification template for custom swarm development using the powerful Swarms Framework.

agents ai data-pipelines enterprise enterprise-grade fintech healthcare insurance ml multi-agent multi-agent-collaboration quant radiology security security-tools soc2 soc3 swarms swarms-agents swarms-of-agents

Last synced: 02 Dec 2024

https://github.com/siddharth-nandagopal/billionaires-rag-query

Billionaires RAG Query uses LLMs and a RAG framework to analyze the world's billionaires list. Extracts tabular data from PDFs, converts to multiple formats, and enables precise queries about net worth, age, and more. Integrates with Poetry and asdf for easy setup and management.

asdf billionaires-list camelot csv data-conversion data-ingestion data-pipelines financial-analysis json llm machine-learning natural-language-processing openai pdf-extraction poetry python rag structured-data tabular-data wealth-data

Last synced: 20 Dec 2024

https://github.com/mxagar/data_engineering_guide

Personal notes on the IBM Data Engineering Certificate as well as other sources focusing on AWS.

airflow aws data-lake data-modeling data-pipelines data-science no-sql spark sql warehouse

Last synced: 05 Nov 2024

https://github.com/joe-heffer-shef/airflow

Data engineering project template

data-engineering data-pipelines etl

Last synced: 24 Nov 2024

https://github.com/dina-hosny/sparkify---data-lake-with-aws

Sparkify - Data Lake with AWS - Udacity Data Engineering Expert Track.

analytics aws data-engineering data-lake data-pipelines dataset etl fwd udacity

Last synced: 14 Nov 2024

https://github.com/dina-hosny/sparkify---data-modeling-with-postgres

Sparkify - Data Modeling with Postgres - Udacity Data Engineering Expert Track.

data-engineering data-modeling data-pipelines database dataset fwd postgresql python sql udacity

Last synced: 14 Nov 2024

https://github.com/dina-hosny/data-engineering-capstone-project

Data Engineering Capstone Project - Udacity Data Engineering Expert Track.

analytics cassandra data-engineering data-pipelines data-science etl fwd spark udacity

Last synced: 14 Nov 2024

https://github.com/santiagortiiz/snowflake-data-pipelines

EPAM's Snowflake hands-on lab. We built a pipeline to read and load data from S3 into Snowflake, developed an ETL workflow to clean the data and stored it in a data warehouse with the 3NF and Star schemas for data mart analysis.

business-intelligence data-lake data-pipelines data-warehouse etl snowflake streams

Last synced: 10 Nov 2024

https://github.com/armahdavi/analytics-data-pipelines-statistics-ml-plotting---dust-extraction-hvac-filters---phase-2

PhD Technical Paper 1 - Phase 2 - Mahdavi & Siegel (2020) (Aerosol Science & Technology; AS&T) - Sharing all the data pipelines, processing code, descriptive statistics, statistical modeling, and plotting/visualizations - Project Milestone: 2017 - 2020 - Full-length article is available

data-pipelines data-science data-visualization machine-learning matplotlib-pyplot numpy pandas-dataframe python scipy-stats sklearn statistics

Last synced: 12 Nov 2024

https://github.com/matz1979/airflow

My apache airflow project

airflow aws-s3 data-pipelines pipelines python s3-bucket

Last synced: 12 Nov 2024

https://github.com/dr4ks/airflow_cheatsheet

The Airflow CheatSheet repository is a comprehensive reference guide for Apache Airflow users, whether you're a beginner or an experienced practitioner. This repository aims to provide a quick and easy-to-use resource that covers the key concepts, commands, best practices, and tips related to Apache Airflow.

airflow-commands apache-airflow best-practices cheat-sheet dags data data-pipelines etl etl-pipelines python reference-guide task-dependencies task-operators task-scheduling workflow workflow-automation workflow-design workflow-management

Last synced: 06 Nov 2024

https://github.com/blacksujit/problems-i-have-faced-in-my-journey-of-programming

This repository contains the issues and errors I have faced in my programming, machine learning, and deep learning journey.

algorithms data-pipelines deep-learning errors etl-pipeline grade-applications machine-learning pipeline-processor problem-solving problems production-code production-errors

Last synced: 01 Dec 2024