Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dagster-io/dagster-open-platform
Dagster Labs' open-source data platform, built with Dagster.
dagster data-engineering python
Last synced: 02 Jul 2024
![](https://github.com/dagster-io.png)
https://github.com/go-outside-labs/blockchain-infrastructure-design
👾 infrastructure projects and MVP source code, such as scalable event scanners, for on-chain analysis, hft, ml...
blockchain cypherpunk data-engineering ethereum event-scanner machine-learning quantitative-finance rust
Last synced: 02 Jul 2024
![](https://github.com/go-outside-labs.png)
https://github.com/aiplanethub/genai-stack
An End to End GenAI Framework
ai chatgpt data-engineering datascientist genai hacktoberfest hacktoberfest-accepted hacktoberfest2023 langchain llama llama-index llm llmops mlops
Last synced: 30 Jun 2024
![](https://github.com/aiplanethub.png)
https://github.com/kelvins/awesome-dataops
:sunglasses: A curated list of awesome DataOps tools
awesome awesome-list data-engineer data-engineering dataops
Last synced: 30 Jun 2024
![](https://github.com/kelvins.png)
https://github.com/mlrun/mlrun
MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.
data-engineering data-science experiment-tracking kubernetes machine-learning mlops mlops-workflow model-serving python workflow
Last synced: 29 Jun 2024
![](https://github.com/mlrun.png)
https://github.com/ploomber/soorgeon
Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊
data-engineering data-science jupyter jupyter-notebooks machine-learning mlops workflow
Last synced: 29 Jun 2024
![](https://github.com/ploomber.png)
https://kevinheavey.github.io/modern-polars/
Code and data for the Modern Polars book
data-analytics data-engineering data-science dataengineering pandas polars python
Last synced: 29 Jun 2024
![](https://github.com/kevinheavey.png)
https://github.com/feathr-ai/feathr
Feathr – A scalable, unified data and AI engineering platform for enterprise
apache-spark artificial-intelligence azure data-engineering data-quality data-science feature-engineering feature-governance feature-management feature-marketplace feature-metadata feature-platform feature-store machine-learning mlops
Last synced: 29 Jun 2024
![](https://github.com/feathr-ai.png)
https://github.com/quintoandar/butterfree
A tool for building feature stores.
data-engineering data-science etl etl-framework feature-store package pyspark python
Last synced: 29 Jun 2024
![](https://github.com/quintoandar.png)
https://github.com/quiltdata/quilt
Quilt is a data mesh for connecting people with actionable data
data data-engineering data-version-control data-versioning parquet python serialization
Last synced: 29 Jun 2024
![](https://github.com/quiltdata.png)
https://github.com/kevin-hanselman/dud
A lightweight CLI tool for versioning data alongside source code and building data pipelines.
data-engineering data-pipelines data-science dataset dvcs machine-learning mlops
Last synced: 29 Jun 2024
![](https://github.com/kevin-hanselman.png)
https://github.com/aiguofer/gspread-pandas
A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.
data data-analytics data-engineering data-science dataframes google google-sheets google-spreadsheets gspread pandas python sheets
Last synced: 26 Jun 2024
![](https://github.com/aiguofer.png)
https://github.com/Quantmetry/awesome_quantmetry
A list of repositories commonly used @ Quantmetry
data-engineering machine-learning pioneers statistics
Last synced: 26 Jun 2024
![](https://github.com/Quantmetry.png)
https://github.com/KennethanCeyer/awesome-data-pipeline
Awesome list for data pipelines
architecture awesome awesome-list big-data bigdata cloud data data-engineering dataeng datalake datapipeline datawarehouse hadoop hive opensource query spark
Last synced: 25 Jun 2024
![](https://github.com/KennethanCeyer.png)
https://argoproj.github.io/argo-workflows/
Workflow Engine for Kubernetes
airflow argo argo-workflows batch-processing cloud-native cncf dag data-engineering gitops hacktoberfest k8s knative kubernetes machine-learning mlops pipelines workflow workflow-engine
Last synced: 22 Jun 2024
![](https://github.com/argoproj.png)
https://github.com/CityOfBoston/analytics_docs
The official documentation of the City of Boston's Analytics Team.
boston city city-government civic-tech data-analytics data-engineering data-science documentation government markdown smart-cities
Last synced: 22 Jun 2024
![](https://github.com/CityOfBoston.png)
https://github.com/webysther/aws-glue-docker
🐋 Docker image for AWS Glue Spark/Python
apache-arrow aws aws-cli aws-glue aws-glue-docker cdk data-engineering development docker docker-image dockerfile etl glue-catalog glue-pyspark pandas pytest python python-poetry sam spark
Last synced: 21 Jun 2024
![](https://github.com/webysther.png)
https://github.com/dataplat/AzureDataPipelineTools
A collection of Azure Functions to make building Azure Data Factory pipelines simpler and easier.
azure azure-data-factory azure-data-lake azure-functions data-engineering
Last synced: 21 Jun 2024
![](https://github.com/dataplat.png)
https://github.com/devsgnr/breadroll
breadroll 🥟 is a simple lightweight library for data processing operations written in Typescript and powered by Bun.
bun csv csv-parser data-engineering data-science data-transformation eda exploratory-data-analysis tsv tsv-parser
Last synced: 21 Jun 2024
![](https://github.com/devsgnr.png)
https://kantord.github.io/just-dashboard/
:bar_chart: :clipboard: Dashboards using YAML or JSON files
big-data business-intelligence chart csv d3 d3js dashboard data data-driven data-engineering data-science data-visualization gist github-gist json just-dashboard yaml
Last synced: 21 Jun 2024
![](https://github.com/kantord.png)
https://github.com/running-elephant/datart
Datart is a next generation Data Visualization Open Platform
analytics bi business-analytics business-intelligence chart d3 dashboard data-analysis data-analytics data-engineering data-visualization data-viz datart davinci display echarts react report sql-editor typescript
Last synced: 21 Jun 2024
![](https://github.com/running-elephant.png)
https://github.com/cnstlungu/portable-data-stack-dagster
A portable Datamart and Business Intelligence suite built with Docker, Dagster, dbt, DuckDB, PostgreSQL and Superset
business-intelligence dagster data-engineering data-visualization dbt duckdb python superset
Last synced: 17 Jun 2024
![](https://github.com/cnstlungu.png)
https://github.com/bacalhau-project/bacalhau
Compute over Data framework for public, transparent, and optionally verifiable computation
ai-art ai-data-collection ai-pipeline batch-processing bioinformatics-pipeline data-analysis data-engineering data-science decentralized decentralized-computing distributed gene-sequencing insulators iot logging-framework orchestration-framework p2p video-processing
Last synced: 17 Jun 2024
![](https://github.com/bacalhau-project.png)
https://github.com/quadratichq/quadratic
Quadratic | Data Science Spreadsheet with Python & SQL
data data-analysis data-engineering data-science etl python quadratic spreadsheet sql wasm webgl
Last synced: 17 Jun 2024
![](https://github.com/quadratichq.png)
https://github.com/datacoon/awesome-dataops
Awesome list of dataops products, open source and resources
cloud data data-engineering dataops etl workflow-engine
Last synced: 17 Jun 2024
![](https://github.com/datacoon.png)
https://github.com/dataplane-app/dataplane
Dataplane is an Airflow-inspired unified data platform with additional data mesh and RPA capabilities to automate, schedule, and design data pipelines and workflows. Dataplane is written in Golang with a React front end.
airflow data data-analysis data-engineering data-integration data-pipelines data-science dataplane datawarehouse etl finance golang kubernetes pipelines robotics-process-automation rpa scheduler workflow workflow-automation workflows
Last synced: 17 Jun 2024
![](https://github.com/dataplane-app.png)
https://github.com/dlt-hub/dlt
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
data data-engineering data-lake data-loading data-warehouse elt extract load python transform
Last synced: 17 Jun 2024
![](https://github.com/dlt-hub.png)
https://github.com/opendatadiscovery/awesome-data-catalogs
📙 Awesome Data Catalogs and Observability Platforms.
awesome awesome-list big-data data-catalog data-discovery data-engineering data-quality datacatalog datadiscovery dataops metadata metadata-management ml observability open-source opendata opensource oss
Last synced: 16 Jun 2024
![](https://github.com/opendatadiscovery.png)
https://github.com/Hiflylabs/awesome-dbt
A curated list of awesome dbt resources
analytics-engineering awesome awesome-list data-engineering dbt
Last synced: 16 Jun 2024
![](https://github.com/Hiflylabs.png)
https://github.com/GoogleCloudPlatform/public-datasets-pipelines
Cloud-native, data onboarding architecture for Google Cloud Datasets
airflow bigquery cloud-composer cloud-native cloud-storage data-architecture data-engineering data-pipelines datasets google-cloud open-data
Last synced: 15 Jun 2024
![](https://github.com/GoogleCloudPlatform.png)
https://github.com/kantord/just-dashboard
:bar_chart: :clipboard: Dashboards using YAML or JSON files
big-data business-intelligence chart csv d3 d3js dashboard data data-driven data-engineering data-science data-visualization gist github-gist json just-dashboard yaml
Last synced: 14 Jun 2024
![](https://github.com/kantord.png)
https://github.com/yobulkdev/yobulkdev
🔥 🔥 🔥 Open Source & AI-driven Data Onboarding Platform: Free flatfile.com alternative
csv-import csv-parser csv-reader data-engineering datacleaning embeddable javascript languagemodel mongodb nextjs nodejs open-source react stream streaming
Last synced: 14 Jun 2024
![](https://github.com/yobulkdev.png)
https://github.com/kwai/blaze
Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.
arrow-datafusion big-data data-engineering execution-engine rust spark sql
Last synced: 13 Jun 2024
![](https://github.com/kwai.png)
https://github.com/prescode/open-fda-data-pipeline
A repeatable data pipeline to extract data from open.fda.gov, transform the data, and make the data available for advanced analytics in Amazon Web Services (AWS).
aws data-engineering elasticsearch
Last synced: 13 Jun 2024
![](https://github.com/prescode.png)
https://github.com/san089/goodreads_etl_pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
airflow airflow-dag apache-airflow apache-spark data-engineering data-engineering-pipeline data-lake data-migration emr-cluster etl-framework etl-job etl-pipeline goodreads-data-pipeline livy python redshift s3 scheduler spark warehouse
Last synced: 13 Jun 2024
![](https://github.com/san089.png)
https://github.com/alanchn31/Data-Engineering-Projects
Personal Data Engineering Projects
airflow aws-redshift cassandra data-engineering data-engineering-nanodegree data-lake data-modeling data-warehouse ingest-data mongodb postgres scrapy spark star-schema
Last synced: 13 Jun 2024
![](https://github.com/alanchn31.png)
https://github.com/andresionek91/airflow-autoscaling-ecs
Airflow Deployment on AWS ECS Fargate Using Cloudformation
airflow airflow-autoscaling-ecs airflow-deployment airflow-ecs data-engineering
Last synced: 12 Jun 2024
![](https://github.com/andresionek91.png)
https://github.com/nnthanh101/Serverless-DataHub
🎯 Building Scalable Cloud-Native DataHub Serverless Application ⛅
aws cdk customer-data-platform data-engineering datahub microservices serverless
Last synced: 12 Jun 2024
![](https://github.com/nnthanh101.png)
https://github.com/airbytehq/glossary
Data Glossary 🧠: An interactive digital garden for deeper data exploration. Learn through a graph and backlinks, enabling layered knowledge discovery.
Last synced: 12 Jun 2024
![](https://github.com/airbytehq.png)
https://github.com/san089/Udacity-Data-Engineering-Projects
A few projects related to Data Engineering, including Data Modeling, infrastructure setup on the cloud, Data Warehousing, and Data Lake development.
airflow airflow-operators aws aws-ec2 aws-s3 aws-sdk cassandra cassandra-database cloudformation cluster data data-engineering data-engineering-pipeline data-lake data-modeling data-warehouse etl-pipeline infrastructure postgres postgresql-database
Last synced: 12 Jun 2024
![](https://github.com/san089.png)
https://github.com/adilkhash/Data-Engineering-HowTo
A list of useful resources to learn Data Engineering from scratch
cloud-providers data-engineering data-pipeline distributed-systems scala
Last synced: 12 Jun 2024
![](https://github.com/adilkhash.png)
https://github.com/oleg-agapov/data-engineering-book
Accumulated knowledge and experience in the field of Data Engineering
data data-engineering engineering
Last synced: 12 Jun 2024
![](https://github.com/oleg-agapov.png)
https://github.com/abhishek-ch/around-dataengineering
A Data Engineering & Machine Learning Knowledge Hub
airflow data-engineering datascience devops infrastructure machine-learning mlops spark
Last synced: 11 Jun 2024
![](https://github.com/abhishek-ch.png)
https://github.com/blockchain-etl/polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
airflow bigquery cryptocurrency data-engineering etl gcp matic-network maticnetwork polygon
Last synced: 11 Jun 2024
![](https://github.com/blockchain-etl.png)
https://github.com/blockchain-etl/bitcoin-etl
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
apache-beam bitcoin bitcoincash blockchain-analytics crypto cryptocurrency dash data-analytics data-engineering dogecoin etl gcp google-dataflow google-pubsub litecoin on-chain-analysis web3 zcash
Last synced: 11 Jun 2024
![](https://github.com/blockchain-etl.png)
https://github.com/hemansnation/God-Level-AI
A collection of scientific methods, processes, algorithms, and systems to build stories & models.
computer-vision data-engineering data-science data-structures-and-algorithms data-system-design data-visualization datastructures deep-learning machine-learning matplotlib mlops natural-language-processing numpy pandas python pytorch scikit-learn statistics tableau
Last synced: 11 Jun 2024
![](https://github.com/hemansnation.png)
https://github.com/flow-php/etl
PHP - ETL (Extract Transform Load) data processing library
data-engineering data-processing etl flow-php
Last synced: 11 Jun 2024
![](https://github.com/flow-php.png)
https://github.com/holistics/pgcp
Copying tables between Postgres databases (for analytics purposes)
Last synced: 10 Jun 2024
![](https://github.com/holistics.png)
https://github.com/moabukar/Everything-Tech
A collection of online resources to help you on your Tech journey.
ansible aws azure backend data-engineering data-science devops docker frontend gcp kubernetes machine-learning networking python serverless software-engineering tech terraform
Last synced: 09 Jun 2024
![](https://github.com/moabukar.png)
https://github.com/Avaiga/taipy
Turns Data and AI algorithms into production-ready web applications in no time.
automation data-engineering data-integration data-ops data-visualization datascience developer-tools hacktoberfest hacktoberfest2023 job-scheduler mlops orchestration pipeline pipelines python scenario scenario-analysis taipy-core taipy-gui workflow
Last synced: 08 Jun 2024
![](https://github.com/Avaiga.png)
https://github.com/NeumTry/NeumAI
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
ai chatgpt data data-engineering database embeddings etl llm llmops mlops ops pipeline python rag retrieval vector-database vectors
Last synced: 08 Jun 2024
![](https://github.com/NeumTry.png)
https://github.com/alibaba/feathub
FeatHub - A stream-batch unified feature store for real-time machine learning
apache-flink data data-engineering data-quality data-science feature-engineering feature-store machine-learning mlops streaming
Last synced: 07 Jun 2024
![](https://github.com/alibaba.png)
https://github.com/twalthr/flink-api-examples
Examples for using Apache Flink® with DataStream API, Table API, Flink SQL and connectors such as MySQL, JDBC, CDC, Kafka.
apache-flink data-engineering flink flink-examples flink-sql stream-processing
Last synced: 07 Jun 2024
![](https://github.com/twalthr.png)
https://github.com/beneath-hq/beneath
Beneath is a serverless real-time data platform ⚡️
analytics beneath data-engineering data-pipelines data-science data-warehouse dataops developer-tools etl go kubernetes mlops python sql streaming
Last synced: 07 Jun 2024
![](https://github.com/beneath-hq.png)
https://github.com/Leverege/gcp-data-engineer-exam
Study materials for the Google Cloud Professional Data Engineering Exam
certification-prep data-engineering gcp google-cloud-platform
Last synced: 06 Jun 2024
![](https://github.com/Leverege.png)
https://github.com/Multiwoven/multiwoven
🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack. Leading Reverse ETL and Customer Data Platform (CDP) for Data Teams.
bigquery cdp customer-data-platform data-activation data-engineering data-pipeline data-warehouse databricks dbt etl hacktoberfest open-source postresql react redshift reverse-etl ruby self-hosted snowflake typescript
Last synced: 05 Jun 2024
![](https://github.com/Multiwoven.png)
https://github.com/mrpaulandrew/procfwk
A cross-tenant, metadata-driven processing framework for Azure Data Factory and Azure Synapse Analytics, achieved by coupling orchestration pipelines with a SQL database and a set of Azure Functions.
adf adfprocfwk azure azure-functions azure-sql-database data-engineering data-factory framework metadata pipelines processing procfwk
Last synced: 04 Jun 2024
![](https://github.com/mrpaulandrew.png)
https://github.com/moj-analytical-services/dbtools
Basic wrapper functions to query data using boto3 and Athena
Last synced: 04 Jun 2024
![](https://github.com/moj-analytical-services.png)
https://github.com/duyet/grant-rs
Manage Redshift/Postgres privileges in GitOps style, written in Rust
data-engineering data-ops gitops hacktoberfest postgres redshift rust
Last synced: 03 Jun 2024
![](https://github.com/duyet.png)
https://github.com/moj-analytical-services/pydbtools
Python version of dbtools
data-engineering moj-data-engineering
Last synced: 03 Jun 2024
![](https://github.com/moj-analytical-services.png)
https://github.com/sodadata/soda-core
:zap: Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
data-contracts data-engineering data-governance data-monitoring data-observability data-profiling data-quality data-quality-checks data-quality-monitoring data-quality-testing data-reliability data-testing data-unit-tests data-validation dataquality datatesting dbt pipeline-testing python snowflake
Last synced: 02 Jun 2024
![](https://github.com/sodadata.png)
https://github.com/ploomber/jupysql
Better SQL in Jupyter. 📊
bigquery clickhouse data-engineering data-science duckdb hive jupyter mysql polars postgres presto python redshift snowflake spark-sql sql sqlite trino tsql
Last synced: 02 Jun 2024
![](https://github.com/ploomber.png)
https://github.com/opendatadiscovery/opendatadiscovery-specification
ODD Specification is a universal open standard for collecting metadata.
api big-data big-data-platform data-discovery data-engineering data-governance data-mesh data-platform metadata metadata-management metadata-parser open-source opensource spec specification
Last synced: 02 Jun 2024
![](https://github.com/opendatadiscovery.png)
https://github.com/phidatahq/phidata
Build AI Assistants using function calling
ai aws data-engineering developer-tools docker gpt-4 llm llmops python
Last synced: 02 Jun 2024
![](https://github.com/phidatahq.png)
https://github.com/opendatadiscovery/odd-platform
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
alerting bigdata data-catalog data-discovery data-engineering data-exploration data-governance data-lineage data-observability data-pipelines data-platform data-profiling data-quality data-science datacatalog lineage metadata metadata-management observability oss
Last synced: 02 Jun 2024
![](https://github.com/opendatadiscovery.png)
https://github.com/kevintpeng/Learn-Something-Every-Day
📝 A compilation of everything that I learn; Computer Science, Software Development, Engineering, Math, and Coding in General. Read the rendered results here ->
algorithm aws blog computer-science course-materials data-engineering data-science education educational engineering learning math mathematics research software-engineering university unix waterloo
Last synced: 01 Jun 2024
![](https://github.com/kevintpeng.png)
https://github.com/dataform-co/dataform
Dataform is a framework for managing SQL based data operations in BigQuery
analytics business-intelligence data-engineering data-pipelines elt etl hacktoberfest
Last synced: 01 Jun 2024
![](https://github.com/dataform-co.png)
https://github.com/daochenzha/data-centric-AI
A curated, but incomplete, list of data-centric AI resources.
ai artificial-intelligence data-centric data-centric-ai data-centric-machine-learning data-curation data-engineering data-quality data-science machine-learning
Last synced: 31 May 2024
![](https://github.com/daochenzha.png)
https://github.com/CloudWise-OpenSource/FlyFish
FlyFish is a data visualization coding platform. It lets you create a data model quickly and simply, and generate a set of data visualization solutions by dragging and dropping.
analytics business-analytics charts data-analysis data-analysis-python data-engineering data-science data-visualization flyfish visualization
Last synced: 31 May 2024
![](https://github.com/CloudWise-OpenSource.png)
https://github.com/Moataz-Elmesmary/Data-Science-Roadmap
Data Science Roadmap from A to Z
big-data chatgpt cheatsheet cv-template data-analysis data-engineering data-science data-visualization deep-learning interview-questions linear-algebra llms machine-learning mathematics neural-network nlp probability python sql statistics
Last synced: 31 May 2024
![](https://github.com/Moataz-Elmesmary.png)
https://github.com/apache/incubator-devlake
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
dashboard-friendly data data-analysis data-engineering data-integration data-transfers devops domain-layer dora etl golang hacktoberfest integration jira open-source user-friendly
Last synced: 31 May 2024
![](https://github.com/apache.png)
https://github.com/meltano/meltano
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
connectors data data-engineering data-pipelines dataops dataops-platform elt extract-data integration loaders meltano meltano-sdk open-source opensource pipelines singer tap taps target targets
Last synced: 31 May 2024
![](https://github.com/meltano.png)
https://github.com/whoiskatrin/sql-translator
SQL Translator is a tool for converting natural language queries into SQL code using artificial intelligence. This project is 100% free and open source.
data-analysis data-engineering dataquery datascience dataset openai postgresql query sql
Last synced: 31 May 2024
![](https://github.com/whoiskatrin.png)
https://github.com/redpanda-data/connect
Fancy stream processing made operationally mundane
amqp cqrs data-engineering data-ops etl event-sourcing go golang kafka logs message-bus message-queue nats rabbitmq stream-processing stream-processor streaming-data
Last synced: 30 May 2024
![](https://github.com/redpanda-data.png)
https://github.com/data-engineering-community/data-engineering-wiki
The best place to learn data engineering. Built and maintained by the data engineering community.
data data-engineer data-engineering data-modeling data-pipelines database etl sql
Last synced: 29 May 2024
![](https://github.com/data-engineering-community.png)
https://github.com/dbt-msft/dbt-sqlserver
dbt adapter for SQL Server and Azure SQL
analytics-engineering azure-sql azure-sql-db data-engineering dbt dbt-sqlserver microsoft microsoft-sql-server mssql sql sql-server t-sql transact-sql tsql
Last synced: 27 May 2024
![](https://github.com/dbt-msft.png)
https://github.com/moj-analytical-services/etl_manager
A Python package to create a database on the platform using our MoJ data warehousing framework
Last synced: 27 May 2024
![](https://github.com/moj-analytical-services.png)
https://github.com/bitpicky/dbt-sugar
dbt-sugar is a CLI tool that lets dbt users have fun and makes it easier to perform actions around dbt models
data-engineering dbt dbt-sugar documentation
Last synced: 27 May 2024
![](https://github.com/bitpicky.png)
https://github.com/alanchn31/Movalytics-Data-Warehouse
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
airflow analytics aws-redshift aws-s3 data-engineer-nanodegree data-engineering data-engineering-pipeline data-modelling data-warehouse-cloud docker movie-database movie-recommendation movie-reviews pyspark python3 redshift spark sql udacity
Last synced: 27 May 2024
![](https://github.com/alanchn31.png)
https://github.com/AuFeld/Data_Engineering_Projects
A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousing, containerization, and a dashboard to monitor data pipeline KPIs
airflow aws cassandra data-engineering data-lake data-warehouse docker emr etl-pipeline infrastructure-as-code infrastructure-setup postgresql python redshift s3 spark
Last synced: 27 May 2024
![](https://github.com/AuFeld.png)
https://github.com/shipyardapp/postgresql-blueprints
Simplified blueprints for building data pipelines with PostgreSQL.
cli data-analysis data-engineering data-pipeline data-science database elt etl postgres postgresql
Last synced: 27 May 2024
![](https://github.com/shipyardapp.png)
https://github.com/shipyardapp/amazonathena-blueprints
Simplified blueprints for building data pipelines with Amazon Athena.
amazon-athena athena cli data-analysis data-engineering data-science elt etl
Last synced: 27 May 2024
![](https://github.com/shipyardapp.png)
https://github.com/kiwicom/contessa
Easy way to define, execute and store quality rules for your data.
data data-engineering data-quality framework mysql postgres python quality-assurance sqlite3
Last synced: 27 May 2024
![](https://github.com/kiwicom.png)
https://github.com/evidence-dev/evidence
Business intelligence as code: build fast, interactive data visualizations in pure SQL and markdown.
analytics business-intelligence dashboard data-engineering data-science data-visualization dbt duckdb exploratory-data-analysis finance open-source self-hosted sql statistics svelte tailwindcss webassembly
Last synced: 27 May 2024
![](https://github.com/evidence-dev.png)
https://github.com/Eventual-Inc/Daft
Distributed DataFrame for Python designed for the cloud, powered by Rust
big-data data-engineering data-science dataframe distributed-computing machine-learning python rust
Last synced: 22 May 2024
![](https://github.com/Eventual-Inc.png)
https://lge-arc-advancedai.github.io/auptimizer/
An automatic ML model optimization tool.
automated-machine-learning automl data-engineering data-science deep-learning hpo hyperparameter-optimization hyperparameter-tuning machine-learning neural-networks
Last synced: 20 May 2024
![](https://github.com/LGE-ARC-AdvancedAI.png)
https://github.com/mlcraft-io/mlcraft
Synmetrix – open source semantic layer / Boost your LLM precision
big-data bigquery business-intelligence clickhouse cube cubejs data-engineering databricks dremio druid firebolt llm prestodb redshift semantic-layer snowflake vertica
Last synced: 19 May 2024
![](https://github.com/mlcraft-io.png)
https://github.com/apache/superset
Apache Superset is a Data Visualization and Data Exploration Platform
analytics apache apache-superset asf bi business-analytics business-intelligence data-analysis data-analytics data-engineering data-science data-visualization data-viz flask python react sql-editor superset
Last synced: 18 May 2024
![](https://github.com/apache.png)
https://github.com/GokuMohandas/Made-With-ML
Learn how to design, develop, deploy and iterate on production-grade ML applications.
data-engineering data-quality data-science deep-learning distributed-ml distributed-training llms machine-learning mlops natural-language-processing python pytorch ray
Last synced: 18 May 2024
![](https://github.com/GokuMohandas.png)
https://github.com/LGE-ARC-AdvancedAI/auptimizer
An automatic ML model optimization tool.
automated-machine-learning automl data-engineering data-science deep-learning hpo hyperparameter-optimization hyperparameter-tuning machine-learning neural-networks
Last synced: 17 May 2024
![](https://github.com/LGE-ARC-AdvancedAI.png)
https://github.com/gunnarmorling/awesome-opensource-data-engineering
An Awesome List of Open-Source Data Engineering Projects
Last synced: 16 May 2024
![](https://github.com/gunnarmorling.png)
https://github.com/brexhq/substation
Substation is a cloud-native, event-driven data pipeline toolkit built for security teams.
aws data-engineering data-processing etl go security serverless
Last synced: 16 May 2024
![](https://github.com/brexhq.png)
https://github.com/dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.
analytics dagster data-engineering data-integration data-orchestrator data-pipelines data-science etl metadata mlops orchestration python scheduler workflow workflow-automation
Last synced: 16 May 2024
![](https://github.com/dagster-io.png)
https://github.com/andkret/Cookbook
The Data Engineering Cookbook
best-practices big-data cookbook data-engineer data-engineering
Last synced: 16 May 2024
![](https://github.com/andkret.png)
https://github.com/eugeneyan/applied-ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
applied-data-science applied-machine-learning computer-vision data-discovery data-engineering data-quality data-science deep-learning machine-learning natural-language-processing production recsys reinforcement-learning search
Last synced: 15 May 2024
![](https://github.com/eugeneyan.png)
https://github.com/mikeroyal/Apache-Airflow-Guide
Apache Airflow Guide
airflow airflow-dags airflow-docker airflow-operators airflow-plugin awesome awesome-list awesome-resources big-data business-analytics business-intelligence data-engineering distributed python
Last synced: 14 May 2024
![](https://github.com/mikeroyal.png)
https://github.com/mikeroyal/Apache-Kafka-Guide
Apache Kafka Guide
awesome awesome-kafka awesome-list awesome-readme big-data bigdata data-engineering kafka kafka-connect kafka-consumer kafka-producer kafka-streams
Last synced: 14 May 2024
![](https://github.com/mikeroyal.png)
https://github.com/mikeroyal/Apache-Spark-Guide
Apache Spark Guide
apache-spark awesome awesome-automations awesome-list big-data data-engineering data-engineering-pipeline data-science machine-learning pyspark spark spark-streaming
Last synced: 14 May 2024
![](https://github.com/mikeroyal.png)
https://github.com/kestra-io/kestra
Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
data data-engineering data-integration data-orchestration data-orchestrator data-pipeline data-quality elt etl low-code orchestration pipeline reverse-etl scheduler workflow workflow-engine
Last synced: 14 May 2024
![](https://github.com/kestra-io.png)
https://github.com/patterns-app/patterns-devkit
Data pipelines from re-usable components
data-analysis data-engineering data-pipeline data-pipelines data-science etl etl-framework etl-pipeline etl-pipelines functional-reactive-programming immutability pipelines sql
Last synced: 13 May 2024
![](https://github.com/patterns-app.png)