Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/dagster-io/dagster-open-platform

Dagster Labs' open-source data platform, built with Dagster.

dagster data-engineering python

Last synced: 02 Jul 2024

https://github.com/go-outside-labs/blockchain-infrastructure-design

👾 infrastructure projects and MVP source code, such as scalable event scanners, for on-chain analysis, hft, ml...

blockchain cypherpunk data-engineering ethereum event-scanner machine-learning quantitative-finance rust

Last synced: 02 Jul 2024

https://github.com/kelvins/awesome-dataops

😎 A curated list of awesome DataOps tools

awesome awesome-list data-engineer data-engineering dataops

Last synced: 30 Jun 2024

https://github.com/mlrun/mlrun

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

data-engineering data-science experiment-tracking kubernetes machine-learning mlops mlops-workflow model-serving python workflow

Last synced: 29 Jun 2024

https://github.com/ploomber/soorgeon

Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊

data-engineering data-science jupyter jupyter-notebooks machine-learning mlops workflow

Last synced: 29 Jun 2024

https://github.com/quiltdata/quilt

Quilt is a data mesh for connecting people with actionable data

data data-engineering data-version-control data-versioning parquet python serialization

Last synced: 29 Jun 2024

https://github.com/kevin-hanselman/dud

A lightweight CLI tool for versioning data alongside source code and building data pipelines.

data-engineering data-pipelines data-science dataset dvcs machine-learning mlops

Last synced: 29 Jun 2024

https://github.com/aiguofer/gspread-pandas

A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.

data data-analytics data-engineering data-science dataframes google google-sheets google-spreadsheets gspread pandas python sheets

Last synced: 26 Jun 2024

https://github.com/Quantmetry/awesome_quantmetry

A list of repositories commonly used @ Quantmetry

data-engineering machine-learning pioneers statistics

Last synced: 26 Jun 2024

https://github.com/dataplat/AzureDataPipelineTools

A collection of Azure Functions to make building Azure Data Factory pipelines simpler and easier.

azure azure-data-factory azure-data-lake azure-functions data-engineering

Last synced: 21 Jun 2024

https://github.com/devsgnr/breadroll

breadroll 🥟 is a simple, lightweight library for data processing operations, written in TypeScript and powered by Bun.

bun csv csv-parser data-engineering data-science data-transformation eda exploratory-data-analysis tsv tsv-parser

Last synced: 21 Jun 2024

https://github.com/cnstlungu/portable-data-stack-dagster

A portable Datamart and Business Intelligence suite built with Docker, Dagster, dbt, DuckDB, PostgreSQL and Superset

business-intelligence dagster data-engineering data-visualization dbt duckdb python superset

Last synced: 17 Jun 2024

https://github.com/quadratichq/quadratic

Quadratic | Data Science Spreadsheet with Python & SQL

data data-analysis data-engineering data-science etl python quadratic spreadsheet sql wasm webgl

Last synced: 17 Jun 2024

https://github.com/datacoon/awesome-dataops

Awesome list of dataops products, open source and resources

cloud data data-engineering dataops etl workflow-engine

Last synced: 17 Jun 2024

https://github.com/dataplane-app/dataplane

Dataplane is an Airflow-inspired unified data platform with additional data mesh and RPA capabilities to automate, schedule, and design data pipelines and workflows. Dataplane is written in Golang with a React front end.

airflow data data-analysis data-engineering data-integration data-pipelines data-science dataplane datawarehouse etl finance golang kubernetes pipelines robotics-process-automation rpa scheduler workflow workflow-automation workflows

Last synced: 17 Jun 2024

https://github.com/dlt-hub/dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

data data-engineering data-lake data-loading data-warehouse elt extract load python transform

Last synced: 17 Jun 2024

https://github.com/Hiflylabs/awesome-dbt

A curated list of awesome dbt resources

analytics-engineering awesome awesome-list data-engineering dbt

Last synced: 16 Jun 2024

https://github.com/yobulkdev/yobulkdev

🔥 🔥 🔥 Open Source & AI-driven Data Onboarding Platform: Free flatfile.com alternative

csv-import csv-parser csv-reader data-engineering datacleaning embeddable javascript languagemodel mongodb nextjs nodejs open-source react stream streaming

Last synced: 14 Jun 2024

https://github.com/kwai/blaze

A blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.

arrow-datafusion big-data data-engineering execution-engine rust spark sql

Last synced: 13 Jun 2024

https://github.com/prescode/open-fda-data-pipeline

A repeatable data pipeline to extract data from open.fda.gov, transform the data, and make the data available for advanced analytics in Amazon Web Services (AWS).

aws data-engineering elasticsearch

Last synced: 13 Jun 2024

https://github.com/andresionek91/airflow-autoscaling-ecs

Airflow Deployment on AWS ECS Fargate Using CloudFormation

airflow airflow-autoscaling-ecs airflow-deployment airflow-ecs data-engineering

Last synced: 12 Jun 2024

https://github.com/nnthanh101/Serverless-DataHub

🎯 Building Scalable Cloud-Native DataHub Serverless Application ⛅

aws cdk customer-data-platform data-engineering datahub microservices serverless

Last synced: 12 Jun 2024

https://github.com/airbytehq/glossary

Data Glossary 🧠: An interactive digital garden for deeper data exploration. Learn through a graph and backlinks, enabling layered knowledge discovery.

data-engineering

Last synced: 12 Jun 2024

https://github.com/adilkhash/Data-Engineering-HowTo

A list of useful resources to learn Data Engineering from scratch

cloud-providers data-engineering data-pipeline distributed-systems scala

Last synced: 12 Jun 2024

https://github.com/oleg-agapov/data-engineering-book

Accumulated knowledge and experience in the field of Data Engineering

data data-engineering engineering

Last synced: 12 Jun 2024

https://github.com/blockchain-etl/polygon-etl

ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub

airflow bigquery cryptocurrency data-engineering etl gcp matic-network maticnetwork polygon

Last synced: 11 Jun 2024

https://github.com/blockchain-etl/bitcoin-etl

ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ

apache-beam bitcoin bitcoincash blockchain-analytics crypto cryptocurrency dash data-analytics data-engineering dogecoin etl gcp google-dataflow google-pubsub litecoin on-chain-analysis web3 zcash

Last synced: 11 Jun 2024

https://github.com/flow-php/etl

PHP - ETL (Extract Transform Load) data processing library

data-engineering data-processing etl flow-php

Last synced: 11 Jun 2024

https://github.com/holistics/pgcp

Copying tables between Postgres databases (for analytics purposes)

data-engineering tools

Last synced: 10 Jun 2024

https://github.com/NeumTry/NeumAI

Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.

ai chatgpt data data-engineering database embeddings etl llm llmops mlops ops pipeline python rag retrieval vector-database vectors

Last synced: 08 Jun 2024

https://github.com/alibaba/feathub

FeatHub - A stream-batch unified feature store for real-time machine learning

apache-flink data data-engineering data-quality data-science feature-engineering feature-store machine-learning mlops streaming

Last synced: 07 Jun 2024

https://github.com/twalthr/flink-api-examples

Examples for using Apache Flink® with DataStream API, Table API, Flink SQL and connectors such as MySQL, JDBC, CDC, Kafka.

apache-flink data-engineering flink flink-examples flink-sql stream-processing

Last synced: 07 Jun 2024

https://github.com/Leverege/gcp-data-engineer-exam

Study materials for the Google Cloud Professional Data Engineering Exam

certification-prep data-engineering gcp google-cloud-platform

Last synced: 06 Jun 2024

https://github.com/Multiwoven/multiwoven

🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack. Leading Reverse ETL and Customer Data Platform (CDP) for Data Teams.

bigquery cdp customer-data-platform data-activation data-engineering data-pipeline data-warehouse databricks dbt etl hacktoberfest open-source postresql react redshift reverse-etl ruby self-hosted snowflake typescript

Last synced: 05 Jun 2024

https://github.com/mrpaulandrew/procfwk

A cross-tenant, metadata-driven processing framework for Azure Data Factory and Azure Synapse Analytics, achieved by coupling orchestration pipelines with a SQL database and a set of Azure Functions.

adf adfprocfwk azure azure-functions azure-sql-database data-engineering data-factory framework metadata pipelines processing procfwk

Last synced: 04 Jun 2024

https://github.com/moj-analytical-services/dbtools

Basic wrapper functions to query data using boto3 and Athena

data-engineering

Last synced: 04 Jun 2024

https://github.com/duyet/grant-rs

Manage Redshift/Postgres privileges in GitOps style written in Rust

data-engineering data-ops gitops hacktoberfest postgres redshift rust

Last synced: 03 Jun 2024

https://github.com/phidatahq/phidata

Build AI Assistants using function calling

ai aws data-engineering developer-tools docker gpt-4 llm llmops python

Last synced: 02 Jun 2024

https://github.com/kevintpeng/Learn-Something-Every-Day

📝 A compilation of everything that I learn; Computer Science, Software Development, Engineering, Math, and Coding in General. Read the rendered results here ->

algorithm aws blog computer-science course-materials data-engineering data-science education educational engineering learning math mathematics research software-engineering university unix waterloo

Last synced: 01 Jun 2024

https://github.com/dataform-co/dataform

Dataform is a framework for managing SQL based data operations in BigQuery

analytics business-intelligence data-engineering data-pipelines elt etl hacktoberfest

Last synced: 01 Jun 2024

https://github.com/CloudWise-OpenSource/FlyFish

FlyFish is a data visualization coding platform. It lets you create a data model quickly and simply, and generate a set of data visualization solutions by dragging and dropping.

analytics business-analytics charts data-analysis data-analysis-python data-engineering data-science data-visualization flyfish visualization

Last synced: 31 May 2024

https://github.com/apache/incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.

dashboard-friendly data data-analysis data-engineering data-integration data-transfers devops domain-layer dora etl golang hacktoberfest integration jira open-source user-friendly

Last synced: 31 May 2024

https://github.com/meltano/meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

connectors data data-engineering data-pipelines dataops dataops-platform elt extract-data integration loaders meltano meltano-sdk open-source opensource pipelines singer tap taps target targets

Last synced: 31 May 2024

https://github.com/whoiskatrin/sql-translator

SQL Translator is a tool for converting natural language queries into SQL code using artificial intelligence. This project is 100% free and open source.

data-analysis data-engineering dataquery datascience dataset openai postgresql query sql

Last synced: 31 May 2024

https://github.com/data-engineering-community/data-engineering-wiki

The best place to learn data engineering. Built and maintained by the data engineering community.

data data-engineer data-engineering data-modeling data-pipelines database etl sql

Last synced: 29 May 2024

https://github.com/moj-analytical-services/etl_manager

A Python package to create a database on the platform using our MoJ data warehousing framework

data-engineering etl python

Last synced: 27 May 2024

https://github.com/bitpicky/dbt-sugar

dbt-sugar is a CLI tool that makes performing actions around dbt models fun and easy for dbt users

data-engineering dbt dbt-sugar documentation

Last synced: 27 May 2024

https://github.com/AuFeld/Data_Engineering_Projects

A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousing, containerization, and a dashboard to monitor data pipeline KPIs

airflow aws cassandra data-engineering data-lake data-warehouse docker emr etl-pipeline infrastructure-as-code infrastructure-setup postgresql python redshift s3 spark

Last synced: 27 May 2024

https://github.com/shipyardapp/postgresql-blueprints

Simplified blueprints for building data pipelines with PostgreSQL.

cli data-analysis data-engineering data-pipeline data-science database elt etl postgres postgresql

Last synced: 27 May 2024

https://github.com/shipyardapp/amazonathena-blueprints

Simplified blueprints for building data pipelines with Amazon Athena.

amazon-athena athena cli data-analysis data-engineering data-science elt etl

Last synced: 27 May 2024

https://github.com/kiwicom/contessa

Easy way to define, execute and store quality rules for your data.

data data-engineering data-quality framework mysql postgres python quality-assurance sqlite3

Last synced: 27 May 2024

https://github.com/Eventual-Inc/Daft

Distributed DataFrame for Python designed for the cloud, powered by Rust

big-data data-engineering data-science dataframe distributed-computing machine-learning python rust

Last synced: 22 May 2024

https://github.com/gunnarmorling/awesome-opensource-data-engineering

An Awesome List of Open-Source Data Engineering Projects

awesome-list data-engineering

Last synced: 16 May 2024

https://github.com/brexhq/substation

Substation is a cloud-native, event-driven data pipeline toolkit built for security teams.

aws data-engineering data-processing etl go security serverless

Last synced: 16 May 2024

https://github.com/dagster-io/dagster

An orchestration platform for the development, production, and observation of data assets.

analytics dagster data-engineering data-integration data-orchestrator data-pipelines data-science etl metadata mlops orchestration python scheduler workflow workflow-automation

Last synced: 16 May 2024

https://github.com/kestra-io/kestra

Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.

data data-engineering data-integration data-orchestration data-orchestrator data-pipeline data-quality elt etl low-code orchestration pipeline reverse-etl scheduler workflow workflow-engine

Last synced: 14 May 2024