Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/turbot/steampipe-plugin-github

Use SQL to instantly query repositories, users, gists and more from GitHub. Open source CLI. No DB required.

backup etl github github-cli github-client hacktoberfest postgresql postgresql-fdw sql sqlite steampipe steampipe-plugin zero-etl

Last synced: 03 Jul 2024

https://github.com/turbot/steampipe-plugin-azure

Use SQL to instantly query Azure resources across regions and subscriptions. Open source CLI. No DB required.

azure azure-cli azure-client azure-devops backup etl hacktoberfest postgresql postgresql-fdw sql sqlite steampipe steampipe-plugin zero-etl

Last synced: 03 Jul 2024

https://github.com/turbot/steampipe-plugin-shopify

Use SQL to instantly query Shopify products, orders and more. Open source CLI. No DB required.

backup etl hacktoberfest postgresql postgresql-fdw shopify shopify- shopify-orders shopify-partners shopify-products sql sqlite steampipe steampipe-plugin zero-etl

Last synced: 03 Jul 2024

https://github.com/turbot/steampipe-plugin-sdk

Steampipe Plugin SDK is a simple abstraction layer to write a Steampipe plugin. Plugins automatically work across all engine types including the Steampipe CLI, Postgres FDW, SQLite extension and the export CLI.

etl hacktoberfest postgresql postgresql-fdw sql sqlite sqlite-extension steampipe steampipe-plugin zero-etl

Last synced: 03 Jul 2024

https://github.com/turbot/steampipe-plugin-openai

Use SQL to instantly query OpenAI for completions, models & more. Open source CLI. No DB required.

backup etl golang gpt-3 hacktoberfest openai postgresql postgresql-fdw sql sqlite steampipe steampipe-plugin zero-etl

Last synced: 03 Jul 2024

https://github.com/turbot/steampipe-plugin-gcp

Use SQL to instantly query GCP resources across regions, projects and organizations. Open source CLI. No DB required.

backup etl gcloud gcloud-cli gcp hacktoberfest postgresql postgresql-fdw sql sqlite steampipe steampipe-plugin zero-etl

Last synced: 03 Jul 2024

https://github.com/turbot/steampipe-plugin-csv

Use SQL to instantly query data from CSV files. Open source CLI. No DB required.

backup csv etl hacktoberfest postgresql postgresql-fdw sql sqlite steampipe steampipe-plugin zero-etl

Last synced: 03 Jul 2024

https://github.com/turbot/steampipe-plugin-finance

Use SQL to instantly query financial data including quotes (equities, cryptocurrency, etc.) and US public company information. Open source CLI. No DB required.

backup cryptocurrency edgar edgar-scraper etl finance hacktoberfest postgresql postgresql-fdw sql sqlite steampipe steampipe-plugin stock-market yahoo-finance yahoo-finance-api zero-etl

Last synced: 03 Jul 2024

https://github.com/flyanakin/CountMoney

A simple, low-cost finance data pipeline orchestration. All you need is Python and SQL.

airtable-api dagster dbt etl finance modern-data-stack orchestration postgresql python sql stock tushare workflow

Last synced: 02 Jul 2024

https://github.com/rwynn/monstache

A Go daemon that syncs MongoDB to Elasticsearch in real time. You know, for search.

change-streams connector daemon elasticsearch etl go golang mongodb opensearch oplog realtime river sync synchronization tail

Last synced: 25 Jun 2024

https://github.com/EvilLord666/ReportGenerator

A small cross-database tool for building Excel documents (reports) from database data extracted via views or stored procedures, with support for parameters, ordering, etc.

cross-database database database-reporting di-service etl etl-automation excel excel-export excel-to-sql generator reportgenerator reporting-engine reporting-tool reports smart-reporting sql-to-excel statement stored-procedures

Last synced: 21 Jun 2024

https://github.com/DataCater/datacater

The developer-friendly ETL platform for transforming data in real-time. Based on Apache Kafka® and Kubernetes®.

apache-kafka cloud-native data-pipelines etl kafka kubernetes python

Last synced: 21 Jun 2024

https://github.com/dswarm/dswarm

An open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki)

csv datamanagement datamapper dswarm etl json mapping metadata schema-mapping xml

Last synced: 20 Jun 2024

https://github.com/singer-io/getting-started

This repository is a getting started guide to Singer.

data-analysis etl etl-framework python singer

Last synced: 17 Jun 2024

https://github.com/datacleaner/DataCleaner

The premier open source Data Quality solution

data data-analysis data-science database datacleaner dataquality desktop etl mdm profiling

Last synced: 17 Jun 2024

https://github.com/quadratichq/quadratic

Quadratic | Data Science Spreadsheet with Python & SQL

data data-analysis data-engineering data-science etl python quadratic spreadsheet sql wasm webgl

Last synced: 17 Jun 2024

https://github.com/datacoon/awesome-dataops

Awesome list of DataOps products, open source projects, and resources

cloud data data-engineering dataops etl workflow-engine

Last synced: 17 Jun 2024

https://github.com/dataplane-app/dataplane

Dataplane is an Airflow inspired unified data platform with additional data mesh and RPA capability to automate, schedule and design data pipelines and workflows. Dataplane is written in Golang with a React front end.

airflow data data-analysis data-engineering data-integration data-pipelines data-science dataplane datawarehouse etl finance golang kubernetes pipelines robotics-process-automation rpa scheduler workflow workflow-automation workflows

Last synced: 17 Jun 2024

https://github.com/appbaseio/abc

Power of appbase.io via CLI, with nifty imports from your favorite data sources

appbase cli elasticsearch etl

Last synced: 16 Jun 2024

https://github.com/camposvinicius/aws-etl

This is an ETL application on AWS using open sales and customer data available at https://github.com/camposvinicius/data/blob/main/AdventureWorks.zip, a zipped file containing several CSVs to which we will apply transformations.

airflow argocd athena aws catalog data data-engineer database emr emr-cluster etl glue kubernetes pipeline postgres pyspark rds spark

Last synced: 16 Jun 2024

https://github.com/deepeth/mars

A powerful analysis platform to explore and visualize blockchain data.

bitcoin blockchain ethereum etl rust schema web3

Last synced: 16 Jun 2024

https://github.com/wgzhao/Addax

Addax is a versatile open-source ETL tool that can seamlessly transfer data between various RDBMS and NoSQL databases, making it an ideal solution for data migration.

clickhouse data-integrity database datax etl excel hadoop hdfs hive impala influxdb kudu mysql oracle postgresql sqlserver trino

Last synced: 16 Jun 2024

https://github.com/elastic/eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch

big-data data-analysis dataframe dataframes eland elasticsearch etl lightgbm machine-learning pandas python scikit-learn time-series-forecasting

Last synced: 16 Jun 2024

https://github.com/instill-ai/instill-core

🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications

ai api cli developer-tools etl generative-ai golang gpt hacktoberfest llm low-code no-code open-source pipeline python stable-diffusion typescript unstructured-data

Last synced: 14 Jun 2024

https://github.com/microsoft/etl2pcapng

Utility that converts an .etl file containing a Windows network packet capture into .pcapng format.

etl packet-capture wireshark

Last synced: 11 Jun 2024

https://github.com/blockchain-etl/polygon-etl

ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub

airflow bigquery cryptocurrency data-engineering etl gcp matic-network maticnetwork polygon

Last synced: 11 Jun 2024

https://github.com/blockchain-etl/bitcoin-etl

ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ

apache-beam bitcoin bitcoincash blockchain-analytics crypto cryptocurrency dash data-analytics data-engineering dogecoin etl gcp google-dataflow google-pubsub litecoin on-chain-analysis web3 zcash

Last synced: 11 Jun 2024

https://github.com/flow-php/etl

PHP ETL (Extract, Transform, Load) data processing library

data-engineering data-processing etl flow-php

Last synced: 11 Jun 2024

https://github.com/flow-php/flow

Flow PHP - strongly typed data processing framework

etl etl-framework etl-pipeline

Last synced: 11 Jun 2024

https://github.com/turbot/steampipe-sqlite

Steampipe SQLite is a zero-ETL engine for SQLite. Virtual tables translate queries into live API calls for cloud services and APIs. Hundreds of plugins with thousands of documented examples.

aws azure data devsecops etl gcp golang kubernetes security sql sqlite steampipe steampipe-engine zero-etl

Last synced: 10 Jun 2024

https://github.com/m-lab/etl-schema

All schemas and views related to the ETL pipeline and public BigQuery tables.

etl pipeline

Last synced: 10 Jun 2024

https://github.com/tmusabbir/glue-utils

A few AWS Glue utility scripts

amazon-web-services aws emr etl glue lakeformation

Last synced: 10 Jun 2024

https://github.com/vh-d/Rflow

Rflow is a general-purpose workflow management framework for R

data-processing database dataflow etl etl-framework r reproducibility rlang rstats rstats-package workflow-management

Last synced: 10 Jun 2024

https://github.com/vh-d/RETL

R package for ETL

etl etl-framework transformations

Last synced: 10 Jun 2024

https://github.com/hofstadter-io/cuetils

CLI and library for diff, patch, and ETL operations on CUE, JSON, and YAML

configuration cue cuelang diff etl golang jq json structural-diff yaml

Last synced: 09 Jun 2024

https://github.com/jupyter-naas/naas

Low-code Python library to safely use notebooks in production: schedule workflows, generate assets, trigger webhooks, send notifications, build pipelines, manage secrets (Cloud-only)

ai binder data data-science data-transformation engine etl integration jupyter jupyterlab notebooks open-source pipeline

Last synced: 08 Jun 2024

https://github.com/NeumTry/NeumAI

Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.

ai chatgpt data data-engineering database embeddings etl llm llmops mlops ops pipeline python rag retrieval vector-database vectors

Last synced: 08 Jun 2024

https://github.com/halestudio/hale

(Spatial) data harmonisation with hale studio (formerly HUMBOLDT Alignment Editor)

data-harmonisation database eclipse-rcp etl etl-framework geospatial-data gml groovy hale hale-studio humboldt-alignment-editor inspire java scala transformation xml

Last synced: 08 Jun 2024

https://github.com/ananas-analytics/ananas-desktop

A hackable data integration & analysis tool that enables non-technical users to edit data processing jobs and visualize data on demand.

analytics business-intelligence data-modeling etl hackable-data visualization

Last synced: 07 Jun 2024

https://github.com/twineworks/ruby-for-pentaho-kettle

Ruby scripting for pentaho-kettle

etl java jurby kettle pdi pentaho-kettle ruby

Last synced: 07 Jun 2024

https://github.com/zhaoyachao/zdh_web

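Big data collection and extraction platform. zdh_web is the visual management platform for the zdh family of services, with modules for data collection, scheduling, permissions, approval workflows, private-domain marketing, and more.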

bigdata collection data data-collection datapipeline datax-web etl pipline scheduler spark sparketl

Last synced: 07 Jun 2024

https://github.com/ICIJ/extract

A cross-platform command line tool for parallelised content extraction and analysis.

ediscovery etl index solr tika

Last synced: 07 Jun 2024

https://github.com/PeerDB-io/peerdb

A fast, simple, and cost-effective tool to replicate data from Postgres to data warehouses, queues, and storage.

bigquery cdc clickhouse cloud-native distributed-systems etl eventhubs kafka postgres postgresql realtime rust s3 snowflake sql stream-processing

Last synced: 07 Jun 2024

https://github.com/turbot/steampipe-plugin-code

Use SQL to instantly query secrets and more from source code. Open source CLI. No DB required.

backup code-scanner etl hacktoberfest postgresql postgresql-fdw secrets-detection sql sqlite steampipe steampipe-plugin zero-etl

Last synced: 06 Jun 2024

https://github.com/grailbio/bigslice

A serverless cluster computing system for the Go programming language

bigdata cluster computing etl go golang machinelearning mapreduce

Last synced: 05 Jun 2024

https://github.com/Multiwoven/multiwoven

🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack. Leading Reverse ETL and Customer Data Platform (CDP) for Data Teams.

bigquery cdp customer-data-platform data-activation data-engineering data-pipeline data-warehouse databricks dbt etl hacktoberfest open-source postresql react redshift reverse-etl ruby self-hosted snowflake typescript

Last synced: 05 Jun 2024

https://github.com/sdcastillo/ExamPAData

A container for data sets to help actuaries who are practicing predictive analytics

content-marketing cran education etl etl-pipeline

Last synced: 04 Jun 2024

https://github.com/BetweenTwoTests/between_dbs

DDL & test data for different databases for ETL data quality checks / data loading tests

data-quality database etl

Last synced: 04 Jun 2024

https://github.com/hackersandslackers/bigquery-sqlalchemy-tutorial

📊 ➡️ 💾 ETL script to migrate data from BigQuery to SQL.

bigquery bigquery-sqlalchemy-tutorial databases etl mysql postgres python sql sqlalchemy tutorial

Last synced: 03 Jun 2024

https://github.com/seanharr11/etlalchemy

Extract, Transform, Load: any SQL database in 4 lines of code.

database etl etl-framework migrations python sqlalchemy

Last synced: 02 Jun 2024

https://github.com/compose/transporter

Sync data between persistence engines, like ETL, only not stodgy.

elasticsearch etl go mongodb mysql postgresql rabbitmq rethinkdb

Last synced: 02 Jun 2024

https://github.com/opencultureconsulting/openrefine-batch

Shell script to run OpenRefine in batch mode (import, transform, export). It orchestrates OpenRefine (server) and a python client that communicates with the OpenRefine API.

bash-script batch-processing code4lib docker etl openrefine

Last synced: 01 Jun 2024

https://github.com/raystack/optimus

Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.

airflow analytics analytics-engineering automation bigquery business-intelligence data-modelling data-pipelines data-transformation data-warehouse dataops elt etl golang workflows

Last synced: 01 Jun 2024

https://github.com/2ndQuadrant/pglogical

Logical Replication extension for PostgreSQL 15, 14, 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.

cdc data-transformation data-transport database-replication etl logical-decoding postgresql publish-subscribe replication subscription zero-downtime

Last synced: 01 Jun 2024

https://github.com/dataform-co/dataform

Dataform is a framework for managing SQL-based data operations in BigQuery.

analytics business-intelligence data-engineering data-pipelines elt etl hacktoberfest

Last synced: 01 Jun 2024

https://github.com/apache/doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

bigquery database dbt delta-lake elt etl hadoop hive hudi iceberg lakehouse olap query-engine real-time redshift snowflake spark sql

Last synced: 31 May 2024

https://github.com/apache/incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.

dashboard-friendly data data-analysis data-engineering data-integration data-transfers devops domain-layer dora etl golang hacktoberfest integration jira open-source user-friendly

Last synced: 31 May 2024

https://github.com/jbogard/bulk-writer

Provides guidance for fast ETL jobs, an IDataReader implementation for SqlBulkCopy (or the MySql or Oracle equivalents) that wraps an IEnumerable, and libraries for mapping entities to table columns.

bulk-writer etl etl-job pipeline pipeline-stage sql sqlbulkcopy stream-data

Last synced: 31 May 2024

https://github.com/iftech-engineering/mongo-es

A MongoDB to Elasticsearch connector

connector elasticsearch etl mongodb

Last synced: 30 May 2024

https://github.com/xyflow/awesome-node-based-uis

A curated list with resources about node-based UIs

awesome-list etl node-based-ui visual-programming workflow-editor

Last synced: 30 May 2024

https://github.com/Claviz/bellboy

Highly performant JavaScript data stream ETL engine.

etl excel mssql nodejs postgres streaming

Last synced: 30 May 2024

https://github.com/data-engineering-community/data-engineering-wiki

The best place to learn data engineering. Built and maintained by the data engineering community.

data data-engineer data-engineering data-modeling data-pipelines database etl sql

Last synced: 29 May 2024

https://github.com/nucleuscloud/neosync

Open source data anonymization and synthetic data orchestration for developers. Create high fidelity synthetic data and sync it across your environments.

benthos docker etl faker fine-tuning golang kubernetes nextjs open-source orchestration protobuf react reactjs self-hosted synthetic-data synthetic-data-generation test-data-generator testing typescript

Last synced: 28 May 2024

https://github.com/zsvoboda/dbd

dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.

bigquery csv database database-schemas elt etl excel json mysql parquet postgresql python python3 redshift snowflake sql sqlite xls xlsx

Last synced: 27 May 2024

https://github.com/miztiik/s3-to-rds-with-glue

Extract, transform, and load data for analytic processing using AWS Glue

cdk cloud-development-kit etl glue glue-catalog glue-job miztiik-automation s3-to-rds spark

Last synced: 27 May 2024

https://github.com/moj-analytical-services/etl_manager

A Python package to create a database on the platform using our MoJ data warehousing framework

data-engineering etl python

Last synced: 27 May 2024

https://github.com/datamill-co/target-postgres

A Singer.io Target for Postgres

etl json-schema postgres singer stream

Last synced: 27 May 2024

https://github.com/shipyardapp/postgresql-blueprints

Simplified blueprints for building data pipelines with PostgreSQL.

cli data-analysis data-engineering data-pipeline data-science database elt etl postgres postgresql

Last synced: 27 May 2024

https://github.com/shipyardapp/amazonathena-blueprints

Simplified blueprints for building data pipelines with Amazon Athena.

amazon-athena athena cli data-analysis data-engineering data-science elt etl

Last synced: 27 May 2024

https://github.com/turbot/steampipe-plugin-aws

Use SQL to instantly query AWS resources across regions and accounts. Open source CLI. No DB required.

aws aws-cli backup etl hacktoberfest postgresql postgresql-fdw sql sqlite steampipe steampipe-plugin zero-etl

Last synced: 27 May 2024

https://github.com/thbar/kiba

Data processing & ETL framework for Ruby

data etl etl-ruby ruby rubydatascience

Last synced: 26 May 2024

https://github.com/wx-chevalier/sentinel-crawler

Xenomorph Crawler: a concise, declarative, and observable distributed crawler (Node / Go / Java / Rust) for the web, relational databases, and operating systems; it can also act as a monitor (with Prometheus) or as ETL for infrastructure 💫 Multi-language executor, distributed crawler

crawler etl koa2 monitor nodejs react wx-code

Last synced: 26 May 2024

https://github.com/opencultureconsulting/openrefine-client

The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the command line interface (CLI) and is distributed as a convenient one-file-executable (Windows, Linux, Mac). It is also available via Docker Hub, PyPI and Binder.

binder code4lib docker etl openrefine pypi python

Last synced: 26 May 2024

https://github.com/onepanelio/onepanel

The open source, end-to-end computer vision platform. Label, build, train, tune, deploy and automate in a unified platform that runs on any cloud and on-premises.

ai aiops annotation computer-vision deeplearning etl hyperparameter-tuning inference jupyterlab labeling machinelearning mlops pipelines pytorch tensorboard tensorflow training workflows

Last synced: 19 May 2024

https://github.com/mara/mara-pipelines

A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

data data-integration etl pipeline postgresql python

Last synced: 18 May 2024

https://github.com/TobikoData/sqlmesh

Efficient data transformation and modeling framework that is backwards compatible with dbt.

dataengineering dataops dbt elt etl python sql transformation

Last synced: 16 May 2024

https://github.com/brexhq/substation

Substation is a cloud-native, event-driven data pipeline toolkit built for security teams.

aws data-engineering data-processing etl go security serverless

Last synced: 16 May 2024

https://github.com/dagster-io/dagster

An orchestration platform for the development, production, and observation of data assets.

analytics dagster data-engineering data-integration data-orchestrator data-pipelines data-science etl metadata mlops orchestration python scheduler workflow workflow-automation

Last synced: 16 May 2024

https://github.com/kestra-io/kestra

Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.

data data-engineering data-integration data-orchestration data-orchestrator data-pipeline data-quality elt etl low-code orchestration pipeline reverse-etl scheduler workflow workflow-engine

Last synced: 14 May 2024

https://github.com/adilkhash/luigi-telegram

Luigi task status notifications to Telegram

data-pipeline data-processing etl luigi notification-plugin

Last synced: 13 May 2024

https://github.com/swirrl/table2qb

A generic pipeline for converting tabular data into RDF data cubes

clojure csv csvw datacube etl linked-data qb rdf

Last synced: 13 May 2024

https://github.com/linkedpipes/etl

LinkedPipes ETL is an RDF based, lightweight ETL tool

etl linked-data linkedpipes rdf

Last synced: 13 May 2024