Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/patterns-app/patterns-devkit
Data pipelines from re-usable components
data-analysis data-engineering data-pipeline data-pipelines data-science etl etl-framework etl-pipeline etl-pipelines functional-reactive-programming immutability pipelines sql
Last synced: 13 May 2024
![](https://github.com/patterns-app.png)
https://github.com/swirrl/table2qb
A generic pipeline for converting tabular data into rdf data cubes
clojure csv csvw datacube etl linked-data qb rdf
Last synced: 13 May 2024
![](https://github.com/Swirrl.png)
https://github.com/linkedpipes/etl
LinkedPipes ETL is an RDF based, lightweight ETL tool
etl linked-data linkedpipes rdf
Last synced: 13 May 2024
![](https://github.com/linkedpipes.png)
https://github.com/Swirrl/grafter
Linked Data & RDF Manufacturing Tools in Clojure
clojure data etl grafter linked-data rdf semantic-web
Last synced: 13 May 2024
![](https://github.com/Swirrl.png)
https://github.com/morph-kgc/morph-kgc
Powerful RDF Knowledge Graph Generation with RML Mappings
data-engineering data-integration database etl knowledge-graph python r2rml rdf rdf-star rml
Last synced: 12 May 2024
![](https://github.com/morph-kgc.png)
https://github.com/bts-cm/airdrop_tool
Fetch & analyse blockchain tickets. View leaderboards and user tickets. Calculate and perform provably fair airdrops.
airdrop bitshares bitsharesjs blockchain crypto data-analysis data-science electron etl javascript nodejs react ticket tusc
Last synced: 11 May 2024
![](https://github.com/BTS-CM.png)
https://github.com/long2ice/meilisync
Real-time data sync from MySQL/PostgreSQL/MongoDB to Meilisearch
datasync etl meilisearch mongodb mysql postgresql realtime-synchronization
Last synced: 10 May 2024
![](https://github.com/long2ice.png)
https://github.com/airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
bigquery change-data-capture data data-analysis data-collection data-engineering data-integration data-pipeline elt etl java mssql mysql pipeline postgresql python redshift s3 self-hosted snowflake
Last synced: 09 May 2024
![](https://github.com/airbytehq.png)
https://github.com/cloudquery/cloudquery
The open source high performance ELT framework powered by Apache Arrow
airbyte attack-surface-management aws azure bigquery cspm data data-analysis data-collection data-engineering data-integration elt etl etl-framework gcp github-api go google kubernetes sql
Last synced: 08 May 2024
![](https://github.com/cloudquery.png)
https://github.com/streamthoughts/kafka-connect-file-pulse
🔗 A multipurpose Kafka Connect connector that makes it easy to parse, transform and stream any file, in any format, into Apache Kafka
amazon-s3 avro azure-storage csv etl file-streaming google-cloud grok-filters kafka kafka-connect kafka-connector kafka-producer xml
Last synced: 07 May 2024
![](https://github.com/streamthoughts.png)
https://github.com/catalyst-cooperative/pudl
The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
cems climate coal ddj eia eia860 eia923 electricity emissions energy epa etl ferc ghg natural-gas open-data pudl python sqlite utility
Last synced: 07 May 2024
![](https://github.com/catalyst-cooperative.png)
https://github.com/kalisio/krawler
A minimalist (geospatial) ETL
etl feathersjs gdal geojson geospatial geotiff mongodb nodejs ogc postgis s3 wcs wms
Last synced: 07 May 2024
![](https://github.com/kalisio.png)
https://github.com/insitro/redun
Yet another redundant workflow engine
aws bioinformatics data-engineering data-science docker etl gcp ml python workflow-engine
Last synced: 05 May 2024
![](https://github.com/insitro.png)
https://github.com/dotnetcore/SmartCode
SmartCode = IDataSource -> IBuildTask -> IOutput => Build Everything!!!
code-generator dotnet dotnet-core dotnetcore etl smartcode
Last synced: 05 May 2024
![](https://github.com/dotnetcore.png)
https://github.com/Cinchoo/ChoETL
ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
avro cinchoo-etl csharp csv dotnet etl etl-framework flat json keyvalue parquet parquet-files parser reader writer xml yaml
Last synced: 05 May 2024
![](https://github.com/Cinchoo.png)
https://github.com/stn1slv/awesome-integration
A curated list of awesome system integration software and resources.
api api-design apim apimanagement awesome awesome-list bpm esb etl ipaas json markdown messaging mq mulesoft openapi rest-api testing workflow
Last synced: 05 May 2024
![](https://github.com/stn1slv.png)
https://github.com/jf-tech/omniparser
omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.
codeless csv delimited edi edifact etl fixed-length fixed-width golang javascript json parser schema schemas streaming transform txt x12 xml
Last synced: 04 May 2024
![](https://github.com/jf-tech.png)
https://github.com/mara/mara-example-project-2
An example mini data warehouse for python project stats, template for new projects
bigquery data-integration etl pypi sql
Last synced: 04 May 2024
![](https://github.com/mara.png)
https://github.com/smooks/smooks
Extensible data integration Java framework for building XML and non-XML fragment-based applications
analytics big-data enterprise-integration etl event-driven java pipelines sax smooks stream-processing xml
Last synced: 02 May 2024
![](https://github.com/smooks.png)
https://github.com/ropensci/elastic
R client for the Elasticsearch HTTP API
data-science database database-wrapper elasticsearch etl http json r r-package rstats
Last synced: 02 May 2024
![](https://github.com/ropensci.png)
https://github.com/MarcusBarnes/mik
The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservation systems).
digital-preservation digital-repository etl islandora migration php php-cli repository-tools utility
Last synced: 01 May 2024
![](https://github.com/MarcusBarnes.png)
https://github.com/galliaproject/gallia-core
A schema-aware Scala library for data transformation
data-engineering data-manipulation data-science data-transformation etl feature-engineering json nesting scala spark
Last synced: 30 Apr 2024
![](https://github.com/galliaproject.png)
https://github.com/reugn/go-streams
A lightweight stream processing library for Go
aerospike data-pipeline data-stream etl kafka kafka-streams low-code nats-streaming pipeline pulsar redis stream-processing stream-processor streaming-api streaming-data streams throttling websocket windowing workflow
Last synced: 29 Apr 2024
![](https://github.com/reugn.png)
https://github.com/benthosdev/benthos
Fancy stream processing made operationally mundane
amqp cqrs data-engineering data-ops etl event-sourcing go golang kafka logs message-bus message-queue nats rabbitmq stream-processing stream-processor streaming-data
Last synced: 29 Apr 2024
![](https://github.com/benthosdev.png)
https://github.com/e-alizadeh/sample_dbt_project
Companion template repo for the blog post "dbt for Data Transformation - A Hands-on Tutorial" (https://ealizadeh.com/blog/dbt-tutorial)
data-engineering data-transformation database dbt dbt-packages dbtcloud etl sql
Last synced: 28 Apr 2024
![](https://github.com/e-alizadeh.png)
https://github.com/DAGWorks-Inc/hamilton
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.
dag data-analysis data-engineering data-science dataframe etl etl-framework etl-pipeline feature-engineering featurization hacktoberfest lineage llmops machine-learning mlops numpy orchestration pandas python software-engineering
Last synced: 28 Apr 2024
![](https://github.com/DAGWorks-Inc.png)
https://github.com/aws/aws-sdk-pandas
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
amazon-athena amazon-sagemaker-notebook apache-arrow apache-parquet athena aws aws-glue aws-lambda data-engineering data-science emr etl glue-catalog lambda modin mysql pandas python ray redshift
Last synced: 28 Apr 2024
![](https://github.com/aws.png)
https://github.com/PhantomInsights/baby-names-analysis
Data ETL & Analysis on the dataset 'Baby Names from Social Security Card Applications - National Data'.
eda etl matplotlib numpy pandas python-requests python3 seaborn
Last synced: 27 Apr 2024
![](https://github.com/PhantomInsights.png)
https://github.com/apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
airflow apache apache-airflow automation dag data-engineering data-integration data-orchestrator data-pipelines data-science elt etl machine-learning mlops orchestration python scheduler workflow workflow-engine workflow-orchestration
Last synced: 26 Apr 2024
![](https://github.com/apache.png)
https://github.com/blockchain-etl/ethereum-etl
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
aws bigquery blockchain-analytics csv erc20 erc20-tokens erc721 ethereum etl export gcp google-cloud sql transaction
Last synced: 25 Apr 2024
![](https://github.com/blockchain-etl.png)
https://github.com/ceumicrodata/mETL
mito ETL tool
data-integration etl etl-framework pipeline python
Last synced: 22 Apr 2024
![](https://github.com/ceumicrodata.png)
https://github.com/dswarm/dswarm-backoffice-web
The backoffice web application of d:swarm (https://github.com/dswarm/dswarm-documentation/wiki)
Last synced: 21 Apr 2024
![](https://github.com/dswarm.png)
https://github.com/grofit/persistity
A persistence framework for game developers
binary etl json serialization unity unity3d xml
Last synced: 21 Apr 2024
![](https://github.com/grofit.png)
https://github.com/stitchfix/hamilton
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
dag data-engineering data-platform data-science dataframe etl etl-framework etl-pipeline feature-engineering featurization hamilton hamiltonian machine-learning numpy pandas python software-engineering stitch-fix
Last synced: 20 Apr 2024
![](https://github.com/stitchfix.png)
https://github.com/mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
artificial-intelligence data data-engineering data-integration data-pipelines data-science dbt elt etl machine-learning orchestration pipeline pipelines python reverse-etl spark sql transformation
Last synced: 20 Apr 2024
![](https://github.com/mage-ai.png)
https://github.com/chronicle-app/chronicle-etl
📜 A CLI toolkit for extracting and working with your digital history
archiving chronicle chronicle-etl cli csv data-liberation etl json memex personal-archive personal-data quantified-self ruby
Last synced: 20 Apr 2024
![](https://github.com/chronicle-app.png)
https://github.com/lsc-project/lsc
LSC engine
etl identity-management ldap ldap-synchronization-connector
Last synced: 19 Apr 2024
![](https://github.com/lsc-project.png)
https://github.com/turbot/steampipe-plugin-kubernetes
Use SQL to instantly query Kubernetes API resources. Open source CLI. No DB required.
backup etl hacktoberfest k8s kubernetes kubernetes-api postgresql postgresql-fdw sql sqlite steampipe steampipe-plugin zero-etl
Last synced: 18 Apr 2024
![](https://github.com/turbot.png)
https://github.com/easysql/easy_sql
A library developed to ease the data ETL development process.
clickhouse etl postgres postgresql python spark sql
Last synced: 16 Apr 2024
![](https://github.com/easysql.png)
https://github.com/PublicI/fec-loader
Loads raw FEC filings into a database
campaignfinance elections etl fec node
Last synced: 16 Apr 2024
![](https://github.com/PublicI.png)
https://github.com/blockchain-etl/eos-etl
ETL scripts for EOS.
apache-beam blockchain-analytics crypto cryptocurrency data-analytics data-engineering eos eosio etl gcp google-bigquery google-cloud google-cloud-platform google-dataflow google-pubsub on-chain-analysis web3
Last synced: 15 Apr 2024
![](https://github.com/blockchain-etl.png)
https://github.com/neo4j/neo4j-jdbc
Official Neo4j JDBC Driver
business-intelligence driver etl integration java jdbc neo4j neo4j-driver sql2cypher
Last synced: 13 Apr 2024
![](https://github.com/neo4j.png)
https://github.com/cyber-drop/ethereum_analytical_db
Ethereum Analytical Database - Ethereum data access solution that can be used for analytics and application development. The solution works on a fast DB - Clickhouse.
api blockchain clickhouse dex erc20 erc223 erc721 eth ethereum ethereum-etl etl etl-pipeline
Last synced: 13 Apr 2024
![](https://github.com/cyber-drop.png)
https://github.com/data-solution-automation-engine/data-solution-framework
A library for data warehouse and data integration pattern and architecture documentation.
architecture data-warehouses datawarehouse design etl etl-control etl-processes patterns solution
Last synced: 10 Apr 2024
![](https://github.com/data-solution-automation-engine.png)
https://github.com/Datavault-UK/automate-dv
A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
data-vault dataengineering datalake datavault datavault20 datawarehouse datawarehousing dbt elt etl metadata snowflake sql
Last synced: 10 Apr 2024
![](https://github.com/Datavault-UK.png)
https://github.com/feldera/feldera
Feldera Continuous Analytics Platform
analytics continous data-analysis data-pipeline database etl kafka materialized-view realtime rust sql streaming
Last synced: 09 Apr 2024
![](https://github.com/feldera.png)
https://github.com/umitkaanusta/reddit-detective
Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more
analysis analytics api data database elt etl graph graph-database neo4j network politics reddit social social-media social-network
Last synced: 08 Apr 2024
![](https://github.com/umitkaanusta.png)
https://github.com/toaco/carry
Python ETL (Extract-Transform-Load) tool / Data migration tool
database database-migrations datatransformer etl migration pandas python sqlalchemy
Last synced: 05 Apr 2024
![](https://github.com/toaco.png)
https://github.com/vmware/versatile-data-kit
One framework to develop, deploy and operate data workflows with Python and SQL.
analytics data data-engineer data-engineering data-engineering-pipeline data-lineage data-pipelines data-science data-structures data-warehouse database dataops elt etl pipeline python snowflake sql trino warehouse
Last synced: 05 Apr 2024
![](https://github.com/vmware.png)
https://github.com/superlinked/superlinked
A compute framework for turning complex data into vectors.
Last synced: 03 Apr 2024
![](https://github.com/superlinked.png)
https://github.com/MassStreetAnalytics/etl-framework
A framework for moving data into a data warehouse.
data-warehouse etl etl-components etl-framework etl-pipeline python sql sqlserver
Last synced: 01 Apr 2024
![](https://github.com/MassStreetAnalytics.png)
https://github.com/stonezhong/DataManager
Better organize data in a data lake and build ETL pipelines with a web UI tool.
datalake datawarehouse etl spark sparksql
Last synced: 01 Apr 2024
![](https://github.com/stonezhong.png)
https://github.com/dazheng/SparkETL
Implement a complete data warehouse ETL using Spark SQL
datawarehouse etl spark sparksql
Last synced: 01 Apr 2024
![](https://github.com/dazheng.png)
https://github.com/zazuko/barnard59
An intuitive and flexible RDF pipeline solution designed to simplify and automate ETL processes for efficient data management.
data-integration data-pipeline data-processing etl json-ld linked-data pipeline rdf semantic-web
Last synced: 01 Apr 2024
![](https://github.com/zazuko.png)
https://github.com/datamade/data-making-guidelines
📘 Making Data, the DataMade Way
datamade etl makefile principles
Last synced: 01 Apr 2024
![](https://github.com/datamade.png)
https://github.com/BitwiseInc/Hydrograph
A visual ETL development and debugging tool for big data
apache-spark big-data cascading etl etl-framework
Last synced: 01 Apr 2024
![](https://github.com/BitwiseInc.png)
https://github.com/level-vc/useful
The open-source Useful SDK. One python decorator in the Useful library allows for full observability of Python functions within an ETL.
etl etl-pipelines python-observability telemetry
Last synced: 01 Apr 2024
![](https://github.com/level-vc.png)
https://github.com/flock-lab/flock
Flock: A Low-Cost Streaming Query Engine on FaaS Platforms
continuous-queries etl lambda-functions olap serverless streaming
Last synced: 01 Apr 2024
![](https://github.com/flock-lab.png)
https://github.com/recap-build/recap
Work with your web service, database, and streaming schemas in a single format.
data-catalog data-discovery data-engineering data-integration data-pipelines etl metadata recap
Last synced: 01 Apr 2024
![](https://github.com/recap-build.png)
https://github.com/reachsumit/MSCA-31005-database-class-project
Our team's aim was to use data from meetup.com to build a recommender system that identifies and suggests groups and activities to a member based on their interests and the additional interests of similar members. A social network analysis was also done to identify the relationships between groups and people.
aws-ec2 aws-rds collaborative-filtering database-schema etl gephi jupyter-notebook knn-classification meetup meetup-api mysql-database pymysql python-3-6 social-network-analysis tableau
Last synced: 30 Mar 2024
![](https://github.com/reachsumit.png)
https://github.com/crealytics/spark-excel
A Spark plugin for reading and writing Excel files
data-frame etl excel scala spark
Last synced: 30 Mar 2024
![](https://github.com/crealytics.png)
https://github.com/php-etl/satellite
A micro-service compilation tool for data stream processing in the cloud
Last synced: 29 Mar 2024
![](https://github.com/php-etl.png)
https://github.com/nodefluent/kafka-connect
Equivalent to kafka-connect 🔧 for nodejs ✨🐢🚀✨
connect datastore etl framework kafka kafka-connect nodejs
Last synced: 26 Mar 2024
![](https://github.com/nodefluent.png)
https://github.com/nodefluent/sequelize-kafka-connect
💎 nodejs kafka connect connector for MySQL, Postgres, SQLite and MSSQL
etl kafka kafka-connect mssql mysql nodejs postgres sequelize sqlite
Last synced: 26 Mar 2024
![](https://github.com/nodefluent.png)
https://github.com/nodefluent/bigquery-kafka-connect
☁️ nodejs kafka connect connector for Google BigQuery
big-data bigquery connect etl google-cloud kafka kafka-connect nodejs
Last synced: 26 Mar 2024
![](https://github.com/nodefluent.png)
https://github.com/datavane/tis
Supports agile DataOps based on Flink, DataX, Flink-CDC, and Chunjun, with a web UI
cdc chunjun dataops datax etl flink flink-streaming java
Last synced: 26 Mar 2024
![](https://github.com/datavane.png)
https://github.com/astronomer/astro-sdk
Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
airflow apache-airflow bigquery dags data-analysis data-science elt etl gcs pandas postgres python s3 snowflake sql sqlite workflows
Last synced: 24 Mar 2024
![](https://github.com/astronomer.png)
https://github.com/SETL-Framework/setl
A simple Spark-powered ETL framework that just works 🍺
big-data data-analysis data-engineering data-science data-transformation dataset etl etl-pipeline framework machine-learning modularization pipeline scala setl spark
Last synced: 23 Mar 2024
![](https://github.com/SETL-Framework.png)
https://github.com/basin-etl/basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
emr etl hadoop informatica odi pipeline pyspark spark
Last synced: 23 Mar 2024
![](https://github.com/basin-etl.png)
https://github.com/react-csv/react-csv
React components to build CSV files on the fly based on an array or literal object of data
csv-document etl excel reactjs reporting
Last synced: 22 Mar 2024
![](https://github.com/react-csv.png)
https://github.com/SouthGreenPlatform/AgroLD_ETL
AgroLD is an RDF knowledge base that consists of data integrated from a variety of plant resources and ontologies. AgroLD ETL is a set of Python packages developed to transform plant datasets into RDF. Packages are developed for data standards such as GFF, GAF, and VCF, and for specific plant databases.
etl gaf gff ontologies rdf rdf-data vcf
Last synced: 21 Mar 2024
![](https://github.com/SouthGreenPlatform.png)
https://github.com/nl2go/hetzner-invoice
Automatically download and transform Hetzner invoices.
etl etl-automation hetzner hetzner-cloud hetzner-invoice
Last synced: 21 Mar 2024
![](https://github.com/nl2go.png)
https://github.com/koopjs/koop
Transform, query, and download geospatial data on the web.
api arcgis arcgishub data-management etl feature-service geojson geojson-features geospatial geospatial-data gis hacktoberfest nodejs server spatial
Last synced: 20 Mar 2024
![](https://github.com/koopjs.png)
https://github.com/awslabs/aws-serverless-data-lake-framework
Enterprise-grade, production-hardened, serverless data lake on AWS
analytics aws best-practices data-engineering data-lake etl framework iac lake-formation serverless
Last synced: 19 Mar 2024
![](https://github.com/awslabs.png)
https://github.com/hooopo/kiba-plus
Kiba enhancement for Ruby ETL.
bulk etl kiba mysql postgresql
Last synced: 19 Mar 2024
![](https://github.com/hooopo.png)
https://github.com/nerevu/riko
A Python stream processing engine modeled after Yahoo! Pipes
asynchronous cli data etl featured functional-programming library parallelism rss stream-processing
Last synced: 19 Mar 2024
![](https://github.com/nerevu.png)
https://github.com/ascrus/getl
A tool for developing and testing ETL and ELT processes for automating the capture, delivery and processing of information in data warehouses on the MicroFocus Vertica platform.
csv dsl elt etl excel hdfs hive impala json kafka sql unit-testing vertica xml
Last synced: 19 Mar 2024
![](https://github.com/ascrus.png)
https://github.com/koaning/kadro
A friendly pandas wrapper with support for a more composable grammar.
etl pandas-dataframe pydata python
Last synced: 18 Mar 2024
![](https://github.com/koaning.png)
https://github.com/blockchain-etl/blockchain-etl-architecture
Blockchain ETL Architecture
apache-beam blockchain blockchain-analytics crypto cryptocurrency data-analytics data-engineering ethereum etl gcp gke google-bigquery google-cloud google-cloud-platform google-container-engine google-dataflow google-pubsub kubernetes on-chain-analysis real-time-analytics
Last synced: 16 Mar 2024
![](https://github.com/blockchain-etl.png)
https://github.com/kokes/od
Czech open data
civic-tech etl opendata postgresql
Last synced: 16 Mar 2024
![](https://github.com/kokes.png)