Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/swirrl/table2qb

A generic pipeline for converting tabular data into RDF data cubes

clojure csv csvw datacube etl linked-data qb rdf

Last synced: 13 May 2024

https://github.com/linkedpipes/etl

LinkedPipes ETL is an RDF-based, lightweight ETL tool

etl linked-data linkedpipes rdf

Last synced: 13 May 2024

https://github.com/turbot/steampipe

Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.

aws azure cis cloud cnapp cspm devops devsecops etl gcp golang kubernetes postgresql postgresql-fdw security sql sqlite steampipe terraform zero-etl

Last synced: 13 May 2024

https://github.com/Swirrl/grafter

Linked Data & RDF Manufacturing Tools in Clojure

clojure data etl grafter linked-data rdf semantic-web

Last synced: 13 May 2024

https://github.com/morph-kgc/morph-kgc

Powerful RDF Knowledge Graph Generation with RML Mappings

data-engineering data-integration database etl knowledge-graph python r2rml rdf rdf-star rml

Last synced: 12 May 2024

https://github.com/bts-cm/airdrop_tool

Fetch & analyse blockchain tickets. View leaderboards and user tickets. Calculate and perform provably fair airdrops.

airdrop bitshares bitsharesjs blockchain crypto data-analysis data-science electron etl javascript nodejs react ticket tusc

Last synced: 11 May 2024

https://github.com/long2ice/meilisync

Real-time data sync from MySQL/PostgreSQL/MongoDB to Meilisearch

datasync etl meilisearch mongodb mysql postgresql realtime-synchronization

Last synced: 10 May 2024

https://github.com/airbytehq/airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

bigquery change-data-capture data data-analysis data-collection data-engineering data-integration data-pipeline elt etl java mssql mysql pipeline postgresql python redshift s3 self-hosted snowflake

Last synced: 09 May 2024

https://github.com/streamthoughts/kafka-connect-file-pulse

🔗 A multipurpose Kafka Connect connector that makes it easy to parse, transform and stream any file, in any format, into Apache Kafka

amazon-s3 avro azure-storage csv etl file-streaming google-cloud grok-filters kafka kafka-connect kafka-connector kafka-producer xml

Last synced: 07 May 2024

https://github.com/catalyst-cooperative/pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.

cems climate coal ddj eia eia860 eia923 electricity emissions energy epa etl ferc ghg natural-gas open-data pudl python sqlite utility

Last synced: 07 May 2024

https://github.com/insitro/redun

Yet another redundant workflow engine

aws bioinformatics data-engineering data-science docker etl gcp ml python workflow-engine

Last synced: 05 May 2024

https://github.com/dotnetcore/SmartCode

SmartCode = IDataSource -> IBuildTask -> IOutput => Build Everything!!!

code-generator dotnet dotnet-core dotnetcore etl smartcode

Last synced: 05 May 2024

https://github.com/Cinchoo/ChoETL

ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)

avro cinchoo-etl csharp csv dotnet etl etl-framework flat json keyvalue parquet parquet-files parser reader writer xml yaml

Last synced: 05 May 2024

https://github.com/stn1slv/awesome-integration

A curated list of awesome system integration software and resources.

api api-design apim apimanagement awesome awesome-list bpm esb etl ipaas json markdown messaging mq mulesoft openapi rest-api testing workflow

Last synced: 05 May 2024

https://github.com/jf-tech/omniparser

omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.

codeless csv delimited edi edifact etl fixed-length fixed-width golang javascript json parser schema schemas streaming transform txt x12 xml

Last synced: 04 May 2024

https://github.com/mara/mara-example-project-2

An example mini data warehouse for Python project stats, and a template for new projects

bigquery data-integration etl pypi sql

Last synced: 04 May 2024

https://github.com/smooks/smooks

Extensible data integration Java framework for building XML and non-XML fragment-based applications

analytics big-data enterprise-integration etl event-driven java pipelines sax smooks stream-processing xml

Last synced: 02 May 2024

https://github.com/ropensci/elastic

R client for the Elasticsearch HTTP API

data-science database database-wrapper elasticsearch etl http json r r-package rstats

Last synced: 02 May 2024

https://github.com/MarcusBarnes/mik

The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservation systems).

digital-preservation digital-repository etl islandora migration php php-cli repository-tools utility

Last synced: 01 May 2024

https://github.com/khezen/avro

Apache Avro for Go

apache avro etl go golang redshift sql

Last synced: 29 Apr 2024

https://github.com/e-alizadeh/sample_dbt_project

Companion template repo for the blog post "dbt for Data Transformation - A Hands-on Tutorial" (https://ealizadeh.com/blog/dbt-tutorial)

data-engineering data-transformation database dbt dbt-packages dbtcloud etl sql

Last synced: 28 Apr 2024

https://github.com/DAGWorks-Inc/hamilton

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.

dag data-analysis data-engineering data-science dataframe etl etl-framework etl-pipeline feature-engineering featurization hacktoberfest lineage llmops machine-learning mlops numpy orchestration pandas python software-engineering

Last synced: 28 Apr 2024

https://github.com/aws/aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

amazon-athena amazon-sagemaker-notebook apache-arrow apache-parquet athena aws aws-glue aws-lambda data-engineering data-science emr etl glue-catalog lambda modin mysql pandas python ray redshift

Last synced: 28 Apr 2024

https://github.com/PhantomInsights/baby-names-analysis

Data ETL & Analysis on the dataset 'Baby Names from Social Security Card Applications - National Data'.

eda etl matplotlib numpy pandas python-requests python3 seaborn

Last synced: 27 Apr 2024

https://github.com/bpolaszek/bentools-etl

PHP ETL (Extract / Transform / Load) library with SOLID principles and almost no dependencies.

callable etl export extract extractor import input invoke load loader loop output pattern php transform transformer

Last synced: 26 Apr 2024

https://github.com/blockchain-etl/ethereum-etl

Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ

aws bigquery blockchain-analytics csv erc20 erc20-tokens erc721 ethereum etl export gcp google-cloud sql transaction

Last synced: 25 Apr 2024

https://github.com/dswarm/dswarm-backoffice-web

The backoffice web application of d:swarm (https://github.com/dswarm/dswarm-documentation/wiki)

datamanagement dswarm etl

Last synced: 21 Apr 2024

https://github.com/grofit/persistity

A persistence framework for game developers

binary etl json serialization unity unity3d xml

Last synced: 21 Apr 2024

https://github.com/stitchfix/hamilton

A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton

dag data-engineering data-platform data-science dataframe etl etl-framework etl-pipeline feature-engineering featurization hamilton hamiltonian machine-learning numpy pandas python software-engineering stitch-fix

Last synced: 20 Apr 2024

https://github.com/chronicle-app/chronicle-etl

📜 A CLI toolkit for extracting and working with your digital history

archiving chronicle chronicle-etl cli csv data-liberation etl json memex personal-archive personal-data quantified-self ruby

Last synced: 20 Apr 2024

https://github.com/turbot/steampipe-plugin-kubernetes

Use SQL to instantly query Kubernetes API resources. Open source CLI. No DB required.

backup etl hacktoberfest k8s kubernetes kubernetes-api postgresql postgresql-fdw sql sqlite steampipe steampipe-plugin zero-etl

Last synced: 18 Apr 2024

https://github.com/easysql/easy_sql

A library developed to ease the data ETL development process.

clickhouse etl postgres postgresql python spark sql

Last synced: 16 Apr 2024

https://github.com/PublicI/fec-loader

Loads raw FEC filings into a database

campaignfinance elections etl fec node

Last synced: 16 Apr 2024

https://github.com/cyber-drop/ethereum_analytical_db

Ethereum Analytical Database: an Ethereum data access solution that can be used for analytics and application development. The solution runs on a fast database, ClickHouse.

api blockchain clickhouse dex erc20 erc223 erc721 eth ethereum ethereum-etl etl etl-pipeline

Last synced: 13 Apr 2024

https://github.com/data-solution-automation-engine/data-solution-framework

A library for data warehouse and data integration pattern and architecture documentation.

architecture data-warehouses datawarehouse design etl etl-control etl-processes patterns solution

Last synced: 10 Apr 2024

https://github.com/Datavault-UK/automate-dv

A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)

data-vault dataengineering datalake datavault datavault20 datawarehouse datawarehousing dbt elt etl metadata snowflake sql

Last synced: 10 Apr 2024

https://github.com/umitkaanusta/reddit-detective

Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more

analysis analytics api data database elt etl graph graph-database neo4j network politics reddit social social-media social-network

Last synced: 08 Apr 2024

https://github.com/toaco/carry

Python ETL (Extract-Transform-Load) tool / data migration tool

database database-migrations datatransformer etl migration pandas python sqlalchemy

Last synced: 05 Apr 2024

https://github.com/superlinked/superlinked

A compute framework for turning complex data into vectors.

embeddings etl vector-search

Last synced: 03 Apr 2024

https://github.com/MassStreetAnalytics/etl-framework

A framework for moving data into a data warehouse.

data-warehouse etl etl-components etl-framework etl-pipeline python sql sqlserver

Last synced: 01 Apr 2024

https://github.com/stonezhong/DataManager

Better organize data in a data lake and build ETL pipelines with a web UI tool.

datalake datawarehouse etl spark sparksql

Last synced: 01 Apr 2024

https://github.com/dazheng/SparkETL

Implements a complete data warehouse ETL using Spark SQL

datawarehouse etl spark sparksql

Last synced: 01 Apr 2024

https://github.com/zazuko/barnard59

An intuitive and flexible RDF pipeline solution designed to simplify and automate ETL processes for efficient data management.

data-integration data-pipeline data-processing etl json-ld linked-data pipeline rdf semantic-web

Last synced: 01 Apr 2024

https://github.com/datamade/data-making-guidelines

📘 Making Data, the DataMade Way

datamade etl makefile principles

Last synced: 01 Apr 2024

https://github.com/BitwiseInc/Hydrograph

A visual ETL development and debugging tool for big data

apache-spark big-data cascading etl etl-framework

Last synced: 01 Apr 2024

https://github.com/level-vc/useful

The open-source Useful SDK. A single Python decorator from the Useful library provides full observability of Python functions within an ETL pipeline.

etl etl-pipelines python-observability telemetry

Last synced: 01 Apr 2024

https://github.com/flock-lab/flock

Flock: A Low-Cost Streaming Query Engine on FaaS Platforms

continuous-queries etl lambda-functions olap serverless streaming

Last synced: 01 Apr 2024

https://github.com/recap-build/recap

Work with your web service, database, and streaming schemas in a single format.

data-catalog data-discovery data-engineering data-integration data-pipelines etl metadata recap

Last synced: 01 Apr 2024

https://github.com/reachsumit/MSCA-31005-database-class-project

The aim of our team was to use data from meetup.com to build a recommender system that identifies and suggests groups and activities to a member based on their interests and the additional interests of similar members. A social network analysis was also carried out to identify the relationships between groups and people.

aws-ec2 aws-rds collaborative-filtering database-schema etl gephi jupyter-notebook knn-classification meetup meetup-api mysql-database pymysql python-3-6 social-network-analysis tableau

Last synced: 30 Mar 2024

https://github.com/crealytics/spark-excel

A Spark plugin for reading and writing Excel files

data-frame etl excel scala spark

Last synced: 30 Mar 2024

https://github.com/php-etl/satellite

A micro-service compilation tool for data stream processing in the cloud

etl hacktoberfest php

Last synced: 29 Mar 2024

https://github.com/nodefluent/kafka-connect

Equivalent to kafka-connect 🔧 for Node.js ✨🐢🚀✨

connect datastore etl framework kafka kafka-connect nodejs

Last synced: 26 Mar 2024

https://github.com/nodefluent/sequelize-kafka-connect

💎 Node.js Kafka Connect connector for MySQL, Postgres, SQLite and MSSQL

etl kafka kafka-connect mssql mysql nodejs postgres sequelize sqlite

Last synced: 26 Mar 2024

https://github.com/nodefluent/bigquery-kafka-connect

☁️ Node.js Kafka Connect connector for Google BigQuery

big-data bigquery connect etl google-cloud kafka kafka-connect nodejs

Last synced: 26 Mar 2024

https://github.com/datavane/tis

Supports agile DataOps based on Flink, DataX, Flink-CDC and Chunjun, with a web UI

cdc chunjun dataops datax etl flink flink-streaming java

Last synced: 26 Mar 2024

https://github.com/astronomer/astro-sdk

Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

airflow apache-airflow bigquery dags data-analysis data-science elt etl gcs pandas postgres python s3 snowflake sql sqlite workflows

Last synced: 24 Mar 2024

https://github.com/basin-etl/basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

emr etl hadoop informatica odi pipeline pyspark spark

Last synced: 23 Mar 2024

https://github.com/react-csv/react-csv

React components to build CSV files on the fly from arrays or object literals of data

csv-document etl excel reactjs reporting

Last synced: 22 Mar 2024

https://github.com/SouthGreenPlatform/AgroLD_ETL

AgroLD is an RDF knowledge base that consists of data integrated from a variety of plant resources and ontologies. AgroLD ETL is the set of Python packages developed to transform plant datasets into RDF. Packages are developed for data standards such as GFF, GAF and VCF, and for specific plant databases.

etl gaf gff ontologies rdf rdf-data vcf

Last synced: 21 Mar 2024

https://github.com/nl2go/hetzner-invoice

Automatically download and transform Hetzner invoices.

etl etl-automation hetzner hetzner-cloud hetzner-invoice

Last synced: 21 Mar 2024

https://github.com/awslabs/aws-serverless-data-lake-framework

Enterprise-grade, production-hardened, serverless data lake on AWS

analytics aws best-practices data-engineering data-lake etl framework iac lake-formation serverless

Last synced: 19 Mar 2024

https://github.com/hooopo/kiba-plus

Kiba enhancement for Ruby ETL.

bulk etl kiba mysql postgresql

Last synced: 19 Mar 2024

https://github.com/nerevu/riko

A Python stream processing engine modeled after Yahoo! Pipes

asynchronous cli data etl featured functional-programming library parallelism rss stream-processing

Last synced: 19 Mar 2024

https://github.com/ascrus/getl

A tool for developing and testing ETL and ELT processes for automating the capture, delivery and processing of information in data warehouses on the MicroFocus Vertica platform.

csv dsl elt etl excel hdfs hive impala json kafka sql unit-testing vertica xml

Last synced: 19 Mar 2024

https://github.com/koaning/kadro

A friendly pandas wrapper with support for a more composable grammar.

etl pandas-dataframe pydata python

Last synced: 18 Mar 2024

https://github.com/kokes/od

Czech open data

civic-tech etl opendata postgresql

Last synced: 16 Mar 2024

https://github.com/getdozer/dozer

Dozer is a real-time data platform for building, deploying and maintaining data products.

api apis data ethereum etl grpc low-code postgres realtime rest rest-api rust snowflake sql streaming

Last synced: 15 Mar 2024