Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/Azure/data-product-batch

Template to deploy a Data Product for batch data processing into a Data Landing Zone of the Data Management & Analytics Scenario (formerly Enterprise-Scale Analytics). Cross-functional teams can use the Data Product template to ingest, provide, and create new data assets within the platform.

architecture arm azure bicep data-fabric data-integration data-mesh data-platform data-product enterprise-scale enterprise-scale-analytics policy-driven

Last synced: 21 Jun 2024

https://github.com/dataplane-app/dataplane

Dataplane is an Airflow-inspired unified data platform with additional data mesh and RPA capabilities to automate, schedule, and design data pipelines and workflows. Dataplane is written in Golang with a React front end.

airflow data data-analysis data-engineering data-integration data-pipelines data-science dataplane datawarehouse etl finance golang kubernetes pipelines robotics-process-automation rpa scheduler workflow workflow-automation workflows

Last synced: 17 Jun 2024

https://github.com/bruin-data/ingestr

ingestr is a CLI tool for seamlessly copying data between any databases with a single command.

bigquery copy-database data-ingestion data-integration data-pipeline duckdb ingestion-pipeline mssql postgresql snowflake

Last synced: 15 Jun 2024

https://github.com/shu-hai/D-CCA

A Decomposition-based Canonical Correlation Analysis for High-dimensional Datasets (JASA-20 paper)

data-fusion data-integration high-dimensional-data integrative-analysis multiblock-structures multiview

Last synced: 09 Jun 2024

https://github.com/asmagen/robustSingleCell

Robust single cell clustering and comparison of population compositions across tissues and experimental models via similarity analysis.

clustering data-integration scrnaseq single-cell-genomics single-cell-rna-seq

Last synced: 09 Jun 2024

https://github.com/bytedance/bitsail

BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.

big-data data-integration data-lake data-pipeline data-synchronization flink high-performance real-time

Last synced: 07 Jun 2024

https://github.com/artie-labs/transfer

Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift) in real time.

apache-kafka bigquery cdc change-data-capture data-integration data-pipelines database debezium elt golang kafka redshift snowflake

Last synced: 07 Jun 2024

https://github.com/kuwala-io/kuwala

Kuwala is the no-code data platform for BI analysts and engineers that enables you to build powerful analytics workflows. We set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, and Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. The following data connectors are currently available worldwide: a) high-resolution demographics data; b) points of interest from OpenStreetMap; c) Google Popular Times.

admin-boundaries data data-integration data-science dbt elt google-trends jupyter kuwala no-code open-data open-source population postgres pyspark python react react-flow scraping spatial-analysis

Last synced: 02 Jun 2024

https://github.com/apache/incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.

dashboard-friendly data data-analysis data-engineering data-integration data-transfers devops domain-layer dora etl golang hacktoberfest integration jira open-source user-friendly

Last synced: 31 May 2024

https://github.com/apache/seatunnel

SeaTunnel is a next-generation, high-performance, distributed tool for massive-scale data integration.

apache batch cdc change-data-capture data-ingestion data-integration elt high-performance offline real-time streaming

Last synced: 27 May 2024

https://github.com/immunogenomics/harmony

Fast, sensitive and accurate integration of single-cell data with Harmony

algorithm data-integration r scrna-seq

Last synced: 27 May 2024

https://github.com/mara/mara-pipelines

A lightweight, opinionated ETL framework, halfway between plain scripts and Apache Airflow

data data-integration etl pipeline postgresql python

Last synced: 18 May 2024

https://github.com/dagster-io/dagster

An orchestration platform for the development, production, and observation of data assets.

analytics dagster data-engineering data-integration data-orchestrator data-pipelines data-science etl metadata mlops orchestration python scheduler workflow workflow-automation

Last synced: 16 May 2024

https://github.com/kestra-io/kestra

Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.

data data-engineering data-integration data-orchestration data-orchestrator data-pipeline data-quality elt etl low-code orchestration pipeline reverse-etl scheduler workflow workflow-engine

Last synced: 14 May 2024

https://github.com/SDM-TIB/SDM-RDFizer

An Efficient RML-Compliant Engine for Knowledge Graph Construction

data-integration knowledge-graph rml

Last synced: 12 May 2024

https://github.com/morph-kgc/morph-kgc

Powerful RDF Knowledge Graph Generation with RML Mappings

data-engineering data-integration database etl knowledge-graph python r2rml rdf rdf-star rml

Last synced: 12 May 2024

https://github.com/SDM-TIB/FunMap

Functional Mappings for Scaled-Up Knowledge Graph Creation

data-integration functions knowledge-graph rml

Last synced: 12 May 2024

https://github.com/oeg-upm/gtfs-bench

GTFS-Madrid-Bench: A Benchmark for Knowledge Graph Construction Engines

data-integration knowledge-graph obda obdi r2rml rml transport-domain

Last synced: 12 May 2024

https://github.com/airbytehq/airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

bigquery change-data-capture data data-analysis data-collection data-engineering data-integration data-pipeline elt etl java mssql mysql pipeline postgresql python redshift s3 self-hosted snowflake

Last synced: 09 May 2024

https://github.com/mara/mara-example-project-2

An example mini data warehouse for Python project statistics, also usable as a template for new projects

bigquery data-integration etl pypi sql

Last synced: 04 May 2024

https://github.com/jitsucom/jitsu

Jitsu is an open-source Segment alternative: a fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.

bigquery clickhouse data-collection data-connectors data-integration golang postgres redshift snowflake

Last synced: 18 Apr 2024

https://github.com/apache/seatunnel-web

SeaTunnel is a distributed, high-performance data integration platform for synchronizing and transforming massive volumes of data (offline and real-time).

apache data-integration data-pipeline etl-framework high-performance offline real-time seatunnel sql-engine

Last synced: 16 Apr 2024

https://github.com/opensanctions/nomenklatura

Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources

data-integration deduplication record-link

Last synced: 15 Apr 2024

https://github.com/hetio/hetionet

Hetionet: an integrative network of disease

data-integration drug-repurposing hetionet hetnet neo4j network rephetio

Last synced: 15 Apr 2024

https://github.com/umer7/Data-Warehouse-Concepts-Design-and-Data-Integration

Repository for the Coursera course Data Warehouse Concepts, Design, and Data Integration by the University of Colorado System (notes, assignments, quizzes, and research papers)

data-integration data-warehouse datawarehouse oracle pentaho

Last synced: 10 Apr 2024

https://github.com/zazuko/barnard59

An intuitive and flexible RDF pipeline solution designed to simplify and automate ETL processes for efficient data management.

data-integration data-pipeline data-processing etl json-ld linked-data pipeline rdf semantic-web

Last synced: 01 Apr 2024

https://github.com/recap-build/recap

Work with your web service, database, and streaming schemas in a single format.

data-catalog data-discovery data-engineering data-integration data-pipelines etl metadata recap

Last synced: 01 Apr 2024

https://github.com/apache/hudi

Upserts, Deletes And Incremental Processing on Big Data.

apacheflink apachehudi apachespark bigdata data-integration datalake hudi incremental-processing stream-processing

Last synced: 31 Mar 2024

https://github.com/CommonCoreOntology/CommonCoreOntologies

The Common Core Ontology Repository holds the current released version of the Common Core Ontology suite.

applied-ontology bfo cco data-integration interoperability ontologies ontology-suite owl-ontology semantic-consistency semantics

Last synced: 23 Mar 2024

https://github.com/DTStack/chunjun

A data integration framework

bigdata data-integration flink framework java

Last synced: 13 Mar 2024