Projects in Awesome Lists tagged with data-integration
A curated list of projects in awesome lists tagged with data-integration.
https://github.com/apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
airflow apache apache-airflow automation dag data-engineering data-integration data-orchestrator data-pipelines data-science elt etl machine-learning mlops orchestration python scheduler workflow workflow-engine workflow-orchestration
Last synced: 09 Sep 2025
https://github.com/airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
bigquery change-data-capture data data-analysis data-collection data-engineering data-integration data-pipeline elt etl java mssql mysql pipeline postgresql python redshift s3 self-hosted snowflake
Last synced: 09 Sep 2025
https://github.com/avaiga/taipy
Turns Data and AI algorithms into production-ready web applications in no time.
automation data-engineering data-integration data-ops data-visualization datascience developer-tools hacktoberfest hacktoberfest2023 job-scheduler mlops orchestration pipeline pipelines python scenario scenario-analysis taipy-core taipy-gui workflow
Last synced: 09 Sep 2025
https://github.com/dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.
analytics dagster data-engineering data-integration data-orchestrator data-pipelines data-science etl metadata mlops orchestration python scheduler workflow workflow-automation
Last synced: 29 Dec 2025
https://github.com/apache/seatunnel
SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
apache batch cdc change-data-capture data-ingestion data-integration elt high-performance offline real-time streaming
Last synced: 12 May 2025
https://github.com/mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
artificial-intelligence data data-engineering data-integration data-pipelines data-science dbt elt etl machine-learning orchestration pipeline pipelines python reverse-etl spark sql transformation
Last synced: 13 May 2025
https://github.com/cloudquery/cloudquery
The developer-first cloud governance platform
airbyte attack-surface-management aws azure bigquery cspm data data-analysis data-collection data-engineering data-integration elt etl etl-framework gcp github-api go google kubernetes sql
Last synced: 14 May 2025
https://github.com/apache/flink-cdc
Flink CDC is a streaming data integration tool
batch cdc change-data-capture data-integration data-pipeline distributed elt etl flink kafka mysql paimon postgresql real-time schema-evolution
Last synced: 12 May 2025
https://github.com/apache/hudi
Upserts, Deletes And Incremental Processing on Big Data.
apacheflink apachehudi apachespark bigdata data-integration datalake hudi incremental-processing stream-processing
Last synced: 12 May 2025
https://github.com/infinyon/fluvio
🦀 Event stream processing for developers to stream and process data in motion, powering responsive, data-intensive applications.
cloud-native data-analytics data-flow data-integration data-pipelines distributed-systems event-driven-architecture real-time rust serverless stateful stream-processing stream-processing-engine streaming streaming-analytics streaming-data streaming-data-pipelines streaming-data-processing webassembly
Last synced: 13 May 2025
https://github.com/jitsucom/jitsu
Jitsu is an open-source Segment alternative. Fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.
bigquery clickhouse data-collection data-connectors data-integration golang postgres redshift snowflake
Last synced: 11 May 2025
https://github.com/rudderlabs/rudder-server
Privacy- and security-focused Segment alternative, written in Golang and React
bigquery cdp customer-data customer-data-lake customer-data-pipeline customer-data-platform data-engineering data-integration data-pipeline data-synchronization data-warehouse elt etl event-streaming privacy redshift segment-alternative snowflake warehouse-management warehouse-native
Last synced: 13 May 2025
https://github.com/dtstack/chunjun
A data integration framework
bigdata data-integration flink framework java
Last synced: 13 May 2025
https://github.com/bruin-data/ingestr
ingestr is a CLI tool that seamlessly copies data between any two databases with a single command.
bigquery copy-database data-ingestion data-integration data-pipeline duckdb ingestion-pipeline mssql postgresql snowflake
Last synced: 13 May 2025
https://github.com/apache/incubator-devlake
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
dashboard-friendly data data-analysis data-engineering data-integration data-transfers devops domain-layer dora etl golang hacktoberfest integration jira open-source user-friendly
Last synced: 14 May 2025
https://github.com/mara/mara-pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
data data-integration etl pipeline postgresql python
Last synced: 14 May 2025
https://github.com/bytedance/bitsail
BitSail is a distributed, high-performance data integration engine that supports batch, streaming, and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data records every day.
big-data data-integration data-lake data-pipeline data-synchronization flink high-performance real-time
Last synced: 15 May 2025
https://github.com/apache/hop
Hop Orchestration Platform
apache data-integration etl hop java orchestration pipeline streaming workflow
Last synced: 14 May 2025
https://github.com/heathersherry/Knowledge-Graph-Tutorials-and-Papers
Insightful Tutorials and Papers about Knowledge Graphs
awesome awesome-kg awesome-list data-integration entity-linking information-extration kg kgqa knowledge-base knowledge-graph knowledge-graph-completion knowledge-graph-construction knowledge-graph-embedding knowledge-graph-for-recommendation knowledge-graph-question-answering knowledge-graph-reasoning knowledge-graph-representation knowledge-graphs mmkg
Last synced: 29 Nov 2025
https://github.com/kuwala-io/kuwala
Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we integrate third-party data into data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) points of interest from OpenStreetMap, and c) Google Popular Times.
admin-boundaries data data-integration data-science dbt elt google-trends jupyter kuwala no-code open-data open-source population postgres pyspark python react react-flow scraping spatial-analysis
Last synced: 30 Mar 2025
https://github.com/artie-labs/transfer
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.
apache-kafka bigquery cdc change-data-capture data-integration data-pipelines database debezium elt golang kafka redshift snowflake
Last synced: 28 Dec 2025
https://github.com/apache/seatunnel-web
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
apache data-integration data-pipeline etl-framework high-performance offline real-time seatunnel sql-engine
Last synced: 14 May 2025
https://github.com/immunogenomics/harmony
Fast, sensitive and accurate integration of single-cell data with Harmony
algorithm data-integration r scrna-seq
Last synced: 11 May 2025
https://github.com/leesf/hudi-resources
A collection of Apache Hudi resources
apache apachehudi bigdata data-integration datalake hudi hudi-resources incremental-processing stream-processing
Last synced: 27 Mar 2025
https://github.com/ConduitIO/conduit
Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
conduit data-engineering data-integration data-pipeline data-stream etl go kafka kafkaconnect
Last synced: 15 Jul 2025
https://github.com/saeyslab/nichenetr
NicheNet: predict active ligand-target links between interacting cells
cell-cell-communication data-integration gene-expression intercellular-communication ligand-receptor ligand-target network-inference rna-seq single-cell-omics single-cell-rna-seq
Last synced: 15 Dec 2025
https://github.com/theislab/scarches
Reference mapping for single-cell genomics
batch-correction data-integration deep-learning human-cell-atlas multimodal-deep-learning multiomics rna-seq-analysis scrna-seq single-cell single-cell-genomics
Last synced: 11 Oct 2025
https://github.com/graphform/swim-rust
Self-contained distributed software platform for building stateful, massively real-time streaming applications in Rust.
actor-model async data-integration decentralized-applications distributed-systems framework kafka real-time rust serverless stateful stream-processing streaming streaming-data-pipelines web
Last synced: 29 Jul 2025
https://github.com/gabledata/recap
Work with your web service, database, and streaming schemas in a single format.
data-catalog data-discovery data-engineering data-integration data-pipelines etl metadata recap
Last synced: 13 Dec 2025
https://github.com/cuebook/cuelake
Use SQL to build ELT pipelines on a data lakehouse.
apache-iceberg apache-spark data-engineering data-ingestion data-integration data-lake data-pipeline data-transfer datalake delta elt etl incremental-updates lakehouse pipelines spark-sql sql upsert zeppelin-notebook
Last synced: 07 Apr 2025
https://github.com/CommonCoreOntology/CommonCoreOntologies
The Common Core Ontology Repository holds the current released version of the Common Core Ontology suite.
applied-ontology bfo cco data-integration interoperability ontologies ontology-suite owl-ontology semantic-consistency semantics
Last synced: 16 Nov 2025
https://github.com/hetio/hetionet
Hetionet: an integrative network of disease
data-integration drug-repurposing hetionet hetnet neo4j network rephetio
Last synced: 07 Apr 2025
https://github.com/slowkow/harmonypy
🎼 Integrate multiple high-dimensional datasets with fuzzy k-means and locally linear adjustments.
bioinformatics data-integration data-science single-cell-analysis
Last synced: 07 Oct 2025
https://github.com/dataplane-app/dataplane
Dataplane is an Airflow inspired unified data platform with additional data mesh and RPA capability to automate, schedule and design data pipelines and workflows. Dataplane is written in Golang with a React front end.
airflow data data-analysis data-engineering data-integration data-pipelines data-science dataplane datawarehouse etl finance golang kubernetes pipelines robotics-process-automation rpa scheduler workflow workflow-automation workflows
Last synced: 27 Dec 2025
https://github.com/morph-kgc/morph-kgc
Powerful RDF Knowledge Graph Generation with RML Mappings
data-engineering data-integration database etl knowledge-graph python r2rml rdf rdf-star rml
Last synced: 11 May 2025
https://github.com/opensanctions/nomenklatura
Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources
data-integration deduplication record-link
Last synced: 30 Dec 2025
https://github.com/mara/mara-example-project-2
An example mini data warehouse for Python project stats, also usable as a template for new projects
bigquery data-integration etl pypi sql
Last synced: 25 Oct 2025
https://github.com/ceumicrodata/mETL
mito ETL tool
data-integration etl etl-framework pipeline python
Last synced: 07 May 2025
https://github.com/genular/pandora
PANDORA - Predictive Analytics aNd Data Oriented Research Applications :computer:
bioinformatics biomarkers clinical-data clustering data-integration data-mining data-science data-visualization drug-discovery genomic-data-analysis machine-learning microbiome pandora predictive-analytics systems-biology transcriptomics tsne umap unsupervised-machine-learning
Last synced: 03 Apr 2025
https://github.com/google/megalista
First-party data integration solution built for marketing teams to enable audience and conversion onboarding into Google Marketing products (Google Ads, Campaign Manager, Google Analytics).
audience-targeting audiences bigquery conversions customermatch data-integration dataflow google googleads googleanalytics python
Last synced: 24 Sep 2025
https://github.com/SDM-TIB/SDM-RDFizer
An Efficient RML-Compliant Engine for Knowledge Graph Construction
data-integration knowledge-graph rml
Last synced: 11 May 2025
https://github.com/starlake-ai/starlake
Declarative text based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines.
bigquery data-engineering data-integration data-pipeline etl hdfs redshift snowflake spark synapse
Last synced: 05 Apr 2025
https://github.com/sysbiochalmers/gecko
Toolbox for including enzyme constraints on a genome-scale model.
data-integration enzyme-constraints kinetics matlab proteomics systems-biology toolbox
Last synced: 24 Oct 2025
https://github.com/munchy-bytes/schemamapper
A .NET class library that allows you to import data from different sources into a unified destination
csharp csv data-import data-integration databases excel html json msaccess mysql oracle powerpoint schema-mapping schema-matching sql-server sqlce sqlite tabular-data vcard xml
Last synced: 17 Aug 2025
https://github.com/saezlab/cosmosr
COSMOS (Causal Oriented Search of Multi-Omic Space) is a method that integrates phosphoproteomics, transcriptomics, and metabolomics data sets.
data-integration metabolomic-data network-modelling phosphoproteomics proteomics transcriptomics
Last synced: 09 Apr 2025
https://github.com/buildersoftio/cortex
Cortex | Data Framework: a cutting-edge SDK that simplifies real-time data processing with intuitive operators, robust state management, and seamless telemetry for efficient, scalable pipelines.
ai csharp data-engineering data-integration data-pipeline dotnet event-driven framework machine-learning real-time streaming
Last synced: 30 Aug 2025
https://github.com/datasphere-oss/datasphere-integration
A data-centric integration platform
data-bus data-integration data-interchange data-sharing elt esb etl kettle realtime-messaging
Last synced: 14 Jul 2025
https://github.com/linkml/linkml-model
Link Modeling Language (LinkML) model
data-integration data-modeling graph-ql json json-schema linked-data linkml metadata metamodel schema-language semantic-web shacl shex uml yaml
Last synced: 05 Apr 2025
https://github.com/umer7/Data-Warehouse-Concepts-Design-and-Data-Integration
Repository for the Coursera course Data Warehouse Concepts, Design, and Data Integration by the University of Colorado System (notes, assignments, quizzes, and research papers)
data-integration data-warehouse datawarehouse oracle pentaho
Last synced: 20 Jul 2025
https://github.com/azure/data-product-batch
Template to deploy a Data Product for Batch data processing into a Data Landing Zone of the Data Management & Analytics Scenario (formerly Enterprise-Scale Analytics). The Data Product template can be used by cross-functional teams to ingest, provide, and create new data assets within the platform.
architecture arm azure bicep data-fabric data-integration data-mesh data-platform data-product enterprise-scale enterprise-scale-analytics policy-driven
Last synced: 23 Jul 2025
https://github.com/mara/mara-etl-tools
Utilities for creating ETL pipelines with mara
data-integration date-dimension etl sql sql-utils
Last synced: 28 Feb 2025
https://github.com/altschulerwu-lab/muse
MUSE is a deep learning approach characterizing tissue composition through combined analysis of morphologies and transcriptional states for spatially resolved transcriptomics data.
clustering data-integration deep-learning multi-modal-analysis single-cell-ananlysis spatial-transcriptomics tensorflow
Last synced: 14 Dec 2025
https://github.com/artie-labs/reader
Perform historical snapshots without database locks and read change data capture logs from databases. Artie Reader is compatible with Debezium and is written in Go.
apache-kafka cdc change-data-capture data-integration database debezium golang kafka
Last synced: 16 May 2025
https://github.com/selbouhaddani/OmicsPLS
R package for high-dimensional data analysis and integration with O2PLS
bioinformatics biostatistics data-integration latent-variable-models multi-omics omics partial-least-squares-regression pca pls principal-component-analysis
Last synced: 13 Apr 2025
https://github.com/linkedin/data-integration-library
The Data Integration Library project provides a library of generic components based on a multi-stage architecture for data ingress and egress.
data-egress data-ingest data-ingestion data-integration gobblin
Last synced: 17 Aug 2025
https://github.com/dhimmel/integrate
Scripts and resources to create Hetionet v1.0, a heterogeneous network for drug repurposing
data-integration drug-repurposing hetionet hetnet neo4j network rephetio
Last synced: 12 Apr 2025
https://github.com/derwenai/erkg
Demonstrate integration of Senzing and Neo4j to construct an Entity Resolved Knowledge Graph
compliance cypher data-integration entity-resolved-knowlege-graph entity-resoultion graph-analytics graph-data-science graph-database graph-visualization knowledge-graph neo4j open-data record-linking safegraph senzing-community
Last synced: 02 May 2025
https://github.com/cthoyt/doctoral-thesis
📖 Generation and Applications of Knowledge Graphs in Systems and Networks Biology
biocuration biological-expression-language computational-biology curation data-integration knowledge-graph-embeddings knowledge-graphs networks-biology simulation systems-biology
Last synced: 03 Jan 2026
https://github.com/jonnytran/openomics
A bioinformatics API to interface with public multi-omics bio databases for wicked fast data integration.
data-integration data-manipulation genomics multi-omics python
Last synced: 16 Mar 2025
https://github.com/raamana/pyradigm
Research data management in biomedical and machine learning applications
automation biomedical data-integration data-structures datascience datastructures machine-learning machine-learning-workflows medical-data neuroimaging neuroinformatics neuroscience pandas python userfriendly workflow
Last synced: 24 Oct 2025
https://github.com/zazuko/barnard59
An intuitive and flexible RDF pipeline solution designed to simplify and automate ETL processes for efficient data management.
data-integration data-pipeline data-processing etl json-ld linked-data pipeline rdf semantic-web
Last synced: 06 Apr 2025
https://github.com/ginkgobioworks/geckopy
Enzyme-constrained genome-scale models in python
data-integration enzyme-constraints kinetics omics proteomics systems-biology thermodynamics
Last synced: 25 Oct 2025
https://github.com/dosorio/rpanglaodb
An R package to download and merge labeled single-cell RNA-seq data from the PanglaoDB database into a Seurat object.
data-integration data-mining rna-seq single-cell single-cell-rna-seq
Last synced: 22 Oct 2025
https://github.com/cloudquery/plugin-sdk
CloudQuery Go SDK for source and destination plugins
cloudquery data-integration elt
Last synced: 05 Apr 2025
https://github.com/amine-smahi/r-learning-journey
Some of the projects I made when I started learning R for data science at university
afc cpa data-cleaning data-integration data-science datascience r r-language
Last synced: 18 Mar 2025
https://github.com/davidfoerster/schema-matching
Match schema attributes of relational databases by value similarity. Because this was a study assignment, it isn't well documented, but you can contact me with questions, and I may even add documentation if there is enough interest.
data-integration python schema-matching
Last synced: 24 Apr 2025
https://github.com/scify/jedai-ui
UI for JedAI Toolkit
data-integration entity-resolution javafx jedai toolkit user-interface
Last synced: 12 Apr 2025
https://github.com/oeg-upm/gtfs-bench
GTFS-Madrid-Bench: A Benchmark for Knowledge Graph Construction Engines
data-integration knowledge-graph obda obdi r2rml rml transport-domain
Last synced: 26 Dec 2025
https://github.com/nyxflower/gripnet
GripNet: Graph Information Propagation on Supergraph for Heterogeneous Graphs (PatternRecognit, 2023)
data-integration graph-neural-networks heterogeneous-graph interconnected-graph link-prediction node-classification pytorch
Last synced: 23 Mar 2025
https://github.com/alexkychen/assignpop
Population Assignment using Genetic, Non-genetic or Integrated Data in a Machine-learning Framework. Methods in Ecology and Evolution. 2018;9:439–446.
cross-validation data-integration gbs machine-learning population-assignment population-genomics r radseq
Last synced: 14 Apr 2025
https://github.com/cutterkom/remove-na-lgbtiq-queer-knowledge-graph
A knowledge graph on queer history
data-integration entity-linking knowledge-graph lgbt wikidata
Last synced: 27 Apr 2025
https://github.com/karrlab/datanator
Toolkit for discovering and aggregating data for whole-cell modeling
cells data-aggregation data-discovery data-integration mathematical-modeling systems-biology
Last synced: 02 Sep 2025
https://github.com/asmagen/robustsinglecell
Robust single cell clustering and comparison of population compositions across tissues and experimental models via similarity analysis.
clustering data-integration scrnaseq single-cell-genomics single-cell-rna-seq
Last synced: 23 Mar 2025
https://github.com/meltanolabs/singer-working-group
Working group for ongoing development and iteration of the Singer Spec, the de facto protocol for open-source data connectors. Please use "Issues" to create discussion items, or use "Discussions" for general questions.
data-integration dataops elt etl etl-pipeline singer
Last synced: 19 Feb 2025
https://github.com/lisad/phaser
The missing layer for complex data batch integration pipelines
data data-integration etl etl-pipeline
Last synced: 23 Apr 2025
https://github.com/cognitedata/python-extractor-utils
Framework for developing extractors in Python
cognite-data-fusion cognite-extractor data-integration python
Last synced: 05 Jul 2025
https://github.com/shu-hai/D-CCA
A Decomposition-based Canonical Correlation Analysis for High-dimensional Datasets (JASA-20 paper)
data-fusion data-integration high-dimensional-data integrative-analysis multiblock-structures multiview
Last synced: 13 Apr 2025
https://github.com/dachafra/thesis
PhD thesis: "Knowledge Graph Construction from Heterogeneous Data Sources exploiting Declarative Mapping Rules"
benchmarking data-integration knowledge-graph r2rml rml
Last synced: 04 Jan 2026
https://github.com/oeg-upm/morph-graphql
Translate OBDA mappings into GraphQL Servers
data-integration graphql semantic-web
Last synced: 02 Aug 2025
https://github.com/cloudformations/cf.cumulus
A cloud data platform product to accelerate time to insights. Our open-source framework is designed for the real world. Stripping away the complexity, giving you the power to build, scale, and manage your dataflows with ease, accelerating data delivery.
accelerator cfcumulus cloudformations control data-insights data-integration framework ingest metadata pipeline transform
Last synced: 05 Apr 2025
https://github.com/sysbiochalmers/orthomics
Collection of scripts for gene age sorting and multi-omics data mining and analysis
data-integration data-visualization de-analysis orthology proteomics rnaseq
Last synced: 29 Jul 2025
https://github.com/usc-isi-i2/linked-maps
Framework to build linked spatio-temporal data from vectorized evolutionary topographic map archives
data-integration digital-humanities historical-maps knowledge-graphs linked-data linked-spatio-temporal-data semantic-web spatio-temporal-data spatio-temporal-knowledge-graphs
Last synced: 26 Jun 2025
https://github.com/kinto-technologies/springboot3batchstarter
Spring Batch 5 skeleton for Spring Boot 3. Includes DB to CSV and CSV to DB samples for quick customization. This repository demonstrates multi-database setup, efficient batch processing, and GitHub Actions integration for CI/CD pipelines.
chunk ci-cd csv data-integration database-migration datasource docker github-actions h2 java job-configuration jooq multi-database mysql opencsv skeleton-code spring-batch-5 spring-boot-3 spring-framework tasklet
Last synced: 02 Nov 2025
https://github.com/dobraczka/forayer
forayer is a library of first-aid utilities for knowledge graph exploration with an entity-centric approach.
data-integration entity-resolution knowledge-graph
Last synced: 25 Jun 2025
https://github.com/sbl-sdsc/kg-import
kg-import automates the ingestion of heterogeneous datasets into a Knowledge Graph.
data-ingestion data-integration datasets-preparation knowledge-graph neo4j property-graph
Last synced: 12 Apr 2025
https://github.com/ronpinkas/dbbridge
dbBridge is an 'SQL Migration Tool' - enabling import of SQL Databases from any supported Dialect (MsSql, MySql, Oracle, PostgreSQL, Sqlite) to any of these supported dialects with just three lines of PHP code.
data-integration data-migration data-transfer data-transformation database-conversion db-migrate db-migration etl migration minimal mssql mysql open-source oracle php postgresql simple sql sqlite
Last synced: 30 Apr 2025
https://github.com/dobraczka/klinker
🧱 blocking methods for entity resolution
blocking data-integration deduplication entity-alignment entity-resolution link-discovery record-linkage
Last synced: 19 Apr 2025
https://github.com/ojasphansekar/Data-Warehouse-and-Business-Intelligence
SSIS, Talend, Tableau, Power BI, PostgreSQL, MySQL, Oracle, SQL Server, Toad Data Modeler
azure-sql-database business-intelligence data-analysis data-conversion data-ingestion data-integration data-mapping data-visualization excel mysql-database oracle-18c postgresql-database power-bi sql-server-database sql-server-management-studio ssis-packages tableau-desktop talend-dataintegration toad-modeler
Last synced: 20 Jul 2025
https://github.com/tteofili/certa
CERTA - Computing Entity Resolution explanations with TriAngles
data-integration entity-matching entity-resolution explainable-ai machine-learning python record-linkage xai
Last synced: 12 Apr 2025
https://github.com/vida-nyu/magneto-matcher
Repository for developing and evaluating components and algorithms for data integration tasks
bdf-toolbox data-integration schema-matching
Last synced: 14 Dec 2025
https://github.com/frankkramer-lab/multipath
Integrating pathways and related knowledge in a multilayer framework
biological-pathways biopax-encoded-pathways data-integration data-visualization drugbank-database graph-theory knowledge-representation multilayer-networks reproducibility uniprot
Last synced: 07 Sep 2025
https://github.com/drsnowbird/denodo-vnc-docker
Denodo Platform 7 (Express) in VNC / noVNC for container platforms (OpenShift, Kubernetes, DC/OS, Mesosphere, etc.)
data-integration data-virtualization denodo-express vnc-docker
Last synced: 10 Apr 2025
https://github.com/marcosmarxm/awesome-airbyte
Curated list of resources about Airbyte
airbyte airbytehq connectors data-integration
Last synced: 07 Dec 2025
https://github.com/firelink-sh/evolve-py
A highly efficient, composable, and lightweight ETL and data integration framework
analytics arrow big-data data data-engineering data-integration data-science duckdb elt etl ingestion ingress ml olap pipeline polars postgresql python s3
Last synced: 16 Sep 2025