Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/theislab/scarches
Reference mapping for single-cell genomics
batch-correction data-integration deep-learning human-cell-atlas multimodal-deep-learning multiomics rna-seq-analysis scrna-seq single-cell single-cell-genomics
Last synced: 28 Jun 2024
https://github.com/apache/flink-cdc
Flink CDC is a streaming data integration tool
batch cdc change-data-capture data-integration data-pipeline distributed elt etl flink kafka mysql paimon postgresql real-time schema-evolution
Last synced: 26 Jun 2024
https://github.com/Azure/data-product-batch
Template to deploy a Data Product for Batch data processing into a Data Landing Zone of the Data Management & Analytics Scenario (former Enterprise-Scale Analytics). The Data Product template can be used by cross-functional teams to ingest, provide and create new data assets within the platform.
architecture arm azure bicep data-fabric data-integration data-mesh data-platform data-product enterprise-scale enterprise-scale-analytics policy-driven
Last synced: 21 Jun 2024
https://github.com/dataplane-app/dataplane
Dataplane is an Airflow-inspired unified data platform with additional data mesh and RPA capabilities to automate, schedule, and design data pipelines and workflows. Dataplane is written in Golang with a React front end.
airflow data data-analysis data-engineering data-integration data-pipelines data-science dataplane datawarehouse etl finance golang kubernetes pipelines robotics-process-automation rpa scheduler workflow workflow-automation workflows
Last synced: 17 Jun 2024
https://github.com/genular/pandora
PANDORA - Predictive Analytics aNd Data Oriented Research Applications
bioinformatics biomarkers clinical-data clustering data-integration data-mining data-science data-visualization drug-discovery genomic-data-analysis machine-learning microbiome pandora predictive-analytics systems-biology transcriptomics tsne umap unsupervised-machine-learning
Last synced: 16 Jun 2024
https://github.com/saeyslab/nichenetr
NicheNet: predict active ligand-target links between interacting cells
cell-cell-communication data-integration gene-expression intercellular-communication ligand-receptor ligand-target network-inference rna-seq single-cell-omics single-cell-rna-seq
Last synced: 16 Jun 2024
https://github.com/bruin-data/ingestr
ingestr is a CLI tool to seamlessly copy data between any databases with a single command.
bigquery copy-database data-ingestion data-integration data-pipeline duckdb ingestion-pipeline mssql postgresql snowflake
Last synced: 15 Jun 2024
https://github.com/seandavi/awesome-single-cell
Community-curated list of software packages and data resources for single-cell, including RNA-seq, ATAC-seq, etc.
analysis analysis-pipeline atac-seq awesome-list bioinformatics cell-clusters cell-cycle cell-differentiation cell-populations clustering data-integration data-visualization dimensionality-reduction gene-expression gene-expression-profiles python rna-seq-data rna-seq-experiments scrna-seq-data single-cell
Last synced: 15 Jun 2024
https://github.com/apache/hop
Hop Orchestration Platform
apache data-integration etl hop java orchestration pipeline streaming workflow
Last synced: 12 Jun 2024
https://github.com/linkml/linkml-model
Link Modeling Language (LinkML) model
data-integration data-modeling graph-ql json json-schema linked-data linkml metadata metamodel schema-language semantic-web shacl shex uml yaml
Last synced: 11 Jun 2024
https://github.com/shu-hai/D-CCA
A Decomposition-based Canonical Correlation Analysis for High-dimensional Datasets (JASA-20 paper)
data-fusion data-integration high-dimensional-data integrative-analysis multiblock-structures multiview
Last synced: 09 Jun 2024
https://github.com/selbouhaddani/OmicsPLS
R package for high-dimensional data analysis and integration with O2PLS!
bioinformatics biostatistics data-integration latent-variable-models multi-omics omics partial-least-squares-regression pca pls principal-component-analysis
Last synced: 09 Jun 2024
https://github.com/asmagen/robustSingleCell
Robust single cell clustering and comparison of population compositions across tissues and experimental models via similarity analysis.
clustering data-integration scrnaseq single-cell-genomics single-cell-rna-seq
Last synced: 09 Jun 2024
https://github.com/Avaiga/taipy
Turns Data and AI algorithms into production-ready web applications in no time.
automation data-engineering data-integration data-ops data-visualization datascience developer-tools hacktoberfest hacktoberfest2023 job-scheduler mlops orchestration pipeline pipelines python scenario scenario-analysis taipy-core taipy-gui workflow
Last synced: 08 Jun 2024
https://github.com/leesf/hudi-resources
A collection of Apache Hudi resources
apache apachehudi bigdata data-integration datalake hudi hudi-resources incremental-processing stream-processing
Last synced: 07 Jun 2024
https://github.com/bytedance/bitsail
BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.
big-data data-integration data-lake data-pipeline data-synchronization flink high-performance real-time
Last synced: 07 Jun 2024
https://github.com/artie-labs/transfer
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift) in real-time.
apache-kafka bigquery cdc change-data-capture data-integration data-pipelines database debezium elt golang kafka redshift snowflake
Last synced: 07 Jun 2024
https://github.com/kuwala-io/kuwala
Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) points of interest from OpenStreetMap, c) Google Popular Times.
admin-boundaries data data-integration data-science dbt elt google-trends jupyter kuwala no-code open-data open-source population postgres pyspark python react react-flow scraping spatial-analysis
Last synced: 02 Jun 2024
https://github.com/apache/incubator-devlake
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
dashboard-friendly data data-analysis data-engineering data-integration data-transfers devops domain-layer dora etl golang hacktoberfest integration jira open-source user-friendly
Last synced: 31 May 2024
https://github.com/apache/seatunnel
SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
apache batch cdc change-data-capture data-ingestion data-integration elt high-performance offline real-time streaming
Last synced: 27 May 2024
https://github.com/immunogenomics/harmony
Fast, sensitive and accurate integration of single-cell data with Harmony
algorithm data-integration r scrna-seq
Last synced: 27 May 2024
https://github.com/mara/mara-pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
data data-integration etl pipeline postgresql python
Last synced: 18 May 2024
https://github.com/dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.
analytics dagster data-engineering data-integration data-orchestrator data-pipelines data-science etl metadata mlops orchestration python scheduler workflow workflow-automation
Last synced: 16 May 2024
https://github.com/rudderlabs/rudder-server
Privacy- and security-focused Segment alternative, in Golang and React
bigquery customer-data customer-data-lake customer-data-pipeline customer-data-platform data-integration data-pipeline data-synchronization data-warehouse etl golang hybrid-cloud privacy redshift rudderstack security segment-alternative snowflake warehouse-first warehouse-management
Last synced: 16 May 2024
https://github.com/kestra-io/kestra
Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
data data-engineering data-integration data-orchestration data-orchestrator data-pipeline data-quality elt etl low-code orchestration pipeline reverse-etl scheduler workflow workflow-engine
Last synced: 14 May 2024
https://github.com/SDM-TIB/SDM-RDFizer
An Efficient RML-Compliant Engine for Knowledge Graph Construction
data-integration knowledge-graph rml
Last synced: 12 May 2024
https://github.com/morph-kgc/morph-kgc
Powerful RDF Knowledge Graph Generation with RML Mappings
data-engineering data-integration database etl knowledge-graph python r2rml rdf rdf-star rml
Last synced: 12 May 2024
https://github.com/SDM-TIB/FunMap
Functional Mappings for Scaled-Up Knowledge Graph Creation
data-integration functions knowledge-graph rml
Last synced: 12 May 2024
https://github.com/oeg-upm/gtfs-bench
GTFS-Madrid-Bench: A Benchmark for Knowledge Graph Construction Engines
data-integration knowledge-graph obda obdi r2rml rml transport-domain
Last synced: 12 May 2024
https://github.com/airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
bigquery change-data-capture data data-analysis data-collection data-engineering data-integration data-pipeline elt etl java mssql mysql pipeline postgresql python redshift s3 self-hosted snowflake
Last synced: 09 May 2024
https://github.com/cloudquery/cloudquery
The open source high performance ELT framework powered by Apache Arrow
airbyte attack-surface-management aws azure bigquery cspm data data-analysis data-collection data-engineering data-integration elt etl etl-framework gcp github-api go google kubernetes sql
Last synced: 08 May 2024
https://github.com/mara/mara-example-project-2
An example mini data warehouse for Python project stats, template for new projects
bigquery data-integration etl pypi sql
Last synced: 04 May 2024
https://github.com/infinyon/fluvio
Lean and mean distributed stream processing system written in Rust and WebAssembly.
cloud-native data-flow data-integration data-pipelines distributed-systems event-driven-architecture real-time rust serverless stateful stream-processing stream-processing-engine streaming streaming-data streaming-data-pipelines streaming-data-processing webassembly
Last synced: 29 Apr 2024
https://github.com/apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
airflow apache apache-airflow automation dag data-engineering data-integration data-orchestrator data-pipelines data-science elt etl machine-learning mlops orchestration python scheduler workflow workflow-engine workflow-orchestration
Last synced: 26 Apr 2024
https://github.com/ceumicrodata/mETL
mito ETL tool
data-integration etl etl-framework pipeline python
Last synced: 22 Apr 2024
https://github.com/mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
artificial-intelligence data data-engineering data-integration data-pipelines data-science dbt elt etl machine-learning orchestration pipeline pipelines python reverse-etl spark sql transformation
Last synced: 20 Apr 2024
https://github.com/jitsucom/jitsu
Jitsu is an open-source Segment alternative. Fully-scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.
bigquery clickhouse data-collection data-connectors data-integration golang postgres redshift snowflake
Last synced: 18 Apr 2024
https://github.com/apache/seatunnel-web
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
apache data-integration data-pipeline etl-framework high-performance offline real-time seatunnel sql-engine
Last synced: 16 Apr 2024
https://github.com/opensanctions/nomenklatura
Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources
data-integration deduplication record-link
Last synced: 15 Apr 2024
https://github.com/hetio/hetionet
Hetionet: an integrative network of disease
data-integration drug-repurposing hetionet hetnet neo4j network rephetio
Last synced: 15 Apr 2024
https://github.com/umer7/Data-Warehouse-Concepts-Design-and-Data-Integration
Repo for Data Warehouse Concepts, Design, and Data Integration by University of Colorado System (Coursera): notes, assignments, quizzes, and research papers.
data-integration data-warehouse datawarehouse oracle pentaho
Last synced: 10 Apr 2024
https://github.com/ojasphansekar/Data-Warehouse-and-Business-Intelligence
SSIS, Talend, Tableau, Power BI, PostgreSQL, MySQL, Oracle, SQL Server, Toad Data Modeler
azure-sql-database business-intelligence data-analysis data-conversion data-ingestion data-integration data-mapping data-visualization excel mysql-database oracle-18c postgresql-database power-bi sql-server-database sql-server-management-studio ssis-packages tableau-desktop talend-dataintegration toad-modeler
Last synced: 10 Apr 2024
https://github.com/zazuko/barnard59
An intuitive and flexible RDF pipeline solution designed to simplify and automate ETL processes for efficient data management.
data-integration data-pipeline data-processing etl json-ld linked-data pipeline rdf semantic-web
Last synced: 01 Apr 2024
https://github.com/recap-build/recap
Work with your web service, database, and streaming schemas in a single format.
data-catalog data-discovery data-engineering data-integration data-pipelines etl metadata recap
Last synced: 01 Apr 2024
https://github.com/apache/hudi
Upserts, Deletes And Incremental Processing on Big Data.
apacheflink apachehudi apachespark bigdata data-integration datalake hudi incremental-processing stream-processing
Last synced: 31 Mar 2024
https://github.com/CommonCoreOntology/CommonCoreOntologies
The Common Core Ontology Repository holds the current released version of the Common Core Ontology suite.
applied-ontology bfo cco data-integration interoperability ontologies ontology-suite owl-ontology semantic-consistency semantics
Last synced: 23 Mar 2024
https://github.com/DTStack/chunjun
A data integration framework
bigdata data-integration flink framework java
Last synced: 13 Mar 2024