Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with lakehouse
A curated list of projects in awesome lists tagged with lakehouse.
https://github.com/StarRocks/starrocks
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. Winner of InfoWorld’s 2023 BOSSIE Award for best open source software.
analytics big-data cloudnative database datalake delta-lake distributed-database hudi iceberg join lakehouse lakehouse-platform mpp olap real-time-analytics real-time-updates realtime-database sql star-schema vectorized
Last synced: 30 Jul 2024
https://github.com/lakesoul-io/LakeSoul
LakeSoul is an end-to-end, real-time, cloud-native Lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.
arrow big-data datafusion datalake flink huggingface lakehouse lakesoul postgresql python pytorch rust spark sql streaming vectorized velox
Last synced: 31 Jul 2024
https://github.com/ytsaurus/ytsaurus
YTsaurus is a scalable and fault-tolerant open-source big data platform.
big-data clickhouse distributed-database lakehouse olap-database spark sql ytsaurus
Last synced: 28 Sep 2024
https://github.com/apache/amoro
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
Last synced: 01 Aug 2024
https://github.com/apache/gravitino
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
ai-catalog data-catalog datalake federated-query lakehouse metadata metalake model-catalog opendatacatalog skycomputing stratosphere
Last synced: 30 Sep 2024
https://github.com/cuebook/cuelake
Use SQL to build ELT pipelines on a data lakehouse.
apache-iceberg apache-spark data-engineering data-ingestion data-integration data-lake data-pipeline data-transfer datalake delta elt etl incremental-updates lakehouse pipelines spark-sql sql upsert zeppelin-notebook
Last synced: 28 Sep 2024
https://github.com/data-dot-all/dataall
A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
aws aws-glue aws-lake-formation aws-s3 data data-science etl-framework lakeformation lakehouse redshift
Last synced: 13 Aug 2024
https://github.com/databricks/terraform-databricks-examples
Examples of using Terraform to deploy Databricks resources
aws azure databricks databricks-module gcp lakehouse terraform terraform-module
Last synced: 26 Sep 2024
https://github.com/lhbench/lhbench
Lakehouse storage system benchmark
apache-hudi apache-iceberg benchmark cidr database databricks delta-lake lakehouse
Last synced: 01 Aug 2024
https://github.com/ekote/build-your-first-end-to-end-lakehouse-solution
Build Your First End-to-End Lakehouse Solution (aka.ms/fabconlake)
apache-spark data-engineering data-factory data-pipeline data-science dataflows delta-lake lakehouse machine-learning microsoft-azure microsoft-fabric parquet powerbi tutorial warehouse workshop
Last synced: 28 Sep 2024
https://github.com/adidas/lakehouse-engine-docs
The goal of this project is to provide documentation for the Lakehouse Engine framework.
big-data data-engineering data-quality databricks delta-lake framework great-expectations lakehouse lakehouse-engine spark
Last synced: 28 Sep 2024
https://github.com/epomatti/ms-fabric-private-link
Microsoft Fabric Lakehouse configured with Private Link
compose docker fabric java jdbc lakehouse microsoft-fabric private-endpoint private-link sqlserver terraform
Last synced: 27 Sep 2024