Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with data-lake

A curated list of projects in awesome lists tagged with data-lake.

https://github.com/apache/kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

data-lake hacktoberfest hadoop hive jdbc kubernetes spark spark-sql sql thrift

Last synced: 28 Sep 2024

https://github.com/dlt-hub/dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

data data-engineering data-lake data-loading data-warehouse elt extract load python transform

Last synced: 31 Jul 2024

https://github.com/bytedance/bitsail

BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.

big-data data-integration data-lake data-pipeline data-synchronization flink high-performance real-time

Last synced: 30 Sep 2024

https://github.com/teradata/kylo

Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

data-lake hadoop kylo nifi spark teradata

Last synced: 28 Sep 2024

https://github.com/uber/marmaray

Generic Data Ingestion & Dispersal Library for Hadoop

avro-schema data-lake hadoop ingest-data schema-format spark

Last synced: 31 Jul 2024

https://github.com/awslabs/aws-serverless-data-lake-framework

Enterprise-grade, production-hardened, serverless data lake on AWS

analytics aws best-practices data-engineering data-lake etl framework iac lake-formation serverless

Last synced: 02 Aug 2024

https://github.com/awslabs/amazon-s3-find-and-forget

Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)

amazon-s3 aws big-data ccpa data data-erasure data-lake gdpr parquet privacy right-to-be-forgotten s3

Last synced: 01 Aug 2024

https://github.com/maxi-k/btrblocks

BtrBlocks: Efficient Columnar Compression for Data Lakes (SIGMOD 2023 Paper)

compression data-lake databases research

Last synced: 01 Oct 2024

https://github.com/garystafford/tickit-data-lake-demo

Resources for video demonstrations and blog posts related to DataOps on AWS

airflow aws data-lake dataops devops redshift

Last synced: 01 Aug 2024

https://github.com/azure/azuredatalake

Samples and Docs for Azure Data Lake Store and Analytics

azure big-data data-lake

Last synced: 30 Sep 2024

https://github.com/smart-data-lake/smart-data-lake

Smart Automation Tool for building modern Data Lakes and Data Pipelines

data-lake data-pipelines deltalake hadoop hive scala smart-data-lake spark transform-data

Last synced: 28 Sep 2024

https://github.com/camunda-community-hub/zeeqs

GraphQL API for Zeebe data

data-lake graphql zeebe zeebe-tool

Last synced: 30 Jul 2024

https://github.com/OElesin/querypal

Web UI for Amazon Athena

analytics aws aws-athena data data-lake sql

Last synced: 13 Aug 2024

https://github.com/AuFeld/Data_Engineering_Projects

A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousing, containerization, and a dashboard to monitor data pipeline KPIs

airflow aws cassandra data-engineering data-lake data-warehouse docker emr etl-pipeline infrastructure-as-code infrastructure-setup postgresql python redshift s3 spark

Last synced: 13 Aug 2024

https://github.com/apache/kyuubi-docker

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

data-lake hadoop hive jdbc kubernetes spark spark-sql sql thrift

Last synced: 30 Sep 2024

https://github.com/DataDrivenGit/Music-Streaming-App-using-AWS-ETL

Implemented a data warehouse and data lake on AWS, with data modeling in Postgres and Apache Cassandra. Also used Apache Airflow to create a data pipeline.

airflow-operators cassandra data-lake data-pipelines datawarehouse postgres python3 sql

Last synced: 08 Aug 2024