An open API service indexing awesome lists of open source software.

Projects in Awesome Lists by duyanh711

A curated list of projects in awesome lists by duyanh711.

https://github.com/duyanh711/tiki-recommender-system

This project implements a complete ETL (Extract, Transform, Load) pipeline that collects data from the Tiki.vn website, processes it through Bronze, Silver, and Gold data layers using MinIO, and finally loads it into PostgreSQL. The ultimate goal is to serve a product recommender system via an API and provide data visualizations using Apache Superset.

Last synced: 15 May 2025

https://github.com/duyanh711/robustdatapipelineformoderndataengineering

In this project, we set up an end-to-end data engineering pipeline using Apache Spark, Azure Databricks, and Data Build Tool (dbt), with Azure as our cloud provider.

apache-spark azure databricks dbt medallion-architecture

Last synced: 06 Mar 2025

https://github.com/duyanh711/duyanh711

Config files for my GitHub profile.

config github-config

Last synced: 20 Feb 2025

https://github.com/duyanh711/redditdatapipelineendtoend

This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline uses a combination of tools and services, including Apache Airflow, Celery, PostgreSQL, and Amazon S3.

airflow aws etl-pipeline

Last synced: 20 Feb 2025

https://github.com/duyanh711/socket-1

Last synced: 20 Feb 2025

https://github.com/duyanh711/navbar

Last synced: 20 Feb 2025

https://github.com/duyanh711/realtimedatastreaming

A real-time data streaming pipeline covering each phase, from data ingestion through processing to storage. It uses Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra, all containerized using Docker.

apache-airflow docker kafka spark

Last synced: 20 Feb 2025

https://github.com/duyanh711/todo-list

Last synced: 20 Feb 2025