An open API service indexing awesome lists of open source software.

Projects in Awesome Lists by longNguyen010203

A curated list of projects in awesome lists by longNguyen010203.

https://github.com/longnguyen010203/youtube-recommend-master-etl-pipeline

💜🌈📊 A data engineering project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars, and Docker. Data comes from Kaggle and the YouTube API 🌺

cleaning-data dagster data-engineering data-engineering-pipeline dbt docker docker-compose dockerfile etl-pipeline metabase minio mysql polars postgresql processing pyspark spark streamlit youtube youtube-api

Last synced: 22 Nov 2024

https://github.com/longnguyen010203/fde-course-2024-w4-dbt

💻💛 Fundamental Data Engineering Course 2024, Week 4: learn dbt to transform data with models and macros, and build an ELT pipeline with Dagster 🌎

dagster dbt docker docker-compose dockerfile elt-pipeline mysql pandas postgresql

Last synced: 19 Apr 2025

https://github.com/longnguyen010203/ecommerce-elt-pipeline

🌄📈📉 A data engineering project 🌈 that implements an ELT data pipeline using Dagster, Docker, dbt, Polars, Snowflake, and PostgreSQL. Data comes from the Kaggle website 🔥

dagster data data-engineering dbt docker docker-compose dockerfile elt elt-pipeline extract kaggle load polars postgresql raw-data relational-databases snowflake transform

Last synced: 23 Jan 2025

https://github.com/longnguyen010203/inspireai-web-2024

🤖💎📺 This project builds an AI chatbot on OpenAI models (ChatGPT, DALL-E, Codex), with Django used to develop the web application 🍁

ai chatbot chatgpt35-turbo codex css dall-e django docker html javascript openai openai-api postgresql python

Last synced: 19 Apr 2025

https://github.com/longnguyen010203/zillow-home-value-prediction

🌈📊📈 The Zillow Home Value Prediction project applies linear regression models to Kaggle datasets to forecast house prices. 📉💰 Running Apache Spark (PySpark) in a Docker setup enables efficient data preprocessing, exploration, analysis, visualization, and model building, using distributed computing for parallel processing.

analysis apache-spark distributed-computing docker docker-compose feature-engineering jupyter-notebook jupyterlab linear-regression machine-learning models parallel-computing prediction-model preprocessing pyspark visualization

Last synced: 27 Mar 2025

https://github.com/longnguyen010203/100day-self-learning-de

📚💻⌨ A self-study log covering more than 3 months at 3-4 hours a day, in preparation for applying to a Data Engineer intern or fresher position in 2024 🥇🏆

data-engineer data-engineering self-learning

Last synced: 22 Nov 2024

https://github.com/longnguyen010203/spark-kafka-self-learning

📚🌊🎓 A third-year student is self-studying Spark and Kafka as part of their 👷 data engineering journey, with the goal of securing an 📬 internship or fresher job in 2024.

apache-kafka apache-spark cluster docker docker-compose zookeeper

Last synced: 14 Feb 2025

https://github.com/longnguyen010203/bank-datawarehouse

πŸ“ŠπŸŒˆπŸ› This project develop a data warehouse for a bank using Amazon Redshift, VPC, Glue, S3 and DBT, following a ⭐ Star Schema architecture. The goal is to storage, manage, and optimize data to support decision making and reporting 🏡️

amazon amazon-web-services aws banking bronze dbeaver dbt dimensions facts glue gold redshift s3 security-group silver sql starschema vpc vpc-subnet warehouse

Last synced: 16 Mar 2025

https://github.com/longnguyen010203/longnguyen--aws-trainning--2024

☁️🌈🔥 Welcome to my AWS Cloud Training repository! This repo contains notes, exercises, and projects from my AWS Cloud training journey, showcasing my progress and understanding of AWS services. 💨

aws bootcamp cloud markdown terraform workshop

Last synced: 22 Nov 2024

https://github.com/longnguyen010203/spark-processing-aws

👷🌇 Set up and build a big data processing pipeline with Apache Spark and 📦 AWS services (S3, EMR, EC2, IAM, VPC, Redshift), using Terraform to provision the infrastructure and Apache Airflow to automate workflows 🥊

apache-airflow apache-spark aws aws-ec2 aws-s3 aws-services cloud-computing data-pipeline emr-cluster iam pyspark redshift spark-cluster spark-master spark-worker terraform

Last synced: 11 Mar 2025

https://github.com/longnguyen010203/data-warehouse-accident-us-2016-2023

Design and implement a data warehouse to manage automobile accident cases across all 49 states in the US, using a star schema and Snowflake for the data warehouse architecture.

apache-airflow apache-spark data-ingestion data-processing data-quality-checks data-transformation data-warehouse dbt decorators-python dimensions docker docker-compose dockerfile fastapi minio powerbi pyspark snowflake star-schema

Last synced: 04 Apr 2025

https://github.com/longnguyen010203/longnguyen010203.github.io

AWS Data Engineer - Workshop

Last synced: 13 Apr 2025

https://github.com/longnguyen010203/aws-fcj-bootcamp-2024

☁️🌈🔥 Welcome to my AWS Cloud Training repository! This repo contains notes, exercises, and projects from my AWS Cloud training journey, showcasing my progress and understanding of AWS services. 💨

aws bootcamp cloud markdown terraform workshop

Last synced: 16 Mar 2025