Projects in Awesome Lists by longNguyen010203
A curated list of projects in awesome lists by longNguyen010203.
https://github.com/longnguyen010203/youtube-recommend-master-etl-pipeline
A data engineering project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars, and Docker. Data comes from Kaggle and the YouTube API.
cleaning-data dagster data-engineering data-engineering-pipeline dbt docker docker-compose dockerfile etl-pipeline metabase minio mysql polars postgresql processing pyspark spark streamlit youtube youtube-api
Last synced: 22 Nov 2024
https://github.com/longnguyen010203/fde-course-2024-w4-dbt
Fundamental Data Engineering Course 2024, Week 4: learn dbt, transform data with models and macros, and build an ELT pipeline with Dagster.
dagster dbt docker docker-compose dockerfile elt-pipeline mysql pandas postgresql
Last synced: 19 Apr 2025
https://github.com/longnguyen010203/ecommerce-elt-pipeline
A data engineering project that implements an ELT data pipeline using Dagster, Docker, dbt, Polars, Snowflake, and PostgreSQL. Data comes from the Kaggle website.
dagster data data-engineering dbt docker docker-compose dockerfile elt elt-pipeline extract kaggle load polars postgresql raw-data relational-databases snowflake transform
Last synced: 23 Jan 2025
https://github.com/longnguyen010203/inspireai-web-2024
This project creates an AI chatbot with OpenAI's ChatGPT, DALL-E, and Codex, using Django to build the web application.
ai chatbot chatgpt35-turbo codex css dall-e django docker html javascript openai openai-api postgresql python
Last synced: 19 Apr 2025
https://github.com/longnguyen010203/zillow-home-value-prediction
The Zillow Home Value Prediction project employs linear regression models on Kaggle datasets to forecast house prices. Using Apache Spark (PySpark) within a Docker setup enables efficient data preprocessing, exploration, analysis, visualization, and model building with distributed computing for parallel computation.
analysis apache-spark distributed-computing docker docker-compose feature-engineering jupyter-notebook jupyterlab linear-regression machine-learning models parallel-computing prediction-model preprocessing pyspark visualization
Last synced: 27 Mar 2025
https://github.com/longnguyen010203/100day-self-learning-de
Self-study process over more than 3 months, at 3-4 hours per day, to prepare for applying for an intern or fresher Data Engineer position in 2024.
data-engineer data-engineering self-learning
Last synced: 22 Nov 2024
https://github.com/longnguyen010203/spark-kafka-self-learning
A third-year student is self-studying Spark and Kafka as part of their data engineering journey, with the goal of securing an internship or fresher job in 2024.
apache-kafka apache-spark cluster docker docker-compose zookeeper
Last synced: 14 Feb 2025
https://github.com/longnguyen010203/bank-datawarehouse
This project develops a data warehouse for a bank using Amazon Redshift, VPC, Glue, S3, and dbt, following a star schema architecture. The goal is to store, manage, and optimize data to support decision making and reporting.
amazon amazon-web-services aws banking bronze dbeaver dbt dimensions facts glue gold redshift s3 security-group silver sql starschema vpc vpc-subnet warehouse
Last synced: 16 Mar 2025
https://github.com/longnguyen010203/longnguyen--aws-trainning--2024
Welcome to my AWS Cloud Training repository! This repo contains notes, exercises, and projects from my AWS Cloud training journey, showcasing my progress and understanding of AWS services.
aws bootcamp cloud markdown terraform workshop
Last synced: 22 Nov 2024
https://github.com/longnguyen010203/finance-data-ingestion-pipeline-with-kafka
Last synced: 22 Nov 2024
https://github.com/longnguyen010203/spark-processing-aws
Set up and build a big data processing pipeline with Apache Spark and AWS services (S3, EMR, EC2, IAM, VPC, Redshift), using Terraform to set up the infrastructure and Airflow integration to automate workflows.
apache-airflow apache-spark aws aws-ec2 aws-s3 aws-services cloud-computing data-pipeline emr-cluster iam pyspark redshift spark-cluster spark-master spark-worker terraform
Last synced: 11 Mar 2025
https://github.com/longnguyen010203/data-warehouse-accident-us-2016-2023
Design and implement a data warehouse to manage automobile accident cases across 49 US states, using a star schema and Snowflake for the data warehouse architecture.
apache-airflow apache-spark data-ingestion data-processing data-quality-checks data-transformation data-warehouse dbt decorators-python dimensions docker docker-compose dockerfile fastapi minio powerbi pyspark snowflake star-schema
Last synced: 04 Apr 2025
https://github.com/longnguyen010203/longnguyen010203.github.io
AWS Data Engineer - Workshop
Last synced: 13 Apr 2025
https://github.com/longnguyen010203/aws-fcj-bootcamp-2024
Welcome to my AWS Cloud Training repository! This repo contains notes, exercises, and projects from my AWS Cloud training journey, showcasing my progress and understanding of AWS services.
aws bootcamp cloud markdown terraform workshop
Last synced: 16 Mar 2025