Projects in Awesome Lists by Undisputed-jay

https://github.com/undisputed-jay/sql-island

SQL Island is a fun introduction to learning and using SQL.

sql

Last synced: 12 Aug 2025

https://github.com/undisputed-jay/undisputed-jay

Last synced: 24 Jan 2026

https://github.com/undisputed-jay/end-to-end-sentiment-analysis-with-webscraping

nlp-machine-learning python text-mining

Last synced: 19 May 2026

https://github.com/undisputed-jay/automatinghybridlogin

Last synced: 06 Apr 2025

https://github.com/undisputed-jay/creating-a-master-property-listing-for-london

api beautifulsoup4 python3 requests

Last synced: 06 Apr 2025

https://github.com/undisputed-jay/restapi-testing-with-python

Last synced: 06 Apr 2025

https://github.com/undisputed-jay/etl-on-gcp-with-apache-airflow

In this project, files are ingested into Google Cloud Storage and then moved to BigQuery, where queries are run and the results are written back to Google Cloud Storage.

apache-airflow bigquery data-engineering data-warehouse docker etl-pipeline google-cloud-platform

Last synced: 06 May 2026

https://github.com/undisputed-jay/sentiment-analysis-and-text-mining

nlp-machine-learning pandas-python python

Last synced: 18 May 2026

https://github.com/undisputed-jay/ecommerce-browser-automation-with-selenium-and-python

Last synced: 04 Aug 2025

https://github.com/undisputed-jay/wikipedia_stadium_data_pipeline_with_apache_airflow

An Apache Airflow pipeline that scrapes football stadium data from Wikipedia, processes it with pandas, stores it in PostgreSQL, and saves query results to CSV.

Last synced: 13 Oct 2025

https://github.com/undisputed-jay/behavior-driven-development-testing-for-ecommerce-login

This project automates login testing with Behave and Selenium WebDriver, using BDD to verify login scenarios like valid and invalid credentials. The page object model (POM) keeps the code organized and easy to scale.

bdd-login-testing behave-framework page-object-model python-test-automation reusable-test-code selenium-automation

Last synced: 06 Apr 2025
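
The page object model mentioned in the entry above can be illustrated with a small sketch. It is not taken from the repository: the locator values, the `LoginPage` class, and the `FakeDriver` stand-in are all hypothetical, and the fake driver only mimics the `find_element(by, value)` call shape of Selenium WebDriver so the example runs without a browser.

```python
class LoginPage:
    """Page object: locators and actions for a login page live in one place."""

    # Hypothetical locators as (strategy, value) pairs.
    USERNAME = ("id", "username")
    PASSWORD = ("id", "password")
    SUBMIT = ("id", "login-button")

    def __init__(self, driver):
        # Any object exposing find_element(by, value) works here.
        self.driver = driver

    def login(self, username, password):
        """Fill in both fields and submit the form."""
        self.driver.find_element(*self.USERNAME).send_keys(username)
        self.driver.find_element(*self.PASSWORD).send_keys(password)
        self.driver.find_element(*self.SUBMIT).click()


class FakeElement:
    """Stand-in element that records interactions instead of touching a browser."""

    def __init__(self, log, name):
        self.log = log
        self.name = name

    def send_keys(self, text):
        self.log.append((self.name, "type", text))

    def click(self):
        self.log.append((self.name, "click", None))


class FakeDriver:
    """Stand-in driver (assumption: used only to make the sketch runnable)."""

    def __init__(self):
        self.log = []

    def find_element(self, by, value):
        return FakeElement(self.log, value)


driver = FakeDriver()
LoginPage(driver).login("alice", "s3cret")
```

Because test steps call `LoginPage.login` rather than locating elements themselves, a changed locator would need updating in one place only.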

https://github.com/undisputed-jay/badges

Last synced: 08 Oct 2025

https://github.com/undisputed-jay/seleniumhybridpageobjectmodelpythonframework

Last synced: 11 Oct 2025

https://github.com/undisputed-jay/aws-s3-integration-with-snowflake

This project sets up an ETL pipeline to load Citibike trip data from an AWS S3 bucket into Snowflake. It establishes a secure integration with S3, defines a CSV file format, stages the data, and loads it into a Snowflake table for analysis.

aws-s3 snowflake sql

Last synced: 20 Mar 2026

https://github.com/undisputed-jay/webscraping-with-beautifulsoup

beautifulsoup4 jupyter-notebook python requests

Last synced: 17 Apr 2026

https://github.com/undisputed-jay/spotifyapi-data-engineering-project

This project uses an ETL (Extract, Transform, Load) pipeline to extract data from Spotify through its API and load it into a data source (AWS Athena). The entire pipeline is built using Amazon Web Services (AWS).

aws aws-athena aws-cloudformation aws-lambda aws-s3 awsglue python3 sql

Last synced: 05 May 2026

https://github.com/undisputed-jay/power_bi_reports

powerbi

Last synced: 16 Jun 2025

https://github.com/undisputed-jay/building-an-efficient-etl-pipeline-for-property-records-in-real-estate

An ETL pipeline that ingests, transforms, and loads real estate property data into a PostgreSQL database. The project includes data cleaning, schema creation, query execution for insights, and automation via Windows Task Scheduler.

automated-data-workflows etl-pipeline-design

Last synced: 06 Apr 2025
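
The ingest, transform, and load steps described in the entry above might be sketched as follows. This is an assumption-heavy illustration rather than the repository's code: the column names and sample rows are invented, and Python's built-in `sqlite3` stands in for PostgreSQL so the example runs without a database server.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; the real project's columns are not shown in the listing.
RAW_CSV = """address,price,bedrooms
 12 High St ,250000,3
7 Mill Lane,,2
12 High St,250000,3
"""


def extract(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Trim whitespace, drop rows without a price, and remove duplicate addresses."""
    seen = set()
    clean = []
    for row in rows:
        address = row["address"].strip()
        if not row["price"] or address in seen:
            continue
        seen.add(address)
        clean.append((address, int(row["price"]), int(row["bedrooms"])))
    return clean


def load(conn, records):
    """Create the target table if needed and insert the cleaned records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS properties "
        "(address TEXT PRIMARY KEY, price INTEGER, bedrooms INTEGER)"
    )
    conn.executemany("INSERT INTO properties VALUES (?, ?, ?)", records)
    conn.commit()


# sqlite3 in-memory database used as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
avg_price = conn.execute("SELECT AVG(price) FROM properties").fetchone()[0]
```

Scheduling is left out of the sketch; the entry says the repository automates runs with Windows Task Scheduler.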

https://github.com/undisputed-jay/covid-19_dataset

This repo contains data-cleaning work on a Covid-19 dataset.

Last synced: 06 Apr 2025

https://github.com/undisputed-jay/airflow-etl-pipeline-with-pyspark-and-google-cloud-dataproc

This project automates daily vehicle data processing on Google Cloud using Apache Airflow. It uploads scripts to Google Cloud Storage, runs specific PySpark jobs on Dataproc based on the day, and shuts down resources when done for efficiency.

automated-etl-airflow-dataproc cost-effective-data-processing daily-data-analysis-airflow-pyspark

Last synced: 20 Jul 2025
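
Choosing a PySpark job by day, as the entry above describes, could look roughly like this. It is a minimal sketch under stated assumptions: the script names, the weekday mapping, the fallback job, and the bucket name are hypothetical, and the function only builds a Cloud Storage URI rather than submitting anything to Dataproc.

```python
from datetime import date

# Hypothetical mapping from weekday (Monday=0) to a PySpark script name.
# The real repository may use different names and a different schedule.
JOBS_BY_WEEKDAY = {
    0: "avg_speed.py",
    1: "avg_temperature.py",
    2: "avg_tire_pressure.py",
}
DEFAULT_JOB = "weekend_summary.py"  # assumed fallback for unmapped days


def select_job(run_date, bucket="example-bucket"):
    """Return the gs:// URI of the script to run for the given date."""
    script = JOBS_BY_WEEKDAY.get(run_date.weekday(), DEFAULT_JOB)
    return f"gs://{bucket}/scripts/{script}"
```

In an Airflow DAG, a function like this would typically feed a branching or job-submission task, with a final task tearing the cluster down once the job finishes.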

https://github.com/undisputed-jay/aws-data-pipeline-csv-to-parquet-with-glue-and-athena

Last synced: 25 Aug 2025

https://github.com/undisputed-jay/snowflake-dbt

Last synced: 19 Jan 2026

https://github.com/undisputed-jay/sql-questions-and-answers-using-mssql

This repository contains a collection of SQL questions and answers to help with learning and practicing SQL concepts. The content is regularly updated with new queries, solutions, and explanations to provide a comprehensive resource for SQL enthusiasts and learners.

Last synced: 25 Feb 2026

https://github.com/undisputed-jay/weather-data-etl-pipeline-using-apache-airflow

Last synced: 06 Apr 2025

https://github.com/undisputed-jay/streaming-data-from-reddit-using-kafka-spark-and-mongodb

A data pipeline that streams Reddit comments from the 'Politics' subreddit using Kafka and Apache Spark. Processed data is stored in MongoDB for real-time analysis and management.

apache-spark big-data data-engineering etl-pipeline kafka mongodb mongodb-atlas pyspark real-time-streaming redditapi streaming-analytics

Last synced: 13 Apr 2026

https://github.com/undisputed-jay/dbt-bigquery

Last synced: 16 Feb 2026
