Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists by Undisputed-jay
A curated list of projects in awesome lists by Undisputed-jay.
https://github.com/undisputed-jay/sql-island
SQL Island is a fun introduction to learning and using SQL.
Last synced: 13 Feb 2025
https://github.com/undisputed-jay/webscraping-with-beautifulsoup
beautifulsoup4 jupyter-notebook python requests
Last synced: 13 Feb 2025
https://github.com/undisputed-jay/creating-a-master-property-listing-for-london
api beautifulsoup4 python3 requests
Last synced: 13 Feb 2025
https://github.com/undisputed-jay/spotifyapi-data-engineering-project
This project uses an ETL (Extract, Transform, Load) pipeline to extract data from Spotify through its API and load it into a data source (AWS Athena). The entire pipeline is built using Amazon Web Services (AWS).
aws aws-athena aws-cloudformation aws-lambda aws-s3 awsglue python3 sql
Last synced: 13 Feb 2025
https://github.com/undisputed-jay/aws-s3-integration-with-snowflake
This project sets up an ETL pipeline to load Citibike trip data from an AWS S3 bucket into Snowflake. It establishes a secure integration with S3, defines a CSV file format, stages the data, and loads it into a Snowflake table for analysis.
Last synced: 13 Feb 2025
https://github.com/undisputed-jay/etl-on-gcp-with-apache-airflow
In this project, files were ingested into Google Cloud Storage and then moved to BigQuery, where queries were run and the results moved back to Google Cloud Storage.
apache-airflow bigquery data-engineering data-warehouse docker etl-pipeline google-cloud-platform
Last synced: 13 Feb 2025
https://github.com/undisputed-jay/ecommerce-browser-automation-with-selenium-and-python
Last synced: 13 Feb 2025
https://github.com/undisputed-jay/wikipedia_stadium_data_pipeline_with_apache_airflow
An Apache Airflow pipeline that scrapes football stadium data from Wikipedia, processes it with pandas, stores it in PostgreSQL, and saves query results to CSV.
Last synced: 13 Feb 2025
https://github.com/undisputed-jay/behavior-driven-development-testing-for-ecommerce-login
This project automates login testing with Behave and Selenium WebDriver, using BDD to verify login scenarios like valid and invalid credentials. The page object model (POM) keeps the code organized and easy to scale.
bdd-login-testing behave-framework page-object-model python-test-automation reusable-test-code selenium-automation
Last synced: 13 Feb 2025
https://github.com/undisputed-jay/seleniumhybridpageobjectmodelpythonframework
Last synced: 13 Feb 2025
https://github.com/undisputed-jay/sentiment-analysis-and-text-mining
nlp-machine-learning pandas-python python
Last synced: 13 Feb 2025
https://github.com/undisputed-jay/covid-19_dataset
This repo contains data cleaning of a Covid-19 dataset.
Last synced: 13 Feb 2025
https://github.com/undisputed-jay/weather-data-etl-pipeline-using-apache-airflow
Last synced: 13 Feb 2025
https://github.com/undisputed-jay/sql-questions-and-answers-using-mssql
This repository contains a collection of SQL questions and answers to help with learning and practicing SQL concepts. The content is regularly updated with new queries, solutions, and explanations to provide a comprehensive resource for SQL enthusiasts and learners.
Last synced: 13 Feb 2025
https://github.com/undisputed-jay/airflow-etl-pipeline-with-pyspark-and-google-cloud-dataproc
This project automates daily vehicle data processing on Google Cloud using Apache Airflow. It uploads scripts to Google Cloud Storage, runs specific PySpark jobs on Dataproc based on the day, and shuts down resources when done for efficiency.
automated-etl-airflow-dataproc cost-effective-data-processing daily-data-analysis-airflow-pyspark
Last synced: 13 Feb 2025
https://github.com/undisputed-jay/aws-data-pipeline-csv-to-parquet-with-glue-and-athena
Last synced: 13 Feb 2025
https://github.com/undisputed-jay/building-an-efficient-etl-pipeline-for-property-records-in-real-estate
An ETL pipeline that ingests, transforms, and loads real estate property data into a PostgreSQL database. The project includes data cleaning, schema creation, query execution for insights, and automation via Windows Task Scheduler.
automated-data-workflows etl-pipeline-design
Last synced: 13 Feb 2025
https://github.com/undisputed-jay/streaming-data-from-reddit-using-kafka-spark-and-mongodb
A data pipeline that streams Reddit comments from the 'Politics' subreddit using Kafka and Apache Spark. Processed data is stored in MongoDB for real-time analysis and management.
apache-spark big-data data-engineering etl-pipeline kafka mongodb mongodb-atlas pyspark real-time-streaming redditapi streaming-analytics
Last synced: 06 Feb 2025