Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Projects in Awesome Lists by Undisputed-jay

A curated list of projects in awesome lists by Undisputed-jay.

https://github.com/undisputed-jay/sql-island

SQL Island is a fun introduction to learning and using SQL.

sql

Last synced: 13 Feb 2025

https://github.com/undisputed-jay/spotifyapi-data-engineering-project

This project uses an ETL (Extract, Transform, and Load) pipeline to extract data from Spotify using its API and load the data into a data source (AWS Athena). The entire pipeline will be built using Amazon Web Services (AWS).

aws aws-athena aws-cloudformation aws-lambda aws-s3 awsglue python3 sql

Last synced: 13 Feb 2025

https://github.com/undisputed-jay/aws-s3-integration-with-snowflake

This project sets up an ETL pipeline to load Citibike trip data from an AWS S3 bucket into Snowflake. It establishes a secure integration with S3, defines a CSV file format, stages the data, and loads it into a Snowflake table for analysis.

aws-s3 snowflake sql

Last synced: 13 Feb 2025

https://github.com/undisputed-jay/etl-on-gcp-with-apache-airflow

In this project, files were ingested into Google Cloud Storage and later moved to BigQuery to run queries, and the results were moved back to Google Cloud Storage.

apache-airflow bigquery data-engineering data-warehouse docker etl-pipeline google-cloud-platform

Last synced: 13 Feb 2025

https://github.com/undisputed-jay/wikipedia_stadium_data_pipeline_with_apache_airflow

An Apache Airflow pipeline that scrapes football stadium data from Wikipedia, processes it with pandas, stores it in PostgreSQL, and saves query results to CSV.

Last synced: 13 Feb 2025

https://github.com/undisputed-jay/behavior-driven-development-testing-for-ecommerce-login

This project automates login testing with Behave and Selenium WebDriver, using BDD to verify login scenarios like valid and invalid credentials. The page object model (POM) keeps the code organized and easy to scale.

bdd-login-testing behave-framework page-object-model python-test-automation reusable-test-code selenium-automation

Last synced: 13 Feb 2025
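The page object model mentioned in the entry above can be illustrated with a minimal sketch. This is not code from the repository: `FakeDriver`, its methods, the locators, and the credentials are all hypothetical stand-ins so the example runs without a browser or Selenium. The point is only that the test talks to `LoginPage`, and `LoginPage` is the single place that knows the page's locators.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDriver:
    """Stand-in for a browser driver (hypothetical; not Selenium's API)."""
    fields: dict = field(default_factory=dict)
    message: str = ""

    def type_into(self, locator: str, text: str) -> None:
        # Record what was typed into the element at this locator.
        self.fields[locator] = text

    def click(self, locator: str) -> None:
        # Simulate the credential check a real site would perform on submit.
        ok = (self.fields.get("#username") == "alice"
              and self.fields.get("#password") == "s3cret")
        self.message = "Welcome" if ok else "Invalid credentials"


class LoginPage:
    """Page object: the only place that knows the login page's locators."""
    USERNAME = "#username"
    PASSWORD = "#password"
    SUBMIT = "#login"

    def __init__(self, driver: FakeDriver) -> None:
        self.driver = driver

    def log_in(self, username: str, password: str) -> str:
        # One high-level action that test scenarios can reuse.
        self.driver.type_into(self.USERNAME, username)
        self.driver.type_into(self.PASSWORD, password)
        self.driver.click(self.SUBMIT)
        return self.driver.message
```

A valid-credentials scenario and an invalid-credentials scenario would both call `LoginPage(driver).log_in(...)` and assert on the returned message, so a locator change touches only the page class.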

https://github.com/undisputed-jay/covid-19_dataset

This repo contains data cleaning of a COVID-19 dataset.

Last synced: 13 Feb 2025

https://github.com/undisputed-jay/sql-questions-and-answers-using-mssql

This repository contains a collection of SQL questions and answers to help with learning and practicing SQL concepts. The content is regularly updated with new queries, solutions, and explanations to provide a comprehensive resource for SQL enthusiasts and learners.

Last synced: 13 Feb 2025

https://github.com/undisputed-jay/airflow-etl-pipeline-with-pyspark-and-google-cloud-dataproc

This project automates daily vehicle data processing on Google Cloud using Apache Airflow. It uploads scripts to Google Cloud Storage, runs specific PySpark jobs on Dataproc based on the day, and shuts down resources when done for efficiency.

automated-etl-airflow-dataproc cost-effective-data-processing daily-data-analysis-airflow-pyspark

Last synced: 13 Feb 2025
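Choosing a PySpark job based on the day, as the entry above describes, could look roughly like the sketch below. The script names, the weekday mapping, and the fallback are assumptions for illustration, not taken from the repository. In an Airflow DAG, a function like this might back a branching task (for example with `BranchPythonOperator`), but that wiring is omitted here.

```python
from datetime import date

# Hypothetical mapping from weekday (Monday = 0) to a PySpark script name.
JOBS_BY_WEEKDAY = {
    0: "avg_speed.py",
    1: "avg_temperature.py",
    2: "avg_tire_pressure.py",
}

# Hypothetical fallback for days with no dedicated job.
DEFAULT_JOB = "noop.py"


def pick_job(run_date: date) -> str:
    """Return the script to submit for the given run date."""
    return JOBS_BY_WEEKDAY.get(run_date.weekday(), DEFAULT_JOB)
```

Keeping the choice in a plain function makes the day-to-job rule easy to test without starting a scheduler or a cluster.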

https://github.com/undisputed-jay/building-an-efficient-etl-pipeline-for-property-records-in-real-estate

An ETL pipeline that ingests, transforms, and loads real estate property data into a PostgreSQL database. The project includes data cleaning, schema creation, query execution for insights, and automation via Windows Task Scheduler.

automated-data-workflows etl-pipeline-design

Last synced: 13 Feb 2025
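The ingest, clean, load, and query steps in the entry above can be sketched in a few lines. This is an illustrative assumption, not the repository's code: it uses Python's built-in `sqlite3` in place of PostgreSQL so it runs anywhere, and the sample rows, column names, and cleaning rules are invented.

```python
import csv
import io
import sqlite3

# Invented sample data with typical problems: stray spaces, mixed case,
# and a record with a missing price.
RAW_CSV = """address,city,price
12 Oak St, austin ,350000
9 Elm Ave,Dallas,
77 Pine Rd,AUSTIN,410000
"""


def extract(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Trim whitespace, normalize city names, and drop rows with no price."""
    cleaned = []
    for row in rows:
        price = row["price"].strip()
        if not price:
            continue
        cleaned.append(
            (row["address"].strip(), row["city"].strip().title(), int(price))
        )
    return cleaned


def load(conn, records):
    """Create the table if needed and insert the cleaned records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS properties "
        "(address TEXT, city TEXT, price INTEGER)"
    )
    conn.executemany("INSERT INTO properties VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(text):
    """Run extract, transform, and load against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(text)))
    return conn
```

After `conn = run_pipeline(RAW_CSV)`, a query such as `SELECT city, AVG(price) FROM properties GROUP BY city` gives per-city insight over the cleaned rows. A scheduler (Windows Task Scheduler in the original project) would simply invoke a script that calls `run_pipeline` on fresh input.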

https://github.com/undisputed-jay/streaming-data-from-reddit-using-kafka-spark-and-mongodb

A data pipeline that streams Reddit comments from the 'Politics' subreddit using Kafka and Apache Spark. Processed data is stored in MongoDB for real-time analysis and management.

apache-spark big-data data-engineering etl-pipeline kafka mongodb mongodb-atlas pyspark real-time-streaming redditapi streaming-analytics

Last synced: 06 Feb 2025