An open API service indexing awesome lists of open source software.

Projects in Awesome Lists by kgelli

A curated list of projects in awesome lists by kgelli.

https://github.com/kgelli/pyspark-essentials

A collection of essential PySpark code examples demonstrating SQL operations, RDD transformations, and core data processing functionality.

actions big-data python rdd spark sql-operations transformations

Last synced: 28 Mar 2025

https://github.com/kgelli/news-sentiment-analysis-pipeline-with-microsoft-fabric

End-to-end news sentiment analysis pipeline built with Microsoft Fabric: it ingests Bing News API data, scores sentiment, visualizes the results in Power BI, and sends real-time alerts via Teams.

azure bing-api data-activator data-engineering data-pipeline data-visualization fabric microsoft-fabric one-lake-synapse power-bi sentiment-analysis

Last synced: 14 Mar 2025

https://github.com/kgelli/airflow-weather-aws-pipeline

Automated ETL pipeline that collects weather data from the OpenWeather API and integrates it with city demographic information using Apache Airflow, AWS RDS PostgreSQL, and S3.

airflow aws-rds dag etl postgresql python s3 weather-api

Last synced: 15 Mar 2025

https://github.com/kgelli/apple-data-analysis---apache-spark

Modular ETL pipeline for analyzing Apple product purchase patterns using Apache Spark on Databricks with factory design patterns.

apache-spark data-analysis databricks delta-lake etl-pipeline factory-pattern pyspark

Last synced: 08 Mar 2025

https://github.com/kgelli/web-scraper-python

Web scraper that extracts quotes from quotes.toscrape.com and stores them in a MySQL database using Docker containers.

beautifulsoup container-orchestration data-extraction docker mysql python web-scraping

Last synced: 24 Mar 2025

https://github.com/kgelli/file-management-system

Simple Python command-line utility for basic file system operations using different implementation approaches.

command-line file-management os-operations python utility

Last synced: 24 Mar 2025

https://github.com/kgelli/healthnest

Last synced: 24 Mar 2025

https://github.com/kgelli/orchestrated-data-streaming-e2e-pipeline-with-airflow-kafka-spark

A comprehensive end-to-end real-time data streaming pipeline using Apache Airflow, Kafka, Spark, and Cassandra, containerized with Docker.

airflow big-data cassandra data-engineering docker event-streaming kafka microservices realtime-streaming spark

Last synced: 24 Mar 2025

https://github.com/kgelli/kgelli

Last synced: 24 Mar 2025

https://github.com/kgelli/nlp-qa-chatbot

A neural network chatbot that answers questions about stories using memory networks trained on the Facebook bAbI dataset.

chatbot deep-learning keras memory-networks neural-networks nlp

Last synced: 24 Mar 2025

https://github.com/kgelli/nlp

Last synced: 24 Mar 2025

https://github.com/kgelli/docker_automation_bash_scripting

Docker Management System: a comprehensive Bash script providing a user-friendly interface for managing Docker containers, images, volumes, and networks via a terminal-based menu system.

automation containerization developer-tools devops docker docker-compose docker-management infrastructure-as-code shell-script sysadmin

Last synced: 24 Mar 2025

https://github.com/kgelli/pyspark-fundamentals

A comprehensive collection of PySpark fundamentals with practical examples using retail and Formula 1 datasets.

big-data data-transformation dataframes pyspark python spark-sql

Last synced: 05 Apr 2025