Projects in Awesome Lists by kgelli
A curated list of projects in awesome lists by kgelli.
https://github.com/kgelli/pyspark-essentials
A collection of essential PySpark code examples demonstrating SQL operations, RDD transformations and core functionalities for data processing.
actions big-data python rdd spark sql-operations transformations
Last synced: 28 Mar 2025
https://github.com/kgelli/news-sentiment-analysis-pipeline-with-microsoft-fabric
End-to-end news sentiment analysis pipeline built with Microsoft Fabric that ingests Bing News API data, scores sentiment, visualizes results in Power BI, and sends real-time alerts via Teams.
azure bing-api data-activator data-engineering data-pipeline data-visualization fabric microsoft-fabric one-lake-synapse power-bi sentiment-analysis
Last synced: 14 Mar 2025
https://github.com/kgelli/airflow-weather-aws-pipeline
Automated ETL pipeline that collects weather data from OpenWeather API and integrates it with city demographic information using Apache Airflow, AWS RDS PostgreSQL and S3.
airflow aws-rds dag etl postgresql python s3 weather-api
Last synced: 15 Mar 2025
https://github.com/kgelli/apple-data-analysis---apache-spark
Modular ETL pipeline for analyzing Apple product purchase patterns using Apache Spark on Databricks with factory design patterns.
apache-spark data-analysis databricks delta-lake etl-pipeline factory-pattern pyspark
Last synced: 08 Mar 2025
https://github.com/kgelli/sales-data-analytics---azure-end-to-end-data-engineering
Last synced: 24 Mar 2025
https://github.com/kgelli/web-scraper-python
Web scraper that extracts quotes from quotes.toscrape.com and stores them in a MySQL database using Docker containers.
beautifulsoup container-orchestration data-extraction docker mysql python web-scraping
Last synced: 24 Mar 2025
https://github.com/kgelli/file-management-system
Simple Python command-line utility for basic file system operations using different implementation approaches.
command-line file-management os-operations python utility
Last synced: 24 Mar 2025
https://github.com/kgelli/orchestrated-data-streaming-e2e-pipeline-with-airflow-kafka-spark
A comprehensive end-to-end real-time data streaming pipeline using Apache Airflow, Kafka, Spark, and Cassandra, containerized with Docker.
airflow big-data cassandra data-engineering docker event-streaming kafka microservices realtime-streaming spark
Last synced: 24 Mar 2025
https://github.com/kgelli/nlp-qa-chatbot
A neural network chatbot that answers questions about stories using memory networks trained on the Facebook bAbI dataset.
chatbot deep-learning keras memory-networks neural-networks nlp
Last synced: 24 Mar 2025
https://github.com/kgelli/intelligent-diagnostic-system-based-on-deep-learning-and-iot
Last synced: 24 Mar 2025
https://github.com/kgelli/docker_automation_bash_scripting
Docker Management System - A comprehensive Bash script providing a user-friendly interface for managing Docker containers, images, volumes, and networks via a terminal-based menu system.
automation containerization developer-tools devops docker docker-compose docker-management infrastructure-as-code shell-script sysadmin
Last synced: 24 Mar 2025
https://github.com/kgelli/nlp-tranformer-based-architectures-for-medical-reviews
Last synced: 24 Mar 2025
https://github.com/kgelli/pyspark-fundamentals
A comprehensive collection of PySpark fundamentals with practical examples using retail and Formula 1 datasets.
big-data data-transformation dataframes pyspark python spark-sql
Last synced: 05 Apr 2025