{"id":25449250,"url":"https://github.com/nel-zi/climainsights","last_synced_at":"2026-05-01T19:32:12.137Z","repository":{"id":277989006,"uuid":"934146487","full_name":"Nel-zi/ClimaInsights","owner":"Nel-zi","description":"Developed an automated ETL pipeline using Apache Airflow and Python to collect, process, and store weather data from multiple cities via Weatherstack API. Implemented data cleaning, orchestration, and error handling to ensure accuracy and scalability.","archived":false,"fork":false,"pushed_at":"2025-02-17T11:13:25.000Z","size":117,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-16T08:43:56.888Z","etag":null,"topics":["airflow","apache-spark","data","data-engineering","engineering","etl-pipeline"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Nel-zi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-17T10:56:13.000Z","updated_at":"2025-02-17T11:15:42.000Z","dependencies_parsed_at":"2025-02-17T12:24:47.867Z","dependency_job_id":"c2b74d3b-bde4-4e4d-a779-ccb32dd8295e","html_url":"https://github.com/Nel-zi/ClimaInsights","commit_stats":null,"previous_names":["nel-zi/climainsights"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Nel-zi/ClimaInsights","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nel-zi%2FClimaInsights","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nel-zi%2FClimaInsights/tags","r
eleases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nel-zi%2FClimaInsights/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nel-zi%2FClimaInsights/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Nel-zi","download_url":"https://codeload.github.com/Nel-zi/ClimaInsights/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nel-zi%2FClimaInsights/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32510673,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-30T13:12:12.517Z","status":"online","status_checked_at":"2026-05-01T02:00:05.856Z","response_time":64,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","apache-spark","data","data-engineering","engineering","etl-pipeline"],"created_at":"2025-02-17T20:27:21.927Z","updated_at":"2026-05-01T19:32:12.079Z","avatar_url":"https://github.com/Nel-zi.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# **ClimaInsights**  \n\n### **Automated Weather Data Collection \u0026 Processing with Apache Airflow**  \n\n## **Overview**  \n**ClimaInsights** is an automated ETL (Extract, Transform, Load) pipeline designed to collect, process, and store daily weather data from multiple cities. 
Using **Apache Airflow**, **Python**, and the **Weatherstack API**, this project is designed for efficient, scalable, and reliable weather data retrieval for various applications.  \n\n## **Problem Statement**  \nAccurate and timely weather data is essential for industries like agriculture, logistics, and disaster management. However, current methods for collecting and processing weather data are:  \n- **Manual \u0026 Error-Prone**: Many organizations still rely on manual data collection, leading to inconsistencies and inaccuracies.  \n- **Lacking Scalability**: As data requirements grow, existing systems struggle to handle large-scale, multi-city weather data efficiently.  \n- **Inefficient Decision-Making**: Delays and errors in weather data processing affect strategic planning and real-time applications.  \n\n### **Why Data Engineering?**  \nA well-structured **ETL pipeline** can solve these challenges by automating data collection, ensuring data integrity, and providing a scalable infrastructure for weather analytics. **ClimaInsights** leverages data engineering best practices to streamline the entire process.  \n\n## **Project Objectives**  \n- **Automate Data Collection**: Fetch daily weather data from multiple cities using the **Weatherstack API**.  \n- **Data Processing \u0026 Cleaning**: Use Python scripts to clean and standardize raw weather data.  \n- **Efficient Data Storage**: Store processed weather data in a structured database for easy access and analysis.  \n- **Data Orchestration**: Manage and schedule ETL workflows using **Apache Airflow**.  \n- **Error Handling \u0026 Logging**: Implement robust logging and error-handling mechanisms for reliability.  \n\n## **Benefits of ClimaInsights**  \n- **Automation \u0026 Efficiency** – Eliminates manual work, reducing errors and improving data accuracy.  \n- **Scalability** – Handles growing data volumes and additional cities seamlessly.  
\n- **Optimized Resource Utilization** – Frees up data engineers to focus on strategic insights rather than repetitive tasks.  \n\n## **Tech Stack**  \n- **Python** – Data extraction, transformation, and processing  \n- **Apache Airflow** – Workflow orchestration  \n- **Weatherstack API** – Real-time weather data retrieval  \n- **PostgreSQL / MySQL** – Structured data storage  \n\n## **Getting Started**  \n### **Prerequisites**  \n- Python 3.x  \n- Apache Airflow  \n- Weatherstack API Key  \n- Database (PostgreSQL or MySQL)  \n\n### **Installation**  \n1. Clone this repository:  \n   ```bash\n   git clone https://github.com/Nel-zi/ClimaInsights.git\n   cd ClimaInsights\n   ```  \n2. Install dependencies:  \n   ```bash\n   pip install -r requirements.txt\n   ```  \n3. Set your **Weatherstack API key** as an environment variable.  \n4. Configure and start **Apache Airflow** for ETL orchestration.  \n\n## **Contributing**  \nContributions are welcome. Feel free to submit a pull request or open an issue for suggestions and improvements.  ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnel-zi%2Fclimainsights","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnel-zi%2Fclimainsights","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnel-zi%2Fclimainsights/lists"}