{"id":22210403,"url":"https://github.com/ynstf/weathertrendspipeline","last_synced_at":"2025-03-25T04:55:52.384Z","repository":{"id":265922465,"uuid":"896869653","full_name":"ynstf/WeatherTrendsPipeline","owner":"ynstf","description":null,"archived":false,"fork":false,"pushed_at":"2024-12-16T10:06:55.000Z","size":435,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-30T05:15:13.107Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ynstf.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-01T14:04:23.000Z","updated_at":"2024-12-16T10:06:59.000Z","dependencies_parsed_at":"2024-12-16T11:20:29.456Z","dependency_job_id":"1f7bde3c-f5c9-47e0-b2f7-74fec65c3d3f","html_url":"https://github.com/ynstf/WeatherTrendsPipeline","commit_stats":null,"previous_names":["ynstf/weathertrendspipeline"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ynstf%2FWeatherTrendsPipeline","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ynstf%2FWeatherTrendsPipeline/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ynstf%2FWeatherTrendsPipeline/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ynstf%2FWeatherTrendsPipeline/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ynstf","download_url":"https://codeload.github.com/ynstf/WeatherTrendsPipeline/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245401397,"owners_count":20609167,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-02T20:12:18.397Z","updated_at":"2025-03-25T04:55:52.365Z","avatar_url":"https://github.com/ynstf.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# WeatherTrendsPipeline\n\nA comprehensive data engineering pipeline for collecting, processing, and analyzing weather data using the OpenWeatherMap API.\n\n## Project Structure\n\n```\nWeatherTrendsPipeline/\n├── src/                      # Source code\n│   ├── data/                 # Data handling modules\n│   │   ├── extraction/       # Data extraction scripts\n│   │   │   └── extract_weather.py    # Weather data extraction\n│   │   ├── processing/       # Data transformation scripts\n│   │   │   └── transform_weather.py  # Weather data transformation\n│   │   └── storage/          # Data storage scripts\n│   │       └── load_weather.py       # S3 upload functionality\n│   ├── utils/               # Utility functions\n│   │   ├── __init__.py\n│   │   ├── api_client.py    # API interaction utilities\n│   │   ├── def_transformations.py  # Data transformation functions\n│   │   └── save_data.py    # Data saving and management utilities\n│   └── visualization/       # Data visualization components\n├── data/                    # Data storage\n│   ├── raw/                # Raw weather data\n│   ├── processed/          # Transformed data\n│   └── final/             # Final analysis results\n├── config/                 # Configuration files\n│   ├── db_config.yaml     # Database settings\n│   └── api_config.yaml    # API settings\n├── dags/                  # Automation scripts\n│   ├── extraction.sh      # Data extraction shell script\n│   └── trans_Load.sh      # Transformation and S3 upload script\n├── scheduler/             # Scheduling configuration\n│   └── CRON              # Cron job definitions\n├── docs/                 # Documentation\n│   ├── api_documentation.md    # API endpoints and usage\n│   ├── data_schema.md         # Data structure definitions\n│   └── pipeline_documentation.md  # Pipeline processes and monitoring\n├── logs/                 # Log files\n│   └── extraction.log  # Log file for extractions and uploads\n├── notebooks/           # Jupyter notebooks\n├── tests/              # Test files\n├── .env               # Environment variables (including AWS credentials)\n├── .env.template      # Environment variables template\n├── requirements.txt  # Python dependencies\n└── setup_venv.bat   # Virtual environment setup\n```\n\n## Setup\n\n1. Clone the repository\n2. Create a virtual environment and activate it:\n   ```bash\n   python -m venv venv\n   source venv/bin/activate  # On Windows: venv\\Scripts\\activate\n   ```\n3. Install dependencies:\n   ```bash\n   pip install -r requirements.txt\n   ```\n4. Copy `.env.template` to `.env` and add your OpenWeatherMap API key:\n   ```\n   OPENWEATHERMAP_API_KEY=your_api_key_here\n   ```\n5. Copy `config/api_config.template.yaml` to `config/api_config.yaml`\n\n## Usage Examples\n\n### Getting Current Weather Data\n\n```python\nfrom src.utils.api_client import OpenWeatherMapAPIClient\n\n# Initialize the client\nclient = OpenWeatherMapAPIClient()\n\n# Get weather for Casablanca\ncasablanca_weather = client.get_latest_weather(33.5731, -7.5898)\n\n# Example response:\n{\n    \"message\": \"success\",\n    \"data\": {\n        \"temperature\": 77,        # Temperature in Fahrenheit\n        \"humidity\": 31,           # Humidity percentage\n        \"pressure\": 1020,         # Pressure in mb\n        \"windSpeed\": 3.3,         # Wind speed in mph\n        \"windBearing\": 356,       # Wind direction in degrees\n        \"cloudCover\": 0.69,       # Cloud cover (0-1)\n        \"uvIndex\": 1,            # UV index\n        \"summary\": \"Partly cloudy skies. Temperatures will feel warm...\"\n    }\n}\n\n# Get weather for other Moroccan cities\nrabat_weather = client.get_latest_weather(34.0209, -6.8416)\nmarrakech_weather = client.get_latest_weather(31.6295, -7.9811)\nfez_weather = client.get_latest_weather(34.0181, -5.0078)\n```\n\n### Common Locations\n\n| City       | Latitude  | Longitude |\n|------------|-----------|-----------|\n| Casablanca | 33.5731   | -7.5898   |\n| Rabat      | 34.0209   | -6.8416   |\n| Marrakech  | 31.6295   | -7.9811   |\n| Fez        | 34.0181   | -5.0078   |\n\n## Features\n\n- Real-time weather data retrieval\n- Secure API key management using environment variables\n- Error handling and automatic retries\n- Support for multiple Moroccan cities\n- Comprehensive weather information including:\n  - Temperature\n  - Humidity\n  - Wind speed and direction\n  - Cloud cover\n  - UV index\n  - Atmospheric pressure\n  - Weather conditions summary\n\n## Data Processing Features\n\n### Transformation Pipeline\n- Comprehensive data transformation capabilities:\n  - Temperature unit conversions (°F to °C)\n  - Wind speed conversions (mph to km/h and m/s)\n  - Pressure conversions (mb to hPa and atm)\n  - Visibility distance conversions (miles to km)\n\n### Advanced Weather Metrics\n- Heat Index calculation for temperatures ≥ 80°F\n- Wind Chill calculation for temperatures ≤ 50°F and wind speeds \u003e 3 mph\n- Seasonal categorization based on month\n- Time-based analytics (hour, day, month patterns)\n\n### Automated Scheduling\n- Cron job implementation for automated data collection\n- Shell script automation:\n  ```bash\n  # Extraction shell script (dags/extraction.sh)\n  #!/bin/bash\n  cd /WeatherTrendsPipeline\n  source venv/bin/activate\n  python src/data/extraction/extract_weather.py \u003e\u003e logs/extraction.log 2\u003e\u00261\n  ```\n- Cron scheduling example:\n  ```bash\n  # Run every 14 minutes (optimized for 100 API calls per day limit)\n  */14 * * * * /WeatherTrendsPipeline/dags/extraction.sh\n  ```\n- Logging system:\n  ```\n  logs/\n  └── extraction.log  # Log file for extractions and uploads\n  ```\n  Log entries include:\n  ```\n  [2024-12-06 14:00:01] INFO: Starting weather data extraction\n  [2024-12-06 14:00:02] INFO: Successfully retrieved weather data for Casablanca\n  [2024-12-06 14:00:03] INFO: Data saved to /WeatherTrendsPipeline/data/raw/weather_data.csv\n  [2024-12-06 14:14:01] INFO: Starting weather data extraction\n  [2024-12-06 14:14:02] INFO: Successfully retrieved weather data for Casablanca\n  [2024-12-06 14:14:03] INFO: Data saved to /WeatherTrendsPipeline/data/raw/weather_data.csv\n  ...\n  ```\n\n### Data Storage Structure\n- Raw data storage:\n  - CSV format with timestamp-based naming\n  - Automatic daily file management\n- Processed data:\n  - Enriched datasets with calculated metrics\n  - Transformed units for analysis\n  - Temporal features extraction\n- Cloud Storage:\n  - Automated daily uploads to AWS S3 bucket at 12:01\n  - Processed data backup and archival\n  - Automatic cleanup of local files after successful upload\n\n### Automated Scheduling\n- Daily S3 Upload Schedule:\n  ```bash\n  # Upload processed data to S3 bucket daily at 12:01\n  1 12 * * * /WeatherTrendsPipeline/dags/upload_to_s3.sh\n  ```\n\n### Remote Deployment\n- AWS EC2 instance deployment\n- SSH Configuration:\n  ```\n  Host WeatherTrendsPipeline\n      Hostname X.X.X.X\n      User ubuntu\n      IdentityFile \"path/to/WeatherPipeline.pem\"\n  ```\n\n### Error Handling and Logging\n- Comprehensive error handling for:\n  - API connection issues\n  - Data validation\n  - File operations\n- Detailed logging system\n- Automatic retry mechanisms\n\n## Technical Requirements\n\n### Python Environment\n- Python version: 3.9.7\n- Key dependencies:\n  ```\n  numpy==1.24.3\n  pandas==2.0.3\n  requests\n  python-dotenv\n  loguru\n  ```\n\n### System Requirements\n- Linux/Unix environment for cron jobs\n- Write permissions for data directories\n- Network access for API calls\n- SSH access for remote deployment\n\n## Contributing\n\n1. Fork the repository\n2. Create your feature branch\n3. Commit your changes\n4. Push to the branch\n5. Create a new Pull Request\n\n## License\n\nThis project is licensed under the MIT License - see the LICENSE file for details.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fynstf%2Fweathertrendspipeline","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fynstf%2Fweathertrendspipeline","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fynstf%2Fweathertrendspipeline/lists"}