{"id":18399135,"url":"https://github.com/bayoadejare/lightning-containers","last_synced_at":"2025-04-07T05:34:07.458Z","repository":{"id":222199048,"uuid":"756256499","full_name":"BayoAdejare/lightning-containers","owner":"BayoAdejare","description":"Docker powered starter for geospatial analysis of lightning atmospheric data.","archived":false,"fork":false,"pushed_at":"2025-04-01T16:24:43.000Z","size":168191,"stargazers_count":6,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-06T04:27:33.968Z","etag":null,"topics":["clustering-analysis","csv-files","data-engineer","data-engineering-pipeline","data-warehouse","databases","docker","jupyter","machine-learning-algorithms","noaa-weather","orchestrator","pandas","python3","spatialite","sqlite","streamlit-dashboard"],"latest_commit_sha":null,"homepage":"https://lightning-containers.streamlit.app/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BayoAdejare.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-02-12T09:54:26.000Z","updated_at":"2025-04-01T16:24:47.000Z","dependencies_parsed_at":"2024-10-30T22:30:44.259Z","dependency_job_id":null,"html_url":"https://github.com/BayoAdejare/lightning-containers","commit_stats":null,"previous_names":["bayoadejare/lightning-containers"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BayoAdejare%2Flightning-containers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/
BayoAdejare%2Flightning-containers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BayoAdejare%2Flightning-containers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BayoAdejare%2Flightning-containers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BayoAdejare","download_url":"https://codeload.github.com/BayoAdejare/lightning-containers/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247601378,"owners_count":20964861,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clustering-analysis","csv-files","data-engineer","data-engineering-pipeline","data-warehouse","databases","docker","jupyter","machine-learning-algorithms","noaa-weather","orchestrator","pandas","python3","spatialite","sqlite","streamlit-dashboard"],"created_at":"2024-11-06T02:25:53.557Z","updated_at":"2025-04-07T05:34:02.443Z","avatar_url":"https://github.com/BayoAdejare.png","language":"Python","funding_links":["https://ko-fi.com/bayoadejare'"],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n \u003ch1\u003e⚡Lightning Containers: docker-powered lightning atmospheric dataset 📈\u003c/h1\u003e\n    \u003cp align=\"center\"\u003e\n        \u003ca href='https://ko-fi.com/bayoadejare' target='_blank'\u003e\u003cimg height='35' style='border:0px;height:46px;' src='https://az743702.vo.msecnd.net/cdn/kofi3.png?v=0' border='0' alt='Buy Me a Coffee at ko-fi.com' /\u003e\u003c/a\u003e\n    
\u003c/p\u003e\n\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n  \u003ca target=\"_blank\" href=\"https://lightning-containers.streamlit.app\" style=\"background:none\"\u003e\n    \u003cimg src=\"https://static.streamlit.io/badges/streamlit_badge_black_white.svg?labelColor=FFAC33\u0026color=2596BE\u0026logo=streamlit\" /\u003e\n  \u003c/a\u003e\n\n  \u003ca target=\"_blank\" href=\"https://github.com/BayoAdejare/lightning-containers/actions\" style=\"background:none\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/actions/workflow/status/bayoadejare/lightning-containers/docker-image.yml?labelColor=FFAC33\u0026color=2596BE\u0026logo=actions\" /\u003e\n  \u003c/a\u003e\n\n  \u003ca target=\"_blank\" href=\"https://github.com/BayoAdejare/lightning-containers/blob/main/LICENSE\" style=\"background:none\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/license/BayoAdejare/lightning-containers?labelColor=FFAC33\u0026color=2596BE\u0026logo=license\" /\u003e\n  \u003c/a\u003e\n\n  \u003ca target=\"_blank\" href=\"https://github.com/BayoAdejare/lightning-containers\" style=\"background:none\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/stars/BayoAdejare/lightning-containers?labelColor=FFAC33\u0026color=2596BE\u0026logo=github\"\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\n\n## Table of Contents\n\n- [Introduction](#introduction)\n- [Project Structure](#project-structure)\n- [Requirements](#requirements)\n- [Installation](#installation)\n- [ETL Flow](#etl-flow)\n- [Clustering Flow](#clustering-flow)\n- [Dashboard Map](#dashboard-map)\n- [Testing](#testing)\n- [CI/CD](#cicd)\n- [License](#license)\n- [Acknowledgements](#acknowledgements)\n\n## Introduction \n\nThis is a monolith Docker image to help you get started with geospatial analysis and visualization of lightning atmospheric data. 
The data comes from the US **National Oceanic and Atmospheric Administration (NOAA)** [Geostationary Lightning Mapper (GLM) - Data Product](https://www.goes-r.gov/products/baseline-lightning-detection.html), sourced from AWS S3 buckets. There are currently two main components:\n1. ETL Ingestion - data ingestion and analysis processes.\n2. Streamlit dashboard app - frontend GIS visualization dashboard.\n\nProcessing is done using Pandas dataframes, with SQLite and the Spatialite extension as local storage and a self-hosted Prefect server instance for orchestration and observability of the processing pipelines.\n\n\n|\u003ca href=\"img/main_tech_stack.png\" align=\"center\"\u003e\u003cimg src=\"img/main_tech_stack.png\" alt=\"Technologies used and respective logos\" width=\"800px\"/\u003e\u003c/a\u003e\n|:--:|\n|Architecture: Docker + Prefect + Pandas + SQLite + Streamlit|\n\n**Brief Data Summary [Lightning Cluster Filter Algorithm (LCFA)](https://www.star.nesdis.noaa.gov/goesr/documents/ATBDs/Baseline/ATBD_GOES-R_GLM_v3.0_Jul2012.pdf)**\n\n```\nThe multidimensional data structures stored in the netCDF4 files contain a rich variety of \ndata including metadata with descriptors. In general, the main variables: flashes, groups, \nevents form a hierarchy, i.e. 
a series of detected radiant events are clustered into groups and groups \nare clustered into flashes using LCFA.\n```\n\n## Project Structure \n\n```\nlightning-containers/\n│\n├── src/\n│   ├── flows.py\n│   └── tasks/\n│       ├── analytics/\n│       └── etl/\n├── app/\n│   └── dashboard.py\n├── notebooks/\n│   ├── clustering/\n│   ├── mapping/\n│   └── streaming/\n├── tests/\n│   ├── test_clustering.py\n│   ├── test_extract.py\n│   ├── test_load.py\n│   └── test_transform.py\n├── docs/\n│   └── index.md\n├── img/\n├── .streamlit/\n│   ├── config.toml\n│   └── secrets.toml\n├── .github/\n│   └── workflows/\n│       └── docker-image.yml\n├── data/\n├── .gitignore\n├── LICENSE\n├── CONTRIBUTING.md\n├── CODE_OF_CONDUCT.md\n├── Dockerfile\n├── docker-compose.yml\n└── README.md\n```\n\n## Requirements\n\n|Resource|Minimum|Recommended|\n|--------|-------|-----------|\n|CPU     |2 cores|4+ cores   |\n|RAM     |6GB    |16GB       |\n|Storage |8GB    |24GB       |\n\n## Installation\n\n### Quick Start: Docker Container\n\n1. Clone the repository.\n\n```\ngit clone https://github.com/BayoAdejare/lightning-containers.git\ncd lightning-containers\n```\n\n2. Start the containers with Docker Compose. Alternatively, install the project locally as described in the next section.\n\n```\ndocker-compose up -d # spin up containers\n```\n\n### Local install\n\nCreate and activate a virtual environment:\n\n```\npython -m venv venv\nsource venv/bin/activate  # On Windows, use `venv\\Scripts\\activate`\n```\n\nInstall the requirements from the project directory with pip:\n\n`pip install -r requirements.txt # Python \u003c= 3.12`\n\n### Start Flow\n\nStart the Prefect workflow orchestration server:\n\n`prefect server start # Start Prefect engine and UI, i.e. http://localhost:4200/`\n\nThe Prefect server must be running before flows can be scheduled. From the Prefect UI, you can run and monitor the data flows.\n\nRun the following commands to start the data app:
\n\n`python src/flows.py # Start backend`\n\n`streamlit run app/dashboard.py # Start frontend i.e. http://localhost:8501/`\n\n## ETL Flow\n\nETL flow data tasks:\n\n+ `Source`: **extracts** NOAA GOES-R GLM file datasets from an AWS S3 bucket; the default is GOES-18. \n+ `Transformations`: **transforms** the dataset into time series CSV.\n+ `Sink`: **loads** the dataset to persistent storage.\n\n#### Data Ingestion\n\nIngests the data for a specified time window, defined by start and end dates.\n\n##### Data Processes\n\n+ `extract`: downloads NOAA GOES-R GLM netCDF4 files from an AWS S3 bucket.\n+ `transform`: converts GLM netCDF files into time and geo series CSVs.\n+ `load`: loads the CSVs into a local backend, a persistent SQLite database with the Spatialite extension.\n\n## Clustering Flow\n\n\n#### Cluster Analysis\n\nGroups the ingested data using the K-Means clustering algorithm.\n\n##### Data Tasks\n\n+ `preprocessor`: prepares the data for the cluster model by cleaning and normalizing it.\n+ `kmeans_cluster`: fits the data to an implementation of the k-means clustering algorithm.\n+ `silhouette_evaluator`: evaluates the choice of 'k' clusters by calculating the silhouette coefficient for each k in a defined range.\n+ `elbow_evaluator`: evaluates the choice of 'k' clusters by calculating the sum of squared distances for each k in a defined range.\n\n## Dashboard Map\n\n\n\u003cp align=\"center\"\u003e\n\n|\u003ca href=\"./img/lightning-containers-dashboard.gif\" align=\"center\"\u003e\u003cimg src=\"./img/lightning-containers-dashboard.gif\" alt=\"An example dashboard of flash event data points\" width=\"600px\"/\u003e\u003c/a\u003e\n|:--:|\n|Lightning containers dashboard|\n\n\u003c/p\u003e\n\n## Testing\n\nUse the following command to run tests:\n\n`pytest`\n\n## CI/CD\n\nThis project uses GitHub Actions for CI/CD. The workflow is defined in the `.github/workflows/docker-image.yml` file. 
This includes:\n\n- Automated testing on pull requests\n- Data quality checks on scheduled intervals\n- Deployment of updated ML models and Spark jobs to production\n\n## Contributing\n\nPlease read [CONTRIBUTING.md](CONTRIBUTING.md) for details on our contributing guidelines and the process for submitting pull requests.\n\n## License\n\nThis project is licensed under the Apache 2.0 License - see the [Apache 2.0 License](LICENSE) file for details. \n\n## Acknowledgements\n\nThis work would not have been possible without amazing open source software and datasets, including but not limited to:\n\n+ [GLM Dataset from NOAA NESDIS](https://www.star.nesdis.noaa.gov/goesr/documents/ATBDs/Baseline/ATBD_GOES-R_GLM_v3.0_Jul2012.pdf)\n+ [Prefect from PrefectHQ](https://docs.prefect.io/api-ref/prefect/)\n+ [Streamlit](https://docs.streamlit.io/)\n+ Built on the codebase of [Lightning Streams](https://github.com/BayoAdejare/lightning-streams).\n\nThank you to the authors of this software and these datasets for making them available to the community!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbayoadejare%2Flightning-containers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbayoadejare%2Flightning-containers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbayoadejare%2Flightning-containers/lists"}