https://github.com/ndiplacide7/air_quality_monitor
Real-Time Air Quality Monitoring System using Django, Apache Hadoop, Apache Kafka, and AWS services.
apache-kafka aws css django hadoop html mysql python3
- Host: GitHub
- URL: https://github.com/ndiplacide7/air_quality_monitor
- Owner: ndiplacide7
- License: mit
- Created: 2024-12-04T10:00:58.000Z (about 2 months ago)
- Default Branch: master
- Last Pushed: 2024-12-04T10:27:10.000Z (about 2 months ago)
- Last Synced: 2024-12-04T11:25:22.651Z (about 2 months ago)
- Topics: apache-kafka, aws, css, django, hadoop, html, mysql, python3
- Language: Python
- Homepage:
- Size: 10.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Air Quality Monitor: Comprehensive Environmental Data Solution
### Team Contributions
```markdown
## GROUP NO : 7
------------------------------------------------
| No. | Name | Registration Number |
|-----|-------------------|---------------------|
| 1 | NDUWAYEZU Placide | 223027936 |
| 2 | UWASE Aline | 218009283 |
| 3 | MUREMYI Samuel | 223026694 |
-------------------------------------------------
```
## Project Case Study
### Background and Motivation
In an era of increasing environmental concerns, real-time air quality monitoring has become crucial for public health and environmental policy. The Air Quality Monitor project aims to create a robust, scalable solution for tracking and analyzing air quality data.
### Project Objectives
1. **Real-time Data Collection**: Develop a system to continuously fetch air quality data from official sources.
2. **Data Processing Pipeline**: Create an efficient mechanism to transform raw API data into meaningful insights.
3. **Distributed Storage**: Implement a scalable storage solution using HDFS and cloud databases.
4. **Data Visualization**: Build an interactive dashboard for accessible environmental insights.
## Project Architecture
### Technical Architecture
- **Data Ingestion**: Retrieve data from ACT Government Air Quality API
- **Stream Processing**: Apache Kafka for real-time data streaming
- **Data Storage**:
  - Distributed Storage: Apache Hadoop HDFS
  - Persistent Storage: AWS MySQL RDS
- **Web Framework**: Django
- **Data Processing**: Pandas, PyArrow
## System Design Diagram
```text
[API Source] → [Kafka Stream] → [Data Processing] → [HDFS Storage] → [MySQL RDS] → [Django Dashboard]
```
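As a rough illustration of the first hop in the diagram, the sketch below serializes API records and hands them to a Kafka producer. The helper name `publish_records`, the topic name in the usage comment, and the choice of the `name` field as message key are assumptions made for illustration and are not taken from the repository. The producer is passed in as an argument, so any object exposing `produce` and `flush` (such as `confluent_kafka.Producer`) should work.

```python
import json


def publish_records(producer, topic, records, key_field="name"):
    """Send each record to Kafka as a JSON-encoded message.

    `producer` is expected to behave like confluent_kafka.Producer,
    i.e. to expose produce(topic, key=..., value=...) and flush().
    `key_field` is a hypothetical choice of message key.
    """
    sent = 0
    for record in records:
        key = str(record.get(key_field, "")).encode("utf-8")
        value = json.dumps(record, sort_keys=True).encode("utf-8")
        producer.produce(topic, key=key, value=value)
        sent += 1
    # Wait until every queued message has been delivered or has failed.
    producer.flush()
    return sent


# Possible usage with the real client (requires confluent-kafka):
#   from confluent_kafka import Producer
#   producer = Producer({"bootstrap.servers": "your-kafka-servers"})
#   publish_records(producer, "air-quality-readings", records)
```

Injecting the producer keeps the serialization logic testable without a running broker.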
## Challenges and Solutions
1. **Dataset Availability**
   - **Challenge**: Lack of accessible air quality datasets for Rwanda
     - The initial project goal was to develop an air quality monitoring system for Rwanda
     - Significant obstacles were encountered in obtaining comprehensive, reliable air quality data
     - Few public APIs or open data sources exist for environmental monitoring in Rwanda
   - **Solution**:
     - Used the ACT's air quality monitoring system as a proof-of-concept model
2. **Local Infrastructure Setup Complexity**: Kafka and Hadoop Setup
   - **Challenge**: Overcoming Windows-specific installation barriers
     - Installing Kafka and Hadoop natively on Windows proved complex
     - Multiple compatibility and configuration issues arose with the distributed systems
   - **Solution**:
     - Docker-based setup: used Docker containers to create a consistent, reproducible development environment
     - ![img.png](img.png)
### Data Source
The project retrieves real-time air quality data from the ACT (Australian Capital Territory) Ambient Air Quality Monitoring API: https://www.data.act.gov.au/resource/94a5-zqnn.json
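A minimal sketch of retrieving records from this endpoint is shown below. It uses only the standard library so that it runs without extra dependencies (the project itself lists Requests). The function names are hypothetical, and the `$limit` query parameter assumes the endpoint follows Socrata-style query conventions, which the README does not state.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://www.data.act.gov.au/resource/94a5-zqnn.json"


def build_query_url(base_url=API_ENDPOINT, limit=100):
    """Append a row limit (assumes Socrata-style `$limit` is supported)."""
    return f"{base_url}?{urlencode({'$limit': limit})}"


def fetch_records(url, timeout=30):
    """Download the JSON payload and return it as a list of dicts."""
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    if not isinstance(payload, list):
        raise ValueError("expected a JSON array of records")
    return payload


# Possible usage (performs a network request):
#   records = fetch_records(build_query_url(limit=5))
#   print(len(records))
```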
### Key Features
- Real-time data retrieval from official air quality API
- Data processing pipeline
- HDFS storage using Docker
- AWS MySQL RDS data persistence
- Interactive dashboard for air quality visualization
### Technology Stack
- Backend: Django
- Data Processing:
  - Pandas
  - PyArrow
- Data Storage:
  - Apache Hadoop (HDFS)
  - AWS MySQL RDS
- Message Streaming: Apache Kafka
- Additional Libraries:
  - Requests
  - Confluent Kafka
  - Pytz
### Prerequisites
- Python 3.8+
- Docker (optional, for HDFS)
- AWS RDS MySQL instance
- Apache Kafka
- Apache Hadoop
### Installation
1. Clone the repository:
```bash
git clone https://github.com/ndiplacide7/air_quality_monitor.git
cd air_quality_monitor
```
2. Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```
4. Configure Database:
   - Set up your AWS MySQL RDS credentials in `settings.py`
   - Configure Kafka and Hadoop connection details
5. Run Database Migrations:
```bash
python manage.py makemigrations
python manage.py migrate
```
6. Create Superuser (Optional):
```bash
python manage.py createsuperuser
```
### Running the Application
#### Start Data Pipeline (Manual Trigger)
```bash
python manage.py run_air_quality_pipeline
```
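The command above triggers the pipeline manually. As a hedged sketch of two of its steps (processing raw records and generating a unique Parquet filename), the helpers below show one way this could look. The function names and the `pm2_5` and `pm10` field names are assumptions for illustration; the repository's actual command may differ.

```python
import uuid
from datetime import datetime, timezone


def clean_record(raw, numeric_fields=("pm2_5", "pm10")):
    """Convert numeric strings to floats; missing or bad values become None.

    The default field names are hypothetical examples.
    """
    cleaned = dict(raw)
    for field in numeric_fields:
        try:
            cleaned[field] = float(raw[field])
        except (KeyError, TypeError, ValueError):
            cleaned[field] = None
    return cleaned


def unique_parquet_name(prefix="air_quality", now=None):
    """Build a filename from a UTC timestamp plus a short random suffix.

    `now` should be a timezone-aware datetime; it defaults to the current time.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return f"{prefix}_{stamp}_{uuid.uuid4().hex[:8]}.parquet"
```

The random suffix keeps two runs within the same second from overwriting each other's files.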
#### Run Django Development Server (On preferred port)
```bash
python manage.py runserver 9006
```
#### Docker HDFS Integration
To copy processed data to HDFS:
```bash
docker cp <local-file> <container-name>:/path/in/hdfs
```
### Configuration
Ensure the following configurations are set:
- API Endpoint
- Kafka Bootstrap Servers
- HDFS Connection
- AWS RDS Credentials
### Data Flow
1. Fetch data from ACT Air Quality API
2. Process raw data
3. Generate unique Parquet filename
4. Convert to PyArrow table
5. Save to HDFS
6. Persist in AWS MySQL RDS
7. Visualize in Dashboard
#### Environment Variables
Create a `.env` file for sensitive values such as credentials:
```bash
API_ENDPOINT=https://www.data.act.gov.au/resource/94a5-zqnn.json
KAFKA_BOOTSTRAP_SERVERS=your-kafka-servers
HDFS_HOST=your-hdfs-host
AWS_RDS_HOST=your-rds-endpoint
AWS_RDS_USER=your-username
AWS_RDS_PASSWORD=your-password
```
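One possible way to consume these variables from `settings.py` is sketched below. It assumes the values have already been loaded into the process environment (for example by a tool such as python-dotenv). The helper name `database_settings` and the `AWS_RDS_DB_NAME` and `AWS_RDS_PORT` variables are hypothetical additions that do not appear in the list above.

```python
import os

REQUIRED_VARS = ("AWS_RDS_HOST", "AWS_RDS_USER", "AWS_RDS_PASSWORD")


def database_settings(env=os.environ):
    """Build a Django-style DATABASES dict from environment variables."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {
        "default": {
            "ENGINE": "django.db.backends.mysql",
            "HOST": env["AWS_RDS_HOST"],
            "USER": env["AWS_RDS_USER"],
            "PASSWORD": env["AWS_RDS_PASSWORD"],
            # Assumed variable names and defaults, not from the README.
            "NAME": env.get("AWS_RDS_DB_NAME", "air_quality"),
            "PORT": env.get("AWS_RDS_PORT", "3306"),  # 3306 is the MySQL default
        }
    }


# In settings.py this could be used as:
#   DATABASES = database_settings()
```

Failing fast on missing variables surfaces configuration mistakes at startup rather than at the first database query.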
### Troubleshooting
- Ensure all services (Kafka, Hadoop, RDS) are running
- Check network connectivity
- Verify API access
- Review logs for detailed error information
### Contributing
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
### License
- MIT
### Acknowledgements
- ACT Government for Open Data
- Open Source Community
### Contact
- Placide - ndiplacide7@gailcom
- Project Link: https://github.com/ndiplacide7/air_quality_monitor