Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/devonwintz/weather-stream

Real-time weather data pipeline using Flask, Kafka, Spark, Cassandra, and Grafana for generation, ingestion, processing, storage, and visualization respectively.
https://github.com/devonwintz/weather-stream

cassandra grafana kafka python spark-streaming

Last synced: 25 days ago
JSON representation

Real-time weather data pipeline using Flask, Kafka, Spark, Cassandra, and Grafana for generation, ingestion, processing, storage, and visualization respectively.

Awesome Lists containing this project

README

        

# Weather Data Pipeline

![](./assets/images/WeatherStream.png)

## Overview

This project implements a weather data pipeline that generates, ingests, processes, stores, and visualizes weather data in real-time. It uses Flask for data generation, Kafka for data ingestion, Spark for processing, Cassandra for storage, and Grafana for visualization.

## Directory Structure

![](./assets/images/Project-Directory-Structure.PNG)

## Components

### WeatherMocker
- **Purpose**: Generates random weather data simulating a real weather API.
- **Main File**: `weather-mocker/app.py`
- **Dependencies**: `weather-mocker/requirements.txt`

### WeatherData
- **Purpose**: Exposes an API to access weather data stored in Cassandra.
- **Main File**: `weather-data/app.py`
- **Dependencies**: `weather-data/requirements.txt`

### Kafka
- **Setup Script**: `kafka/kafka-setup.sh`
- **Description**: Creates a Kafka topic if it does not exist.
- **Producer**: `kafka/producer.py`

### Cassandra
- **Setup Script**: `cassandra/cassandra-init.sh`
- **Description**: Initializes Cassandra by creating a keyspace and table if they do not exist.

### Spark-Streaming
- **Setup Script**: `spark-streaming/spark-setup.sh`
- **Description**: Configures Spark for streaming and initializes necessary settings.
- **Consumer**: `spark-streaming/consumer.py`

### Grafana
- **Setup Script**: `grafana/grafana-setup.sh`
- **Description**: Installs the Cassandra plugin, adds the data source, and creates a dashboard for real-time data visualization.

## Integration Workflow

1. **Data Generation and Publishing**:
- WeatherMocker generates random weather data.
- Kafka producer publishes the data to a Kafka topic.

2. **Data Consumption and Processing**:
- Spark-Streaming consumes data from Kafka, processes it, and writes it to Cassandra.

3. **Data Visualization**:
- Grafana visualizes the real-time data stored in Cassandra.

## Deployment

The entire stack is managed using Docker Compose, defined in `docker-compose.yml`. Each service runs in its own Docker container.

## Running the Pipeline

To run the Docker pipeline, run the following two commands from the root directory:

1. Make the `run.sh` script executable:
```sh
$ chmod +x run.sh
$ ./run.sh