https://github.com/tashi-2004/apache-flink-spark-data-streaming
This project showcases a real-time data streaming pipeline using Apache Flink, Apache Spark, and Grafana. It streams data, stores it in Parquet format, and performs aggregations for insights, with seamless visualization via Grafana dashboards.
apache-flink apache-spark data-aggregation data-analysis data-science data-streaming data-visualization flink flink-stream-processing flink-streaming grafana-dashboard grafana-plugin pyflink python3
- Host: GitHub
- URL: https://github.com/tashi-2004/apache-flink-spark-data-streaming
- Owner: tashi-2004
- Created: 2025-01-14T12:49:52.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-02-10T09:54:49.000Z (8 months ago)
- Last Synced: 2025-04-05T20:11:27.137Z (6 months ago)
- Topics: apache-flink, apache-spark, data-aggregation, data-analysis, data-science, data-streaming, data-visualization, flink, flink-stream-processing, flink-streaming, grafana-dashboard, grafana-plugin, pyflink, python3
- Language: Python
- Homepage:
- Size: 62.6 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Data-Streaming-Flink-Spark
This repository demonstrates a real-time data streaming and persistence workflow using **Apache Flink**, **Apache Spark**, and **Grafana** for monitoring. The pipeline streams data, persists it in Parquet format, and performs aggregations for analytical insights.

---
## Prerequisites
Ensure the following software is installed on your system:
- **Apache Flink**
- **Apache Spark**
- **Python 3.x**
- **Grafana**

For installation guidance, refer to tutorials available on YouTube or other online resources.

---
## Setup and Execution
Follow these steps to set up and execute the data streaming pipeline:
### 1. Start the Flink Cluster
1. Navigate to the `bin` folder in Flink's directory.
2. Run the command:
```bash
./start-cluster.sh
```
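Before submitting the streaming job, it can help to confirm that the cluster actually came up. The sketch below queries the `/overview` endpoint of Flink's REST API. It assumes the REST API is on its default address, `http://localhost:8081`, which will differ if `rest.port` was changed in your Flink configuration. The helper names are illustrative and are not part of this repository.

```python
import json
import urllib.error
import urllib.request

# Assumption: Flink's REST API listens on its default address.
FLINK_REST_URL = "http://localhost:8081"


def summarize_overview(overview: dict) -> str:
    """Turn the JSON body of Flink's /overview endpoint into one status line."""
    return (
        f"taskmanagers={overview.get('taskmanagers', 0)} "
        f"slots={overview.get('slots-available', 0)}/{overview.get('slots-total', 0)} "
        f"running_jobs={overview.get('jobs-running', 0)}"
    )


def fetch_overview(base_url: str = FLINK_REST_URL, timeout: float = 3.0):
    """Return the parsed /overview payload, or None if the cluster is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/overview", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return None


if __name__ == "__main__":
    overview = fetch_overview()
    print(summarize_overview(overview) if overview else "Flink cluster not reachable")
```

If the script reports that the cluster is not reachable, check the log files in Flink's `log` folder before moving on to the next step.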
### 2. Stream Data with Flink
1. Execute the following command to start streaming:
```bash
python3 flink_streaming.py
```
2. The streaming process will begin, and you can monitor the progress in the terminal.
### 3. Persist Data with Spark
1. Run the following command to process and persist data:
```bash
python3 spark_persist.py
```

2. This step will create **seven Parquet files** in a folder named `spark_persisted_output` located in your home directory.
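To check the result of this step, a small script can list the Parquet files that were written. This is a minimal sketch that assumes the files sit directly inside `~/spark_persisted_output` rather than in partition subfolders; `list_parquet_files` is an illustrative helper, not part of the repository.

```python
from pathlib import Path

# Folder name taken from the step above; adjust if your script writes elsewhere.
OUTPUT_DIR = Path.home() / "spark_persisted_output"


def list_parquet_files(folder: Path) -> list:
    """Return the Parquet data files in `folder`, sorted by name.

    Spark also writes bookkeeping files such as `_SUCCESS` and hidden
    `.crc` checksums; those are skipped here.
    """
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.glob("*.parquet")
        if p.is_file() and not p.name.startswith((".", "_"))
    )


if __name__ == "__main__":
    files = list_parquet_files(OUTPUT_DIR)
    print(f"{len(files)} Parquet file(s) in {OUTPUT_DIR}")
    for path in files:
        print(f"  {path.name}  ({path.stat().st_size} bytes)")
```

A count other than seven may simply mean the input data or Spark's partitioning differs on your machine.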

### 4. Perform Aggregations
1. Execute the following command to perform data aggregation:
```bash
python3 streaming_aggregates.py
```

2. The output will be stored in a folder named `aggregated_output` in your home directory.
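Both output folders are described as living in the home directory, so resolving them from `Path.home()` avoids hard-coding a user-specific path. The sketch below shows one way to do that. The environment variable `PIPELINE_OUTPUT_ROOT` is hypothetical and exists only in this example; the repository's scripts may resolve their paths differently.

```python
import os
from pathlib import Path


def resolve_output_dir(name: str, env_var: str = "PIPELINE_OUTPUT_ROOT") -> Path:
    """Resolve an output folder under the home directory.

    `PIPELINE_OUTPUT_ROOT` is a hypothetical environment variable used only
    in this sketch; when it is unset or empty, the home directory is used.
    """
    root = Path(os.environ.get(env_var) or Path.home())
    return root / name


if __name__ == "__main__":
    for folder in ("spark_persisted_output", "aggregated_output"):
        path = resolve_output_dir(folder)
        status = "exists" if path.is_dir() else "missing"
        print(f"{path}: {status}")
```

Running it after steps 3 and 4 gives a quick indication of whether both folders were created where you expect them.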
### 5. Set Up Grafana Dashboard
1. Download and install **Grafana**.
2. Visit Grafana at `http://localhost:3000` (default port).
3. Create a dashboard and import the provided `dashboard.json` file to visualize the streaming and aggregated data.
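Before importing, it can be useful to confirm that the dashboard file parses and to see which panels it defines. The following sketch assumes the file uses Grafana's usual dashboard JSON model, with a top-level `title` and a `panels` array. That layout has not been verified against this repository's `dashboard.json`, and `panel_titles` is an illustrative helper.

```python
import json
from pathlib import Path


def panel_titles(dashboard: dict) -> list:
    """Collect panel titles from a Grafana dashboard model, including
    panels nested inside row panels."""
    titles = []
    for panel in dashboard.get("panels", []):
        titles.append(panel.get("title", "<untitled>"))
        # Collapsed rows keep their child panels under a nested "panels" key.
        for child in panel.get("panels", []):
            titles.append(child.get("title", "<untitled>"))
    return titles


if __name__ == "__main__":
    path = Path("dashboard.json")  # file name from the repository structure
    if path.is_file():
        model = json.loads(path.read_text(encoding="utf-8"))
        print(f"Dashboard: {model.get('title', '<no title>')}")
        for title in panel_titles(model):
            print(f"  - {title}")
    else:
        print(f"{path} not found; run this from the repository root")
```

If the file fails to parse, the import in Grafana is likely to fail as well, so this check can save a round trip through the UI.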

---

## Note
- Update the paths for each Python file (`flink_streaming.py`, `spark_persist.py`, `streaming_aggregates.py`) according to your system setup.
- Ensure all required dependencies are installed and configured correctly.
- You can download the original dataset from: [Download](https://mega.nz/file/OJUxVKCB#vWVfFYmnAzAM0PTMBZRSmmrWePcmoN1qIpM0kd4zFRw)

---
## Repository Structure
```
Data-Streaming-Flink-Spark/
├── flink_streaming.py # Flink streaming script
├── spark_persist.py # Spark persistence script
├── streaming_aggregates.py # Data aggregation script
├── dashboard.json # Grafana dashboard configuration file
├── spark_persisted_output/ # Output folder for Spark persisted data
└── aggregated_output/ # Output folder for aggregated data
```

---
## Contact
For queries or contributions, please contact:
**Tashfeen Abbasi**
Email: abbasitashfeen7@gmail.com