https://github.com/tmph2003/streaming-project-with-flink
- Host: GitHub
- URL: https://github.com/tmph2003/streaming-project-with-flink
- Owner: tmph2003
- Created: 2024-02-10T11:57:28.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-02-24T19:31:43.000Z (over 2 years ago)
- Last Synced: 2024-02-24T20:33:03.401Z (over 2 years ago)
- Topics: streaming-processing
- Language: Python
- Size: 30.6 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Streaming-process E-Commerce Analytics with Flink, Elasticsearch, Kibana and MySQL
This repository contains an Apache Flink application for real-time sales analytics. Docker Compose orchestrates the required infrastructure: Apache Flink, Kafka, Elasticsearch, Kibana, and MySQL. The application consumes financial transaction data from Kafka, performs aggregations, and stores the results in both MySQL and Elasticsearch for further analysis.
## Requirements
- Docker
- Docker Compose
- Python (3.9.18)
## Architecture

## Installation and Setup
1. Clone this repository.
2. Navigate to the repository directory.
3. Run `docker-compose up -d` to start the required services (Apache Flink, Elasticsearch, MySQL, Kafka).
4. Run `python src` to start the project, which generates the data and processes it.
## Usage
1. Ensure all Docker containers are up and running.
2. The sales transaction generator, `generate_data.py`, produces sales transactions to Kafka.
3. `stream_process.py` runs the ELT pipeline that moves data from Kafka to the destinations (MySQL and Elasticsearch).
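
To illustrate step 2, here is a minimal sketch of what a sales transaction generator could look like. The field names, the `sales_transactions` topic name, and the `make_transaction` and `encode` helpers are assumptions for illustration; they are not taken from the repository's generator. The Kafka producer call is left as a comment because it needs a running broker and a client library such as `kafka-python`.

```python
import json
import random
import uuid
from datetime import datetime, timezone

# Hypothetical catalogue; the real generator may use different categories.
CATEGORIES = ["electronics", "fashion", "grocery", "home"]


def make_transaction(rng: random.Random) -> dict:
    """Build one fake sales transaction (all field names are assumptions)."""
    price = round(rng.uniform(5.0, 500.0), 2)
    quantity = rng.randint(1, 5)
    return {
        "transaction_id": str(uuid.UUID(int=rng.getrandbits(128))),
        "product_category": rng.choice(CATEGORIES),
        "product_price": price,
        "product_quantity": quantity,
        "total_amount": round(price * quantity, 2),
        "transaction_date": datetime.now(timezone.utc).isoformat(),
    }


def encode(transaction: dict) -> bytes:
    """Serialize a transaction as UTF-8 JSON, a common Kafka message format."""
    return json.dumps(transaction).encode("utf-8")


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        payload = encode(make_transaction(rng))
        # With kafka-python this would be roughly:
        #   producer.send("sales_transactions", payload)
        print(payload.decode("utf-8"))
```

Seeding the random generator keeps the categories and amounts reproducible between runs, which can help when checking the downstream aggregates by hand.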
### Application Details
The application consumes financial transaction data from Kafka, performs various transformations, and stores aggregated results in both MySQL and Elasticsearch.
### Components
#### Apache Flink
- Sets up the Flink execution environment.
- Connects to Kafka as a source for financial transaction data.
- Processes, transforms, and performs aggregations on transaction data streams.
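
The aggregation step can be pictured with a small plain-Python sketch. This is not PyFlink code and not the logic in `stream_process.py`; it only shows the kind of keyed running totals that could feed the per-category, per-day, and per-month aggregate tables named in this README. The field names (`product_category`, `total_amount`, `transaction_date`) are assumptions.

```python
from collections import defaultdict
from datetime import datetime


def aggregate(transactions):
    """Return sales totals keyed by category, day, and month."""
    per_category = defaultdict(float)
    per_day = defaultdict(float)
    per_month = defaultdict(float)
    for tx in transactions:
        ts = datetime.fromisoformat(tx["transaction_date"])
        amount = tx["total_amount"]
        per_category[tx["product_category"]] += amount
        per_day[ts.date().isoformat()] += amount
        per_month[ts.strftime("%Y-%m")] += amount
    return dict(per_category), dict(per_day), dict(per_month)


# Hypothetical sample events for illustration only.
sample = [
    {"product_category": "electronics", "total_amount": 100.0,
     "transaction_date": "2024-02-10T11:00:00"},
    {"product_category": "electronics", "total_amount": 50.0,
     "transaction_date": "2024-02-10T12:30:00"},
    {"product_category": "fashion", "total_amount": 25.0,
     "transaction_date": "2024-03-01T09:15:00"},
]

by_category, by_day, by_month = aggregate(sample)
print(by_category)  # {'electronics': 150.0, 'fashion': 25.0}
print(by_day)       # {'2024-02-10': 150.0, '2024-03-01': 25.0}
print(by_month)     # {'2024-02': 150.0, '2024-03': 25.0}
```

In a streaming job the totals would be updated incrementally per key as events arrive, rather than recomputed over a finished list as done here.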
#### MySQL
- Stores transaction data and aggregated results in tables (`Transactions`, `sales_per_category`, `sales_per_day`, `sales_per_month`).
#### Elasticsearch
- Stores transaction data for further analysis.
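
As a rough illustration of indexing transactions, the sketch below builds a request body for Elasticsearch's `_bulk` endpoint, which expects newline-delimited JSON with an action line followed by a document line. The index name `transactions`, the use of `transaction_id` as the document ID, and the `build_bulk_body` helper are assumptions; the repository may index documents differently, for example through a client library.

```python
import json


def build_bulk_body(docs, index="transactions", id_field="transaction_id"):
    """Build an NDJSON body for the Elasticsearch _bulk API.

    `index` and `id_field` are hypothetical defaults for illustration.
    """
    lines = []
    for doc in docs:
        # Action line: index the document under a stable ID so that
        # re-sending the same transaction overwrites it instead of duplicating it.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc[id_field]}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    {"transaction_id": "t-1", "total_amount": 19.99},
    {"transaction_id": "t-2", "total_amount": 5.0},
])
print(body)
```

The resulting string would be sent as a `POST` to `/_bulk` with the `Content-Type: application/x-ndjson` header.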
#### Kibana
- Visualizes the data through dashboards.
## Code Structure
- `stream_process.py`: Contains the Flink application logic, including Kafka source setup, stream processing, transformations, and sinks for MySQL and Elasticsearch.
## Configuration
- Kafka settings (bootstrap servers, topic) are configured within the Kafka source setup.
- MySQL connection details (URL, username, password) are defined in the `jdbcUrl`, `username`, and `password` variables.
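
One way to keep these settings in one place is a small config object populated from environment variables. This is a sketch, not how the repository does it: the `PipelineConfig` class, the environment variable names, and the default values are all hypothetical. Only the idea of a JDBC URL, username, and password comes from the description above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Connection settings for Kafka and MySQL (hypothetical layout)."""
    bootstrap_servers: str
    topic: str
    jdbc_url: str
    username: str
    password: str

    @classmethod
    def from_env(cls, env=None):
        # Accept any mapping so the config can be built without touching
        # the real process environment (useful in tests).
        env = os.environ if env is None else env
        return cls(
            bootstrap_servers=env.get("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092"),
            topic=env.get("KAFKA_TOPIC", "sales_transactions"),
            jdbc_url=env.get("JDBC_URL", "jdbc:mysql://localhost:3306/sales"),
            username=env.get("MYSQL_USERNAME", "root"),
            password=env.get("MYSQL_PASSWORD", ""),
        )


config = PipelineConfig.from_env({"KAFKA_TOPIC": "demo"})
print(config.topic)              # demo
print(config.bootstrap_servers)  # localhost:9092
```

Reading credentials from the environment avoids committing them to the repository, which matters most for the MySQL password.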
## Sink Operations
- The application uses a MySQL Python API to create the tables (`Transactions`, `sales_per_category`, `sales_per_day`, `sales_per_month`) and to perform insert/update operations.
- It also uses an Elasticsearch Python API to index transaction data for further analysis.
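
The insert/update behaviour for an aggregate table can be sketched as an upsert. To stay runnable without a database server, this example uses Python's built-in `sqlite3` as a stand-in for MySQL. The column names and the choice to add each amount to a running total are assumptions. On MySQL the equivalent statement would use `INSERT ... ON DUPLICATE KEY UPDATE` instead of SQLite's `ON CONFLICT ... DO UPDATE`.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL instance.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS sales_per_category (
        category    TEXT PRIMARY KEY,
        total_sales REAL NOT NULL
    )
    """
)


def upsert_category_total(conn, category, amount):
    """Add `amount` to the total for `category`, creating the row if needed."""
    conn.execute(
        """
        INSERT INTO sales_per_category (category, total_sales)
        VALUES (?, ?)
        ON CONFLICT(category) DO UPDATE
        SET total_sales = total_sales + excluded.total_sales
        """,
        (category, amount),
    )
    conn.commit()


upsert_category_total(conn, "electronics", 100.0)
upsert_category_total(conn, "electronics", 50.0)
upsert_category_total(conn, "fashion", 25.0)

rows = conn.execute(
    "SELECT category, total_sales FROM sales_per_category ORDER BY category"
).fetchall()
print(rows)  # [('electronics', 150.0), ('fashion', 25.0)]
```

Keying the table on the aggregation dimension means one row per category, so repeated writes update that row instead of appending duplicates.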
---