Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/airscholar/apacheflink-salesanalytics
This repository contains an end-to-end data engineering project using Apache Flink, focused on performing sales analytics. The project demonstrates how to ingest, process, and analyze sales data, showcasing the capabilities of Apache Flink for big data processing.
apache-flink data-engineering end-to-end-data-engineering sales-analytics
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/airscholar/apacheflink-salesanalytics
- Owner: airscholar
- Created: 2023-11-18T19:43:10.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-11-18T19:50:47.000Z (about 1 year ago)
- Last Synced: 2024-04-18T02:57:10.698Z (9 months ago)
- Topics: apache-flink, data-engineering, end-to-end-data-engineering, sales-analytics
- Language: Java
- Homepage: https://youtu.be/jhJQp46QB_c
- Size: 6.84 KB
- Stars: 6
- Watchers: 2
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Apache Flink for Sales Analytics
This repository contains an end-to-end data engineering project using Apache Flink, focused on performing sales analytics. The project demonstrates how to ingest, process, and analyze sales data, showcasing the capabilities of Apache Flink for big data processing.
## Project Overview
The project reads sales and product data from CSV files, joins the two datasets, and computes total sales per category. It then sorts the results and writes them to a CSV file. The example is a practical demonstration of using Apache Flink for data transformations and analytics.
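To make the data flow concrete, here is a minimal sketch of the same read, join, aggregate, and sort steps in plain Java collections. It is not the repository's Flink job and does not use the Flink API. The class name, the method name, the two-column CSV layouts (`productId,amount` and `productId,category`), and the descending sort order are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: plain Java stand-in for the join/aggregate/sort pipeline.
class SalesByCategorySketch {

    // saleLines:    assumed layout "productId,amount"
    // productLines: assumed layout "productId,category"
    // Returns CSV lines "category,total", sorted by total in descending order.
    static List<String> totalsByCategory(List<String> saleLines, List<String> productLines) {
        // Build the lookup side of the join: productId -> category.
        Map<String, String> categoryByProduct = new HashMap<>();
        for (String line : productLines) {
            String[] fields = line.split(",");
            categoryByProduct.put(fields[0].trim(), fields[1].trim());
        }

        // Join each sale to its product and sum the amounts per category.
        Map<String, Double> totals = new HashMap<>();
        for (String line : saleLines) {
            String[] fields = line.split(",");
            String category = categoryByProduct.get(fields[0].trim());
            if (category == null) {
                continue; // inner-join semantics: skip sales with no matching product
            }
            totals.merge(category, Double.parseDouble(fields[1].trim()), Double::sum);
        }

        // Sort the categories by total sales, highest first.
        List<Map.Entry<String, Double>> sorted = new ArrayList<>(totals.entrySet());
        sorted.sort(Map.Entry.<String, Double>comparingByValue().reversed());

        // Format each result as a CSV line.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Double> entry : sorted) {
            out.add(entry.getKey() + "," + String.format(Locale.ROOT, "%.2f", entry.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sales = Arrays.asList("p1,10.0", "p2,5.5", "p1,2.5", "p3,20.0");
        List<String> products = Arrays.asList("p1,Books", "p2,Toys", "p3,Toys");
        for (String line : totalsByCategory(sales, products)) {
            System.out.println(line);
        }
    }
}
```

In a Flink job these steps would normally be expressed with Flink's own dataset operators and run in parallel across the cluster. The sketch only illustrates the logic of the transformation on in-memory lists.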
### Features
- Data ingestion from CSV files
- Use of POJOs for data representation
- Dataset joins and aggregations
- Custom output formats for writing data
## Getting Started
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
### Prerequisites
- Apache Flink
- Java Development Kit (JDK)
- Maven or SBT (for building the project)
### Installation
1. **Clone the repository:**
```bash
git clone https://github.com/airscholar/ApacheFlink-SalesAnalytics.git
```
2. **Navigate to the project directory:**
```bash
cd ApacheFlink-SalesAnalytics
```
3. **Build the project:**
```bash
mvn clean install
```
### Running the Application
1. **Start your Apache Flink cluster.**
2. **Submit the Flink job:**
```bash
flink run -c salesAnalysis.DataBatchJob target/SalesAnalysis-1.0-SNAPSHOT.jar
```
3. **Check the output.**
The processed data will be written to the specified output file.
### Video
[![Sales Analytics with Apache Flink](https://img.youtube.com/vi/jhJQp46QB_c/0.jpg)](https://youtu.be/jhJQp46QB_c)