https://github.com/zoltan-nz/kafka-spark-project
Distributed System in Docker with Apache Kafka and Spark for big data streaming and visualisation (NodeJS, TypeScript, React, NestJS, Java)
java javascript kafka nodejs spark typescript
- Host: GitHub
- URL: https://github.com/zoltan-nz/kafka-spark-project
- Owner: zoltan-nz
- Created: 2018-03-22T20:49:50.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-04-28T11:24:47.000Z (over 6 years ago)
- Last Synced: 2025-04-10T10:14:08.336Z (6 months ago)
- Topics: java, javascript, kafka, nodejs, spark, typescript
- Language: TypeScript
- Homepage:
- Size: 5.08 MB
- Stars: 21
- Watchers: 0
- Forks: 15
- Open Issues: 5
Metadata Files:
- Readme: README.md
# Apache Kafka and Apache Spark - A Distributed Streaming Project
For a detailed project report, please check out [The Project Report](docs/final-report.md). ;)
## Running the project
You can try out this project by running all components in a Docker Compose cluster. Each component runs in a separate container, and the containers are connected through the default network of the Compose cluster.
Prerequisites:
- Docker
- Port 80 must be available; otherwise, change the port mapping of the `frontend` app in `docker-compose.yml`

Run the project:

- Clone this repository on your computer
- Fire up the Docker Compose cluster:

```
$ docker-compose up
```

- Open the frontend application in your browser: `$ open http://localhost:80`
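If port 80 is already taken, the prerequisite above asks for a different port mapping for the `frontend` app. As a rough illustration of what that change amounts to, the sketch below assumes the mapping uses Docker Compose's short `HOST:CONTAINER` syntax (for example `80:80`); the actual entry in this project's `docker-compose.yml` may look different. `remapHostPort` and `frontendUrl` are hypothetical helper names and are not part of this repository.

```typescript
// Hypothetical helpers, not part of this repository.
// Assumes Docker Compose short port syntax: "HOST:CONTAINER" or "IP:HOST:CONTAINER".

/** Replace the host side of a port mapping and leave the container side untouched. */
function remapHostPort(mapping: string, hostPort: number): string {
  // Reject NaN, fractions, and ports outside the valid TCP range.
  if (Math.floor(hostPort) !== hostPort || hostPort < 1 || hostPort > 65535) {
    throw new RangeError("Invalid host port: " + hostPort);
  }
  const parts = mapping.split(":");
  switch (parts.length) {
    case 1: // "80": container port only, so add an explicit host port
      return hostPort + ":" + parts[0];
    case 2: // "80:80": HOST:CONTAINER
      return hostPort + ":" + parts[1];
    case 3: // "127.0.0.1:80:80": IP:HOST:CONTAINER
      return parts[0] + ":" + hostPort + ":" + parts[2];
    default:
      throw new Error("Unsupported port mapping: " + mapping);
  }
}

/** URL to open in the browser once the frontend is published on hostPort. */
function frontendUrl(hostPort: number): string {
  return "http://localhost:" + hostPort;
}

console.log(remapHostPort("80:80", 8080)); // 8080:80
console.log(frontendUrl(8080)); // http://localhost:8080
```

Only the host side (left of the colon) changes. The container keeps listening on its own port, and the browser URL follows the new host port.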
Shutting down docker-compose (use another terminal window to run this command):

```
$ docker-compose down
```

Notes:
- Building the Maven-based projects for the first time takes a while, so please be patient.
- Files generated by Kafka are mapped to the `./kafka/volumes` folder. If streaming does not start when the project is launched for the first time, shut down the docker-compose cluster and start it up again. All mapped volumes and folders should be available at the second launch, so Kafka can start properly.

## Run the project in developer mode
You can run all components locally. This way you can easily debug and add new features.
**Prerequisites:**
- Locally installed Node.js ([How to Install Node.js](http://yoember.com/nodejs/the-best-way-to-install-node-js/))
- Locally installed Java 8 and Maven 3.5
- Locally installed Kafka with Zookeeper ([Kafka Quickstart](https://kafka.apache.org/quickstart))
- Locally installed Spark ([Install Spark](http://spark.apache.org/downloads.html))

The `setup` npm script installs the individual packages and prepares the project for you. The `start:dev` script runs each component's development script concurrently in the same terminal.
```
$ npm run setup
$ npm run start:dev
```

## Documents
- [Final Report](docs/final-report.md)
- [Original project proposal](docs/proposal.md)
- [Notes about datasources](docs/finding-datastream-notes.md)

## Component's README files
- [Backend](backend/README.md)
- [API framework, Nest.js original README](backend/FRAMEWORK_README.md)
- [Frontend](frontend/README.md)
- [Frontend framework, React.js original README](frontend/FRAMEWORK_README.md)
- [Kafka](kafka/README.md)
- [SparkStreamer](SparkStreamer/README.md)

## Useful links
- [Awesome Streaming](https://github.com/manuzhang/awesome-streaming)
- [Exactly-once Support in Apache Kafka](https://medium.com/@jaykreps/exactly-once-support-in-apache-kafka-55e1fdd0a35f)
- [Understanding When to use RabbitMQ or Apache Kafka](https://content.pivotal.io/blog/understanding-when-to-use-rabbitmq-or-apache-kafka)
- [Why need Zookeeper for Kafka?](https://www.quora.com/What-is-the-actual-role-of-Zookeeper-in-Kafka-What-benefits-will-I-miss-out-on-if-I-don%E2%80%99t-use-Zookeeper-and-Kafka-together)

## Reading
- [7 essential technologies for a modern data architecture](https://www.infoworld.com/article/3257105/big-data/7-essential-technologies-for-a-modern-data-architecture.html)
- [5 Layered architecture](https://www.researchgate.net/publication/259172538_5-Layered_Architecture_of_Cloud_Database_Management_System)