https://github.com/zoltan-nz/kafka-spark-project

Distributed System in Docker with Apache Kafka and Spark for big data streaming and visualisation (NodeJS, TypeScript, React, NestJS, Java)

Host: GitHub
URL: https://github.com/zoltan-nz/kafka-spark-project
Owner: zoltan-nz
Created: 2018-03-22T20:49:50.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2019-04-28T11:24:47.000Z (over 6 years ago)
Last Synced: 2025-04-10T10:14:08.336Z (6 months ago)
Topics: java, javascript, kafka, nodejs, spark, typescript
Language: TypeScript
Homepage:
Size: 5.08 MB
Stars: 21
Watchers: 0
Forks: 15
Open Issues: 5
Metadata Files:
- Readme: README.md


# Apache Kafka and Apache Spark - A Distributed Streaming Project

For a detailed project report, please check out [The Project Report](docs/final-report.md). ;)

## Running the project

You can try out this project by running all components in a Docker Compose cluster. Each component runs in a separate container, and the containers are connected through the default network that Docker Compose creates for the cluster.

Prerequisites:

- Docker

- Port 80 must be free; otherwise, change the port mapping of the `frontend` app in `docker-compose.yml`

Run the project:

- Clone this repository to your computer

- Fire up the docker compose cluster:

```shell
$ docker-compose up
```

- Open the frontend application in your browser at http://localhost:80 (on macOS you can run `$ open http://localhost:80`)

To shut down the Docker Compose cluster, run the following command in another terminal window:

```shell
$ docker-compose down
```

Notes:

- Building the Maven-based projects takes a while the first time, so please be patient.

- Files generated by Kafka are mapped to the `./kafka/volumes` folder. If streaming does not start the first time the project is launched, shut down the Docker Compose cluster and start it again. All mapped volumes and folders should exist by the second launch, so Kafka can start properly.
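If you hit the first-launch problem often, the restart can be scripted. The helper below is a minimal sketch and is not part of the project: it assumes that a missing `./kafka/volumes` folder means this is the first launch, and that waiting `FIRST_RUN_WAIT` seconds (an arbitrary default of 30) is long enough for Kafka to create its files. `start_cluster`, `COMPOSE`, `VOLUMES_DIR` and `FIRST_RUN_WAIT` are hypothetical names.

```shell
# Hypothetical settings; override them in the environment if needed.
# COMPOSE is expanded unquoted on purpose so that e.g. COMPOSE="docker compose"
# (the Compose V2 plugin) splits into a command plus subcommand.
COMPOSE="${COMPOSE:-docker-compose}"
VOLUMES_DIR="${VOLUMES_DIR:-./kafka/volumes}"
FIRST_RUN_WAIT="${FIRST_RUN_WAIT:-30}"

# start_cluster: automate the "start it twice" workaround described above.
start_cluster() {
  if [ ! -d "$VOLUMES_DIR" ]; then
    # Assumed first launch: the mapped folders do not exist yet.
    echo "First launch: warming up the cluster to create $VOLUMES_DIR ..."
    # Start detached so the cluster can be stopped again from this script.
    $COMPOSE up -d || return 1
    sleep "$FIRST_RUN_WAIT"
    $COMPOSE down || return 1
  fi
  # Normal launch in the foreground, same as `docker-compose up`.
  $COMPOSE up
}
```

Run it from the repository root, for example `. ./start-cluster.sh && start_cluster` (the file name is only an example).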

## Run the project in developer mode

You can also run all components locally, which makes it easier to debug and add new features.

**Prerequisites:**

- Locally installed Node.js ([How to Install Node.js](http://yoember.com/nodejs/the-best-way-to-install-node-js/))

- Locally installed Java 8 and Maven 3.5

- Locally installed Kafka with ZooKeeper ([Kafka Quickstart](https://kafka.apache.org/quickstart))

- Locally installed Spark ([Install Spark](http://spark.apache.org/downloads.html))
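Before running the setup script, you can check that the required tools are on your `PATH`. This is a minimal sketch with a hypothetical helper name, `check_prereqs`; the command names passed to it below are assumptions about what each install provides.

```shell
# check_prereqs: print whether each given command is available on PATH.
# Returns 0 only if every command was found.
check_prereqs() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example: `check_prereqs node npm java mvn spark-submit kafka-server-start.sh zookeeper-server-start.sh`. The two Kafka scripts are only found if Kafka's `bin` folder is on your `PATH`. `command -v` only shows that a tool is installed, so check versions separately: `java -version` should report 1.8 and `mvn --version` should report 3.5.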

The `setup` npm script installs each component's packages and prepares the project for you. The `start:dev` script runs every component's development script concurrently in the same terminal.

```shell
$ npm run setup
$ npm run start:dev
```
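The README does not describe how `start:dev` is wired up. Purely as an illustration of the idea (several long-running development commands sharing one terminal), here is a plain-shell sketch. `run_concurrently` is a hypothetical helper that starts each argument in the background and waits for all of them.

```shell
# run_concurrently: run each argument as a shell command in parallel,
# interleaving their output in the current terminal. Returns non-zero
# if any of the commands exits with a non-zero status.
run_concurrently() {
  pids=""
  for cmd in "$@"; do
    sh -c "$cmd" &
    pids="$pids $!"
  done
  result=0
  for pid in $pids; do
    wait "$pid" || result=1
  done
  return "$result"
}
```

For example, `run_concurrently "cd backend && npm start" "cd frontend && npm start"`, where the per-component commands are placeholders rather than the project's actual scripts; see each component's README for its real development command.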

## Documents

- [Final Report](docs/final-report.md)

- [Original project proposal](docs/proposal.md)

- [Notes about datasources](docs/finding-datastream-notes.md)

## Components' README files

- [Backend](backend/README.md)

- [API framework, Nest.js original README](backend/FRAMEWORK_README.md)

- [Frontend](frontend/README.md)

- [Frontend framework, React.js original README](frontend/FRAMEWORK_README.md)

- [Kafka](kafka/README.md)

- [SparkStreamer](SparkStreamer/README.md)

## Useful links

- [Awesome Streaming](https://github.com/manuzhang/awesome-streaming)

- [Exactly-once Support in Apache Kafka](https://medium.com/@jaykreps/exactly-once-support-in-apache-kafka-55e1fdd0a35f)

- [Understanding When to use RabbitMQ or Apache Kafka](https://content.pivotal.io/blog/understanding-when-to-use-rabbitmq-or-apache-kafka)

- [Why does Kafka need ZooKeeper?](https://www.quora.com/What-is-the-actual-role-of-Zookeeper-in-Kafka-What-benefits-will-I-miss-out-on-if-I-don%E2%80%99t-use-Zookeeper-and-Kafka-together)

## Reading

- [7 essential technologies for a modern data architecture](https://www.infoworld.com/article/3257105/big-data/7-essential-technologies-for-a-modern-data-architecture.html)

- [5-layered architecture of cloud database management systems](https://www.researchgate.net/publication/259172538_5-Layered_Architecture_of_Cloud_Database_Management_System)
