https://github.com/jina-ai/executor-kakfa2kafka-processor
Enrich documents from Kafka using DocArray and an Executor, and publish the enriched documents back to Kafka.
- Host: GitHub
- URL: https://github.com/jina-ai/executor-kakfa2kafka-processor
- Owner: jina-ai
- Created: 2022-09-14T11:36:20.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2022-09-15T09:51:46.000Z (about 3 years ago)
- Last Synced: 2025-06-22T05:03:51.829Z (4 months ago)
- Language: Python
- Size: 21.5 KB
- Stars: 1
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Kafka Executor
[Kafka](https://kafka.apache.org) is a powerful message bus platform used in a wide range of applications. This demo repository contains simple Executors that integrate with a Kafka consumer and producer, converting generic JSON messages into DocArray documents and serializing documents back to JSON messages.

# Install and initialize Kafka locally
Install requirements using:
```shell
pip install -r requirements.txt
```

A sample Kafka setup is provided in [tests/docker-compose.yml](tests/docker-compose.yml). Start the local Kafka application by running `docker-compose -f tests/docker-compose.yml up -d`.
For creating the topics and sample data for the demo, run:
```shell
python init_kafka_topics.py
```

The script creates the topic `input_raw_docs` and fills it with simple JSON messages, each containing a key and a text field of random characters. It also creates a second topic, `enriched_docs`, which the producer Executor will populate.
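The sample-message generation can be sketched without a running broker. The topic name matches the one above, but the exact message shape and the helper name are assumptions based on the description, not the repository's API; producing to Kafka is stubbed out:

```python
import json
import random
import string

def make_sample_message(i: int, text_len: int = 16) -> dict:
    """Build one sample message: a key plus a text field of random characters."""
    text = "".join(random.choices(string.ascii_lowercase, k=text_len))
    return {"key": f"doc-{i}", "text": text}

# In init_kafka_topics.py these would be serialized and sent to the
# `input_raw_docs` topic; here we only build and inspect them locally.
messages = [make_sample_message(i) for i in range(5)]
payloads = [json.dumps(m).encode("utf-8") for m in messages]
print(messages[0]["key"], len(messages[0]["text"]))  # doc-0 16
```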
To remove all data in Kafka, run:
```shell
python delete_topics.py
```

## KafkaToKafka Executor
A simple consumer/producer Executor that consumes raw messages from the `CONSUMER_TOPIC` and writes the processed DocumentArray to the `PRODUCER_TOPIC`. A `BATCH_SIZE` environment variable configures the publish batch size. On a `/` POST request, each raw message in the batch is converted to a Document and a random embedding is attached; the batch is collected into a DocumentArray, which is then published as JSON to the `PRODUCER_TOPIC`.
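The enrichment step above can be sketched with plain dicts standing in for DocArray types. The embedding dimension and the function name are illustrative assumptions, not the repository's implementation:

```python
import json
import random

EMBED_DIM = 8  # illustrative; the real dimension is not specified in the repo

def enrich_batch(raw_messages: list) -> str:
    """Convert a batch of raw JSON messages into enriched document dicts
    and serialize the whole batch as one JSON payload for the producer."""
    docs = []
    for raw in raw_messages:
        payload = json.loads(raw)
        docs.append({
            "id": payload["key"],
            "text": payload["text"],
            # random embedding, as in the Executor description
            "embedding": [random.random() for _ in range(EMBED_DIM)],
        })
    return json.dumps(docs)

batch = [json.dumps({"key": f"doc-{i}", "text": "abc"}) for i in range(3)]
enriched = json.loads(enrich_batch(batch))
print(len(enriched), len(enriched[0]["embedding"]))  # 3 8
```

The serialized batch is what would be handed to the Kafka producer for the `PRODUCER_TOPIC`.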
# Flow
A sample flow can be triggered by running:
```shell
python run_kafka2kafka_flow.py
```
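The end-to-end loop that the flow script drives can be sketched with Kafka I/O stubbed out. All names here are illustrative, not the repository's API; the stubs only record what would be consumed from `CONSUMER_TOPIC` and published to `PRODUCER_TOPIC`:

```python
import json
import random

def consume(n: int):
    """Stub consumer: yields raw JSON messages as Kafka would deliver them."""
    for i in range(n):
        yield json.dumps({"key": f"doc-{i}", "text": "sample"})

published = []

def produce(payload: str):
    """Stub producer: records what would be sent to the enriched topic."""
    published.append(payload)

def run_flow(batch_size: int = 2, total: int = 4):
    """Drain the consumer in batches, enrich each doc, publish each batch."""
    batch = []
    for raw in consume(total):
        doc = json.loads(raw)
        doc["embedding"] = [random.random() for _ in range(4)]
        batch.append(doc)
        if len(batch) == batch_size:
            produce(json.dumps(batch))
            batch = []
    if batch:  # flush any remainder smaller than batch_size
        produce(json.dumps(batch))

run_flow()
print(len(published))  # 2
```

With four messages and a batch size of two, the stub producer receives two serialized batches, mirroring the `BATCH_SIZE`-driven publishing described above.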