https://github.com/rodrigo-arenas/kafkaml-anomaly-detection
Project for real-time anomaly detection using Kafka and python
anomaly-detection apache-karaf confluent-kafka kafka machine-learning python real-time-analytics real-time-processing scikit-learn scikit-learning sklearn stream-processing
- Host: GitHub
- URL: https://github.com/rodrigo-arenas/kafkaml-anomaly-detection
- Owner: rodrigo-arenas
- License: mit
- Created: 2021-06-16T19:26:05.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-12-04T18:32:54.000Z (about 2 years ago)
- Last Synced: 2024-10-03T12:29:47.153Z (5 months ago)
- Topics: anomaly-detection, apache-karaf, confluent-kafka, kafka, machine-learning, python, real-time-analytics, real-time-processing, scikit-learn, scikit-learning, sklearn, stream-processing
- Language: Python
- Homepage:
- Size: 9.69 MB
- Stars: 55
- Watchers: 4
- Forks: 19
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# kafkaml-anomaly-detection
Project for real-time anomaly detection using Kafka and Python.

It's assumed that ZooKeeper and Kafka are running on localhost. The project follows this process:
- Train an unsupervised machine learning model for anomaly detection
- Save the model to be used in real-time predictions
- Generate fake streaming data and send it to a Kafka topic
- Read the topic data with several subscribers to be analyzed by the model
- Predict whether the data is an anomaly; if so, send the data to another Kafka topic
- Subscribe a Slack bot to the last topic to send a message in a Slack channel if an anomaly arrives

This could be illustrated as:
[Image: Diagram]
Article explaining how to run this project: [medium](https://towardsdatascience.com/real-time-anomaly-detection-with-apache-kafka-and-python-3a40281c01c9)
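
The first two steps of the process (train an unsupervised model, then save it for later predictions) could be sketched roughly as follows. This is a minimal illustration, not the project's actual `model/train.py`: the choice of `IsolationForest`, the synthetic 2-D training data, and the file name are all assumptions made for the example.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical training data: 2-D points clustered around the origin.
rng = np.random.default_rng(seed=42)
X_train = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))

# IsolationForest is an assumption here; any unsupervised scikit-learn
# outlier detector exposing predict() would fit the same flow.
model = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
model.fit(X_train)

# Persist the fitted model so a streaming consumer can load it later.
# The file name and location are hypothetical.
MODEL_PATH = os.path.join(tempfile.gettempdir(), "isolation_forest.joblib")
joblib.dump(model, MODEL_PATH)

# Reload the model and score two points.
# predict() returns 1 for inliers and -1 for outliers.
loaded = joblib.load(MODEL_PATH)
predictions = loaded.predict([[0.0, 0.0], [8.0, 8.0]])
print(predictions)
```

Saving the model to disk keeps training separate from the streaming code, so the detector only needs to load the file at startup.
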
# Demo
Generate fake transactions into a Kafka topic:

[Image: Transactions]

Predict and send anomalies to another Kafka topic:

[Image: Anomalies]

Producer and anomaly detection running at the same time:

[Image: Concurrent]

Send notifications to Slack:

[Image: Slack]

# Usage:
* First, train the anomaly detection model by running the file:
```bash
model/train.py
```

* Create the required topics:
```bash
# Note: the --zookeeper flag was removed from kafka-topics.sh in Kafka 3.0;
# on newer versions, pass --bootstrap-server with a broker address instead.
kafka-topics.sh --zookeeper localhost:2181 --topic transactions --create --partitions 3 --replication-factor 1
kafka-topics.sh --zookeeper localhost:2181 --topic anomalies --create --partitions 3 --replication-factor 1
```

* Check that the topics were created:
```bash
kafka-topics.sh --zookeeper localhost:2181 --list
```

* Check the file **settings.py** and edit the variables if needed
* Start the producer by running the file:
```bash
streaming/producer.py
```

* Start the anomaly detector by running the file:
```bash
streaming/anomalies_detector.py
```

* Start sending alerts to Slack. Make sure the environment variable SLACK_API_TOKEN is set, then run:
```bash
streaming/bot_alerts.py
```
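
To give a rough idea of what the producer and detector steps do, here is a minimal sketch using the `confluent-kafka` client. It is not the project's actual code: the JSON message schema (`id`, `data`), the `localhost:9092` broker address, the consumer group name, and the helper names `make_transaction`, `detect`, `run`, and `fake_predict` are all assumptions for illustration. The per-message logic is kept in plain functions so it can be tried without a running broker.

```python
import json
import random
from typing import Callable, List, Optional

TRANSACTIONS_TOPIC = "transactions"
ANOMALIES_TOPIC = "anomalies"


def make_transaction(tx_id: int, rng: random.Random) -> bytes:
    """Encode one fake 2-D observation as JSON (hypothetical schema)."""
    record = {"id": tx_id, "data": [rng.gauss(0, 1), rng.gauss(0, 1)]}
    return json.dumps(record).encode("utf-8")


def detect(raw: bytes, predict: Callable[[List[list]], list]) -> Optional[bytes]:
    """Return the re-encoded record if predict() labels it -1, else None."""
    record = json.loads(raw.decode("utf-8"))
    label = predict([record["data"]])[0]
    if label == -1:
        return json.dumps(record).encode("utf-8")
    return None


def run(predict, bootstrap_servers="localhost:9092", group_id="detectors"):
    """Consume transactions and forward anomalies; needs a running broker."""
    # Imported lazily so the helpers above work without confluent-kafka.
    from confluent_kafka import Consumer, Producer

    consumer = Consumer({
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
        "auto.offset.reset": "earliest",
    })
    producer = Producer({"bootstrap.servers": bootstrap_servers})
    consumer.subscribe([TRANSACTIONS_TOPIC])
    try:
        while True:
            msg = consumer.poll(1.0)
            if msg is None or msg.error():
                continue
            anomaly = detect(msg.value(), predict)
            if anomaly is not None:
                producer.produce(ANOMALIES_TOPIC, value=anomaly)
                producer.poll(0)  # serve delivery callbacks
    finally:
        consumer.close()
        producer.flush()


def fake_predict(rows):
    """Stand-in for a trained model: flag points far from the origin."""
    return [-1 if max(abs(v) for v in row) > 3 else 1 for row in rows]


# Try the per-message logic without Kafka.
sample = make_transaction(1, random.Random(0))
print(sample)
print(detect(b'{"id": 2, "data": [9.0, 9.0]}', fake_predict))
```

With a broker running and a trained model loaded, `run(model.predict)` would take the place of the stand-in. Starting several detector processes with the same `group.id` lets Kafka spread the topic's partitions across them.
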