https://github.com/derak-isaack/pharmacysales_modelling
Apache Kafka Streamlit application for real-time tracking of pharmacy sales using the star schema data model.
- Host: GitHub
- URL: https://github.com/derak-isaack/pharmacysales_modelling
- Owner: derak-isaack
- Created: 2024-06-09T20:30:58.000Z (7 months ago)
- Default Branch: master
- Last Pushed: 2024-06-29T21:56:59.000Z (6 months ago)
- Last Synced: 2024-07-01T03:14:25.454Z (6 months ago)
- Topics: apache-kafka, data-modeling, mysql-database, quix-streams, starschema, streamlit
- Language: Python
- Homepage:
- Size: 474 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
README
![streamlit](https://img.shields.io/badge/Streamlit-FF4B4B?logo=streamlit&logoColor=fff&style=for-the-badge)
![SQL](https://img.shields.io/badge/SQLAlchemy-D71F00?logo=sqlalchemy&logoColor=fff&style=for-the-badge)
![SQLite](https://img.shields.io/badge/SQLite-003B57?logo=sqlite&logoColor=fff&style=for-the-badge)
![Python](https://img.shields.io/badge/Python-3776AB?logo=python&logoColor=fff&style=for-the-badge)
![Kafka](https://img.shields.io/badge/Apache%20Kafka-231F20?logo=apachekafka&logoColor=fff&style=for-the-badge)

## Pharmacy Sales Tracker

`Motivation:` With the ever-rising need for automation and real-time tracking across sales organizations to minimize human error and identify fraud, I sought to develop a `Streamlit` application that uses a `MySQL server` database and is integrated with `Apache Kafka`, whose `low latency` ensures `real-time data streaming`.
### Project Overview

A real-time Streamlit pharmacy sales tracker that uses the `star-schema` to track sales across several pharmacy outlets for a large pharmaceutical company. The application leverages the `star-schema`, which is:
* Easier to understand and manage.
* Less dependent on table joins.
* High performance.

The application also uses the `MySQL server database` for data entry, which has several advantages:
* Supports transactions.
* Supports data integrity.
* Handles several transaction requests simultaneously.
* Offers atomicity.

The application also integrates `Apache Kafka` for real-time data streaming as well as transformations. Using `Kafka` offers the following benefits:
* `Data durability and reliability`, since data is stored on disk across brokers.
* `Real-time data processing`
* Flexibility in `batch and stream processing`.
* `Data auditing and compliance`: with Change Data Capture (CDC) approaches, Kafka facilitates data replication across multiple systems or databases, ensuring accurate and consistent data for auditing and compliance purposes.
### Objectives & Description

Develop a data model that follows the `star-schema` approach, with `dimension` and `fact` tables. The table models can be found [here](pharmacy_sales_tracker.sql) and follow the plain `SQL` approach.
Defining the tables in a separate file offers a more flexible approach should further changes arise. It also makes the application easier to debug.
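As a rough illustration of the layout (the authoritative definitions live in [pharmacy_sales_tracker.sql](pharmacy_sales_tracker.sql); the table and column names below are assumptions inferred from the ERD description, not the repo's actual schema), the same star schema could be sketched with `SQLAlchemy`:

```python
# Minimal star-schema sketch in SQLAlchemy. Table and column names are
# illustrative assumptions; the real model is in pharmacy_sales_tracker.sql.
from sqlalchemy import Column, Date, ForeignKey, Integer, Numeric, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Doctor(Base):  # dimension table
    __tablename__ = "doctors"
    doctor_id = Column(Integer, primary_key=True)
    name = Column(String(100))

class Employee(Base):  # dimension table
    __tablename__ = "employees"
    employee_id = Column(Integer, primary_key=True)
    name = Column(String(100))

class Drug(Base):  # dimension table
    __tablename__ = "drugs"
    drug_id = Column(Integer, primary_key=True)
    name = Column(String(100))
    unit_price = Column(Numeric(10, 2))

class Sale(Base):  # fact table: foreign keys into each dimension plus measures
    __tablename__ = "sales"
    sale_id = Column(Integer, primary_key=True)
    doctor_id = Column(Integer, ForeignKey("doctors.doctor_id"))
    employee_id = Column(Integer, ForeignKey("employees.employee_id"))
    drug_id = Column(Integer, ForeignKey("drugs.drug_id"))
    sale_date = Column(Date)
    quantity = Column(Integer)
    total_amount = Column(Numeric(10, 2))
```

The dimensions carry descriptive attributes, while the fact table holds only keys and measures; every sales query is then a single join away from its context, which is what keeps the star schema less dependent on table joins.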
`ERD-diagram` ![ERD](ERD_diagram.png)
This [python-file](helpers.py) defines a class using the traditional Python OOP approach, which offers more customization and flavour to the main `streamlit application`. It also allows form sharing across the `doctor`, `Employee`, and `Drug items` tables, the `dimension tables` that are vital in providing more context to the `Facts table`.
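As a hypothetical sketch of that pattern (the class and method names here are invented for illustration; see [helpers.py](helpers.py) for the actual implementation), such a class might wrap one database session behind a single insert method that every form shares:

```python
# Hypothetical helper in the spirit of helpers.py; the class name, method
# name, and connection URL are assumptions, not the repo's actual code.
from sqlalchemy import create_engine
from sqlalchemy.orm import Session

class SalesTracker:
    def __init__(self, url: str = "mysql+pymysql://user:pass@localhost/pharmacy"):
        # Engine creation is lazy; no connection is opened until first use.
        self.engine = create_engine(url)

    def add_row(self, row) -> None:
        """Insert a row built from a Streamlit form (dimension or fact)."""
        with Session(self.engine) as session:
            session.add(row)
            session.commit()  # each form submission is one atomic transaction
```

A form callback would then do something like `tracker.add_row(Doctor(name=...))`, keeping all persistence logic out of the Streamlit script itself.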
Integrate `Apache Kafka` into the Streamlit application to serve as the `Producer`. The data should be in `JSON` format for easier ingestion into the `Kafka topics`. This is made possible by using a `serializer`, which transforms the data into `JSON` format.
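A minimal producer along these lines could look like the following. The broker address (`localhost:9092`) and topic name (`pharmacy_sales`) are assumptions, and the sketch uses `kafka-python` for brevity, while the repo's topics suggest `quix-streams` may be the actual client:

```python
# Illustrative Kafka producer (kafka-python). Broker address and topic name
# are assumptions; value_serializer turns each sale dict into JSON bytes.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

sale = {"drug_id": 3, "employee_id": 7, "quantity": 2, "total_amount": 12.50}
producer.send("pharmacy_sales", value=sale)
producer.flush()  # block until the message is actually delivered
```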
Read data from the `Kafka topics` with a consumer to allow for real-time data streaming as well as processing. The consumer can be found [here](kafka_consumer.py).
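A matching consumer sketch under the same assumptions (broker address, topic name, and `kafka-python` client; the repo's `kafka_consumer.py` may differ) reverses the producer's JSON encoding so each record arrives as a plain dict:

```python
# Illustrative Kafka consumer (kafka-python). The deserializer decodes the
# JSON bytes written by the producer above.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "pharmacy_sales",                  # assumed topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    auto_offset_reset="earliest",      # replay the topic from the start
)

for message in consumer:
    print(message.value)  # each sale streams in as a dict, in real time
```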
To get started with `Apache Kafka`, `ZooKeeper` should be running. On Windows, the command to start `ZooKeeper` is `.\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties`. The `Kafka server` should also be running, which is done with the command `.\bin\windows\kafka-server-start.bat .\config\server.properties`.
N.B.: Apache Kafka should be correctly configured in the environment variables to allow port communication.
### Conclusion & Future Steps

The `Streamlit app` is deployed locally due to the constraints of the database being available locally and `Apache Kafka` port usage. Here is a snippet of the `User Interface` for inputting sales data to provide real-time tracking.
![Dimensions-snippet](Dimensions.png)
![Facts-snippet](Facts.png)

After running the `Consumer`, here is a snapshot of how the data streams in from the Streamlit application:
![Data-stream](data_stream.png)