https://github.com/derak-isaack/pharmacysales_modelling
Apache Kafka Streamlit application for real-time tracking of pharmacy sales using the star-schema data model.
- Host: GitHub
- URL: https://github.com/derak-isaack/pharmacysales_modelling
- Owner: derak-isaack
- Created: 2024-06-09T20:30:58.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2024-06-29T21:56:59.000Z (12 months ago)
- Last Synced: 2025-01-21T06:11:55.801Z (5 months ago)
- Topics: apache-kafka, data-modeling, mysql-database, quix-streams, starschema, streamlit
- Language: Python
- Homepage:
- Size: 474 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: Readme.md




## Pharmacy Sales Tracker

`Motivation:` With the ever-rising need for automation and real-time tracking across sales organizations to minimize human error and detect fraud, I sought to develop a `Streamlit` application that uses a `MySQL Server` database and is integrated with `Apache Kafka`, whose `low latency` enables `real-time data streaming`.
### Project Overview

A real-time Streamlit pharmacy sales tracker that uses the `star-schema` to track sales across several pharmacy outlets for a large pharmaceutical company. The application leverages the `star-schema`, which is:
* Easier to understand and manage.
* Less dependent on table joins.
* High performance for analytical queries.

The application also uses the `MySQL Server` database for data entry, which has several advantages:
* Supports transactions.
* Enforces data integrity.
* Handles several transaction requests simultaneously.
* Offers atomicity (see the sketch below).
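A minimal sketch of how those guarantees are used when a sale is recorded, assuming `mysql-connector-python` and hypothetical table and column names:

```python
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="root", password="...", database="pharmacy"
)
cursor = conn.cursor()
try:
    # Record the sale and decrement stock in one transaction: either both
    # writes land or neither does (atomicity).
    cursor.execute(
        "INSERT INTO fact_sales (drug_id, quantity, sale_time) "
        "VALUES (%s, %s, NOW())",
        (3, 2),
    )
    cursor.execute(
        "UPDATE dim_drug SET stock = stock - %s WHERE drug_id = %s",
        (2, 3),
    )
    conn.commit()
except mysql.connector.Error:
    conn.rollback()  # undo any partial write so the data stays consistent
    raise
```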
The application also integrates `Apache Kafka` for real-time data streaming as well as transformations. Using `Kafka` offers the following benefits:
* `Data durability and reliability`, because data is stored on disk across brokers.
* `Real-time data processing`.
* Flexibility in `batch and stream processing`.
* `Data auditing and compliance`: with Change Data Capture (CDC) approaches, Kafka facilitates data replication across multiple systems or databases, ensuring accurate and consistent data for auditing and compliance purposes.
### Objectives & Description

Develop a data model that follows the `star-schema` approach, with `dimension` and `fact` tables. The `table-models` can be found [here](pharmacy_sales_tracker.sql) and follow the plain `sql` approach.
Defining the tables in a separate file keeps the application flexible should further changes arise, and it also makes the application easier to debug. A sketch of what such a schema might look like is shown below.
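For illustration only — the actual definitions live in [pharmacy_sales_tracker.sql](pharmacy_sales_tracker.sql), and every table and column name below is an assumption — a star schema for this domain pairs a central facts table with the dimension tables:

```python
import mysql.connector

# Hypothetical star schema: a central fact table referencing three dimension
# tables (doctor, employee, drug items). Illustrative names only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS dim_drug (
    drug_id INT AUTO_INCREMENT PRIMARY KEY,
    drug_name VARCHAR(100) NOT NULL,
    unit_price DECIMAL(10, 2) NOT NULL,
    stock INT NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS dim_employee (
    employee_id INT AUTO_INCREMENT PRIMARY KEY,
    employee_name VARCHAR(100) NOT NULL,
    outlet VARCHAR(100) NOT NULL
);
CREATE TABLE IF NOT EXISTS dim_doctor (
    doctor_id INT AUTO_INCREMENT PRIMARY KEY,
    doctor_name VARCHAR(100) NOT NULL
);
CREATE TABLE IF NOT EXISTS fact_sales (
    sale_id INT AUTO_INCREMENT PRIMARY KEY,
    drug_id INT NOT NULL,
    employee_id INT NOT NULL,
    doctor_id INT NOT NULL,
    quantity INT NOT NULL,
    sale_time DATETIME NOT NULL,
    FOREIGN KEY (drug_id) REFERENCES dim_drug (drug_id),
    FOREIGN KEY (employee_id) REFERENCES dim_employee (employee_id),
    FOREIGN KEY (doctor_id) REFERENCES dim_doctor (doctor_id)
);
"""

conn = mysql.connector.connect(
    host="localhost", user="root", password="...", database="pharmacy"
)
cursor = conn.cursor()
for statement in SCHEMA.split(";"):  # run each CREATE TABLE in turn
    if statement.strip():
        cursor.execute(statement)
conn.commit()
conn.close()
```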
*ERD diagram*
This [python-file](helpers.py) defines a class using the traditional Python `OOP` approach, which offers more customization and flexibility for the main `streamlit application`. It also enables form sharing across the `doctor`, `Employee`, and `Drug items` tables, the `dimension tables` that provide the vital context for the `Facts table`.
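A hedged sketch of what such a helper class could look like — the actual implementation is in [helpers.py](helpers.py), and the class name, methods, and table names here are assumptions:

```python
import mysql.connector


class SalesFormHelper:
    """Illustrative stand-in for the class defined in helpers.py."""

    def __init__(self, **conn_kwargs):
        self.conn = mysql.connector.connect(**conn_kwargs)

    def dimension_rows(self, table: str) -> list[tuple]:
        """Fetch a dimension table (doctor, employee, or drug items)
        to populate the Streamlit form's select boxes."""
        cursor = self.conn.cursor()
        cursor.execute(f"SELECT * FROM {table}")  # hard-coded table names only
        return cursor.fetchall()

    def record_sale(self, drug_id: int, employee_id: int,
                    doctor_id: int, quantity: int) -> None:
        """Write one row to the facts table; the commit makes the write atomic."""
        cursor = self.conn.cursor()
        cursor.execute(
            "INSERT INTO fact_sales "
            "(drug_id, employee_id, doctor_id, quantity, sale_time) "
            "VALUES (%s, %s, %s, %s, NOW())",
            (drug_id, employee_id, doctor_id, quantity),
        )
        self.conn.commit()
```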
Integrate `Apache Kafka` into the Streamlit application to serve as the `Producer`. The data should be in `JSON` format for easier ingestion into the `Kafka topics`; this is made possible by a `serializer`, which transforms each record into `JSON` format.
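The repository's topics mention `quix-streams`; as a library-neutral sketch (broker address, topic name, and record fields are all assumptions), JSON serialization into a topic with `kafka-python` looks like this:

```python
import json

from kafka import KafkaProducer

# Serialize every record to JSON bytes before it is written to the topic.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Called from the Streamlit form handler once a sale has been saved to MySQL.
sale = {"drug_id": 3, "employee_id": 7, "doctor_id": 2, "quantity": 2}
producer.send("pharmacy_sales", value=sale)
producer.flush()  # block until the broker acknowledges delivery
```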
Read data from the `Kafka topics` with a consumer to enable `real-time` data streaming as well as processing. The consumer can be found [here](kafka_consumer.py).
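A matching consumer sketch under the same assumptions:

```python
import json

from kafka import KafkaConsumer

# Deserialize the JSON bytes back into dicts as messages stream in.
consumer = KafkaConsumer(
    "pharmacy_sales",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    auto_offset_reset="earliest",  # replay the topic from the start on first run
)

for message in consumer:
    print(f"New sale streamed in: {message.value}")
```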
To get started with `Apache Kafka`, `ZooKeeper` should be running. On Windows, the command to run `ZooKeeper` is `.\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties`. The `Kafka server` should also be running, which is done with the command `.\bin\windows\kafka-server-start.bat .\config\server.properties`.

N.B.: Apache Kafka should be correctly configured in the environment variables to allow port communication.
### Conclusion & Future Steps

The `Streamlit app` is deployed locally because the database is only available locally and because of `Apache Kafka`'s port usage. Here is a snippet of the `User Interface` for inputting sales data to provide real-time tracking.

After running the `Consumer`, here is a snapshot of how the data streams in from the Streamlit application:
