https://github.com/northbrains/dashboard-1

An interactive dashboard simulating live sales and warehouse data for the company.
cassandra containers dashboard data-engineering data-science docker docker-compose kafka kafka-streaming plotly plotly-dash spark spark-streaming streaming

Sales and Warehouse Live Data Dashboard 📊

Dashboard Diagram


This project is designed to create a live data dashboard for Sales and Warehouse using a modern data streaming and processing architecture. The architecture leverages Apache Kafka for data streaming, Apache Spark for processing, Cassandra for data storage, and Plotly Dash for creating an analytical dashboard. All components are containerized using Docker to ensure easy deployment and scalability.

Architecture Overview 🔧



  • 🐳 Docker: All components are containerized to ensure consistent environments across different platforms and ease of deployment.


  • 🐍 Python Script (Producer): Acts as the data source, sending data streams to specific Kafka topics ("Sales" and "Warehouse"). A minimal producer sketch appears after this list.


  • 📦 Kafka Cluster: Comprises three controllers and three brokers to manage and distribute the data streams. This setup uses KRaft (Kafka Raft) mode instead of the traditional ZooKeeper-based architecture: KRaft eliminates the need for ZooKeeper by integrating consensus and metadata management directly into Kafka, which results in a simpler architecture with reduced operational complexity, improved scalability, and faster recovery from failures. An init-topics container automatically creates the required topics so that data flows correctly through the system.


  • ⚡ Apache Spark (Consumer): Receives and processes the streaming data from Kafka, acting as a consumer. The Spark cluster consists of one master node and two workers; Spark jobs are submitted to the workers one after the other, each processing the data from its Kafka topic ("Sales" or "Warehouse") and inserting the results into Cassandra. A minimal consumer sketch appears after this list.


  • 📊 Cassandra: Stores the processed data from Spark. It offers high availability and scalability, making it ideal for real-time data storage.


  • 📈 Plotly Dash: Provides an analytical dashboard for visualizing the data stored in Cassandra, letting users interact with and analyze the live data streams and switch between the Sales and Warehouse views.


  • 🚀 init-cassandra Container: This additional container automatically creates the keyspace and the necessary tables in Cassandra when the environment is started, ensuring full automation of the setup process.
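
To make the producer's role concrete, here is a minimal sketch of what such a script could look like, using the kafka-python package. The topic names ("Sales" and "Warehouse") come from this README; the broker address and record fields are assumptions, not the project's actual schema.

    import json
    import random
    import time
    from datetime import datetime, timezone

    from kafka import KafkaProducer  # kafka-python

    # Serialize records as JSON so the Spark consumer can parse them
    # with a declared schema downstream.
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",  # assumed broker address
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    while True:
        now = datetime.now(timezone.utc).isoformat()
        # Hypothetical record shapes; the real producer defines its own fields.
        producer.send("Sales", {"product_id": random.randint(1, 100),
                                "amount": round(random.uniform(5, 500), 2),
                                "ts": now})
        producer.send("Warehouse", {"item_id": random.randint(1, 100),
                                    "quantity": random.randint(0, 1000),
                                    "ts": now})
        time.sleep(1)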
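
On the consumer side, here is a hedged sketch of a Spark Structured Streaming job that reads one topic and writes to Cassandra through the DataStax spark-cassandra-connector. The keyspace (company_one) and table (sales_data) names come from the verification queries later in this README; the message schema, broker address, and package versions are assumptions.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import (DoubleType, IntegerType, StringType,
                                   StructField, StructType)

    spark = (
        SparkSession.builder
        .appName("sales-consumer")
        # Assumed connector coordinates; match them to your Spark version.
        .config("spark.jars.packages",
                "org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.1,"
                "com.datastax.spark:spark-cassandra-connector_2.12:3.4.1")
        .config("spark.cassandra.connection.host", "cassandra_one")
        .getOrCreate()
    )

    # Hypothetical message schema mirroring the producer sketch above.
    schema = StructType([
        StructField("product_id", IntegerType()),
        StructField("amount", DoubleType()),
        StructField("ts", StringType()),
    ])

    sales = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker-1:9092")  # assumed address
        .option("subscribe", "Sales")
        .load()
        # Kafka delivers raw bytes; decode the JSON payload into columns.
        .select(from_json(col("value").cast("string"), schema).alias("r"))
        .select("r.*")
    )

    (sales.writeStream
        .format("org.apache.spark.sql.cassandra")
        .option("keyspace", "company_one")   # from the README's queries
        .option("table", "sales_data")
        .option("checkpointLocation", "/tmp/checkpoints/sales")
        .start()
        .awaitTermination())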

Getting Started

Prerequisites


🐳 Docker and Docker Compose installed on your machine

Setting Up the Environment


  • Clone the repository:
    git clone git@github.com:NorthBrains/dashboard-1.git


  • Start the Docker containers:
    docker-compose up -d

    Once Docker Compose is up, all services, including streaming, processing, and the dashboard, start automatically without requiring additional configuration.



  • Set Up Cassandra Keyspace and Tables:

    After the Cassandra container is up and running, the init-cassandra container will automatically create the necessary keyspace and tables, so the database schema is initialized without manual intervention. A sketch of the schema it creates appears after this list.


  • Checking if the Streaming Is Working:

    To verify that the streaming is working correctly, you can run the following commands to query the data in Cassandra:

    docker exec -it cassandra_one cqlsh -u cassandra -p cassandra

    SELECT * FROM company_one.sales_data;

    SELECT * FROM company_one.warehouse_data;
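
For reference, the schema setup performed by the init-cassandra container (see the setup step above) might look roughly like the following sketch, written here with the cassandra-driver package. The keyspace and table names come from the queries above; the column definitions and replication settings are illustrative assumptions.

    from cassandra.cluster import Cluster

    # Assumed contact point; the compose file defines the real service name.
    cluster = Cluster(["cassandra_one"], port=9042)
    session = cluster.connect()

    # Keyspace name taken from the README's verification queries.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS company_one
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
    """)

    # Hypothetical column layouts; the real tables define their own schema.
    session.execute("""
        CREATE TABLE IF NOT EXISTS company_one.sales_data (
            product_id int,
            amount double,
            ts text,
            PRIMARY KEY (product_id, ts)
        )
    """)
    session.execute("""
        CREATE TABLE IF NOT EXISTS company_one.warehouse_data (
            item_id int,
            quantity int,
            ts text,
            PRIMARY KEY (item_id, ts)
        )
    """)

    cluster.shutdown()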

Accessing the Spark Master GUI

You can access the Spark Master GUI by navigating to http://localhost:8190 in your web browser.

This interface allows you to monitor the status of running Spark workers and applications. You can check the health and performance of the Spark cluster, including the details of each worker node, active jobs, stages, and tasks.

Accessing the Dashboard

Once all services are up and running, you can access the Plotly Dash dashboard by navigating to http://localhost:8900 in your web browser.
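
As a rough illustration of how the dashboard could be wired together, here is a minimal Dash app that polls Cassandra and lets you switch between the two tables. The table names and port come from this README; the contact point, columns, and layout are assumptions.

    import pandas as pd
    from cassandra.cluster import Cluster
    from dash import Dash, dcc, html
    from dash.dependencies import Input, Output

    cluster = Cluster(["localhost"])  # assumed contact point
    session = cluster.connect("company_one")

    app = Dash(__name__)
    app.layout = html.Div([
        # Switch between the two live views, as described above.
        dcc.Dropdown(id="source", value="sales_data",
                     options=["sales_data", "warehouse_data"]),
        dcc.Graph(id="live-graph"),
        dcc.Interval(id="tick", interval=5_000),  # poll every 5 seconds
    ])

    @app.callback(Output("live-graph", "figure"),
                  Input("source", "value"), Input("tick", "n_intervals"))
    def refresh(table, _):
        df = pd.DataFrame(list(session.execute(f"SELECT * FROM {table}")))
        # Hypothetical columns; adapt to the real schema.
        y = "amount" if table == "sales_data" else "quantity"
        return {"data": [{"x": df["ts"], "y": df[y], "type": "scatter"}]}

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8900)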

Stopping the Environment

To stop and remove all running containers, execute:

    docker-compose down

Additional Information

  • 📦 Kafka: Ensure that the topics are correctly initialized and that data is being streamed to the appropriate topics (Sales and Warehouse); a quick verification command follows this list.

  • ⚡ Spark: The Spark jobs should be configured to read from Kafka, process the data, and write the results to Cassandra.

  • 📊 Cassandra: Regularly monitor storage and performance to make sure the cluster scales with the incoming data volume.

  • 📈 Dash: Customize the dashboards as needed to include more visualizations or interactive elements that suit your data analysis needs.
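
For example, to confirm the topics were created, you can list them from inside one of the broker containers (the container name here is a placeholder; check docker-compose.yml, and note that Confluent images ship the tool as kafka-topics rather than kafka-topics.sh):

    docker exec -it <broker-container> kafka-topics.sh --bootstrap-server localhost:9092 --list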

Contributions

Feel free to contribute to this project by submitting issues or pull requests. All contributions are welcome and appreciated!