{"id":25428104,"url":"https://github.com/anqorithm/realtime-stockstream","last_synced_at":"2025-10-31T17:30:32.540Z","repository":{"id":208057552,"uuid":"720726727","full_name":"anqorithm/RealTime-StockStream","owner":"anqorithm","description":"RealTime StockStream is a streamlined, simulation system for processing live stock market data. It uses Apache Kafka for data input, Apache Spark for data handling, and Apache Cassandra for data storage, making it a powerful yet easy-to-use tool for financial data analysis","archived":false,"fork":false,"pushed_at":"2024-01-26T13:10:02.000Z","size":5619,"stargazers_count":21,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-15T18:53:43.022Z","etag":null,"topics":["apache-spark","apache-sparksql","asynchronous","bigdata","cassendra","data-stream-processing","databases","docker","docker-compose","kafka","python","realtime","spark","spark-master","spark-streaming","stock-market","stocks","zookeeper"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/anqorithm.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-11-19T12:15:00.000Z","updated_at":"2025-02-15T11:37:29.000Z","dependencies_parsed_at":"2023-12-03T12:23:36.584Z","dependency_job_id":"6203b186-4697-42bf-931a-99d75116e7f6","html_url":"https://github.com/anqorithm/RealTime-StockStream","commit_stats":null,"previous_names":["qahta0/realtime-stockstream","anqorithm/realtime-stockstream"],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anqorithm%2FRealTime-StockStream","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anqorithm%2FRealTime-StockStream/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anqorithm%2FRealTime-StockStream/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anqorithm%2FRealTime-StockStream/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/anqorithm","download_url":"https://codeload.github.com/anqorithm/RealTime-StockStream/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239221266,"owners_count":19602378,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-spark","apache-sparksql","asynchronous","bigdata","cassendra","data-stream-processing","databases","docker","docker-compose","kafka","python","realtime","spark","spark-master","spark-streaming","stock-market","stocks","zookeeper"],"created_at":"2025-02-17T01:38:01.336Z","updated_at":"2025-10-31T17:30:32.502Z","avatar_url":"https://github.com/anqorithm.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RealTime StockStream\n\nRealTime StockStream is a streamlined system for processing live stock market data. 
It uses Apache Kafka for data input, Apache Spark for data handling, and Apache Cassandra for data storage, making it a powerful yet easy-to-use tool for financial data analysis 💹🕊️\n\n\n![real-time-stock-stream](./assets/background.jpg)\n\n## Getting Started\n\nThis guide walks you through setting up and running RealTime StockStream on your local machine for development and testing.\n\n### Prerequisites\n\nEnsure you have the following software installed:\n- Docker\n- Python (version 3.11 or higher)\n\n\n### Todo Features\n\n1. **Live Market Data Integration** ⌛\n2. **Advanced Analytics Features** ⌛\n3. **Interactive Data Visualization** ⌛\n4. **Improved Scalability** ⌛\n5. **User Customization Options** ⌛\n6. **Stronger Security** ⌛\n\n\n### Used Techs\n\n![used techs](./assets/usedTechs.jpg)\n\n- Apache Kafka\n- Apache Cassandra\n- Apache ZooKeeper\n- Apache Spark\n- Python\n\n\n### Installation\n\nFollow these steps to set up your development environment:\n\n#### Setting Up Kafka\n\n1. **Create a Kafka Topic**:\n   ```bash\n   kafka-topics.sh --create --topic stocks --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1\n   ```\n\n\n## Supported Data Operations\n\n1. **Grouping Aggregation:** Summarize data by groups.\n2. **Pivot Aggregation:** Reshape data, converting rows to columns.\n3. **Rollups and Cubes:** Perform hierarchical and combinatorial aggregations.\n4. **Ranking Functions:** Assign ranks within data partitions.\n5. **Analytic Functions:** Compute aggregates while maintaining row-level details.\n\n\n## Database Schema\n\n![stockdata-schema](./assets/stockdata-schema.png)\n\n#### Configuring Cassandra\n\n1. 
**Create a Keyspace and Table**:\n   Execute the following CQL commands to set up your Cassandra database:\n   ```sql\n   CREATE KEYSPACE stockdata WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 1};\n\n   CREATE TABLE stockdata.stocks (\n       stock text,\n       trade_id uuid,\n       price decimal,\n       quantity int,\n       trade_type text,\n       trade_date date,\n       trade_time time,\n       PRIMARY KEY (stock, trade_id)\n   );\n   ```\n\n## System Architecture\n\n![system-architecture](./assets/systemArchitecture.svg)\n\n\n#### Docker Compose\n\n1. **Launch Services**:\n   Use Docker Compose to start the Kafka, ZooKeeper, Cassandra, and Spark services:\n   ```yaml\n    version: '3.9'\n\n    name: \"realtime-stock-market\"\n\n    services:\n        zookeeper:\n            image: bitnami/zookeeper:latest\n            ports:\n                - \"2181:2181\"\n            environment:\n                - ALLOW_ANONYMOUS_LOGIN=yes\n            networks:\n                stock-net:\n                    ipv4_address: 172.28.1.1\n\n        kafka:\n            image: bitnami/kafka:latest\n            ports:\n                - \"9092:9092\"\n            environment:\n                - KAFKA_BROKER_ID=1\n                - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092\n                - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://172.28.1.2:9092\n                - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181\n                - ALLOW_PLAINTEXT_LISTENER=yes\n            depends_on:\n                - zookeeper\n            networks:\n                stock-net:\n                    ipv4_address: 172.28.1.2\n            volumes:\n                - ./scripts/init-kafka.sh:/init-kafka.sh\n            # entrypoint: [\"/bin/bash\", \"init-kafka.sh\"]\n            restart: always\n\n        cassandra:\n            image: cassandra:latest\n            ports:\n                - \"9042:9042\"\n            volumes:\n                - ./init-cassandra:/init-cassandra\n                - ./scripts/init-cassandra-schema.sh:/init-cassandra-schema.sh\n            environment:\n                - CASSANDRA_START_RPC=true\n            networks:\n                stock-net:\n                    ipv4_address: 172.28.1.3\n            # entrypoint: [\"/bin/bash\", \"init-cassandra-schema.sh\"]\n            restart: always\n\n        spark:\n            image: bitnami/spark:latest\n            volumes:\n                - ./spark:/opt/bitnami/spark/jobs\n                - ./scripts/submit-spark-job.sh:/opt/bitnami/spark/submit-spark-job.sh\n            ports:\n                - \"8080:8080\"\n            depends_on:\n                - kafka\n            networks:\n                stock-net:\n                    ipv4_address: 172.28.1.4\n            # entrypoint: [\"sh\", \"-c\", \"./submit-spark-job.sh\"]\n            restart: always\n\n        kafka_producer:\n            build:\n                context: ./kafka-producer\n                dockerfile: kafka_producer.dockerfile\n            depends_on:\n                - kafka\n            networks:\n                stock-net:\n                    ipv4_address: 172.28.1.8\n            restart: always\n\n        plotly:\n            build:\n                context: ./plotly\n                dockerfile: plotly.dockerfile\n            volumes:\n                - ./plotly/dashboard.py:/dashboard.py\n            ports:\n                - \"8050:8050\"\n            depends_on:\n                - cassandra\n            networks:\n                stock-net:\n                    ipv4_address: 172.28.1.9\n            restart: always\n\n    networks:\n        stock-net:\n            driver: bridge\n            ipam:\n                config:\n                    - subnet: 172.28.0.0/16\n   ```\n\n2. **Run Docker Compose**:\n   ```bash\n   docker-compose up -d\n   ```\n\n### Usage\n\n1. **Run the Spark Job**:\n   Use the `spark-submit` command to run your Spark job.\n   ```bash\n   $ spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.1,com.datastax.spark:spark-cassandra-connector_2.12:3.0.0 spark_job.py stocks\n   ```\n\n2. 
**Produce and Consume Data**:\n   Start producing data to the `stocks` topic and monitor the pipeline's output.\n\n## Monitoring and Logging\n\nCheck the logs for each service in their respective directories for monitoring and debugging.\n\n\n## Visualizations\n\nTo run the dashboard, run the following command:\n\n```bash\n$ cd plotly \u0026\u0026 python3 dashboard.py\n```\n\n![graph 1](./assets/graph1.png)\n\n\n![graph 2](./assets/graph2.png)\n\n\n![graph 3](./assets/graph3.png)\n\n\n![graph 4](./assets/graph4.png)\n\n## Testing\n\n![docker-compose-d](./assets/docker-compose-d.png)\n\n![docker-monitoring](./assets/docker-monitoring.png)\n\n![docker-ps](./assets/docker-ps.png)\n\n![cqlsh](./assets/cqlsh.png)\n\n![stocks-data-before](./assets/stocks-data-before.png)\n\n![create-kafka-topic](./assets/create-kafka-topic.png)\n\n![kafka-producer](./assets/kafka-producer.png)\n\n![spark-processing-1](./assets/spark-processing-1.png)\n\n![spark-processing-2](./assets/spark-processing-2.png)\n\n![cassandra](./assets/cassandra-data.png)\n\n\n## Table Results\n\n### Stocks Table\n![stocks](./assets/stocks.png)\n\n### Analysis Stocks Table\n![analytics_stocks](./assets/analytics_stocks.png)\n\n### Grouped Stocks Table\n![grouped_stocks](./assets/grouped_stocks.png)\n\n### Pivoted Stocks Table\n![pivoted_stocks](./assets/pivoted_stocks.png)\n\n### Ranked Stocks Table\n![ranked_stocks](./assets/ranked_stocks.png)\n\n### Rollup Stocks Table\n![rollup_stocks](./assets/rollup_stocks.png)\n\n\n## Contributing\n\nContributions to RealTime StockStream are welcome; just open a PR 😊.\n\n## Authors\n\n- [Abdullah 🚀](https://github.com/qahta0)\n- [Abdullah 🚀](https://github.com/AbdullahAlzeid)\n- [Yaarob 🚀](https://github.com/yaarob988)\n\n## License\n\nThis project is licensed under the MIT 
License.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanqorithm%2Frealtime-stockstream","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fanqorithm%2Frealtime-stockstream","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanqorithm%2Frealtime-stockstream/lists"}