https://github.com/anqorithm/realtime-stockstream
RealTime StockStream is a streamlined simulation system for processing live stock market data. It uses Apache Kafka for data input, Apache Spark for data handling, and Apache Cassandra for data storage, making it a powerful yet easy-to-use tool for financial data analysis.
apache-spark apache-sparksql asynchronous bigdata cassendra data-stream-processing databases docker docker-compose kafka python realtime spark spark-master spark-streaming stock-market stocks zookeeper
- Host: GitHub
- URL: https://github.com/anqorithm/realtime-stockstream
- Owner: anqorithm
- Created: 2023-11-19T12:15:00.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2024-01-26T13:10:02.000Z (over 2 years ago)
- Last Synced: 2025-02-15T18:53:43.022Z (over 1 year ago)
- Topics: apache-spark, apache-sparksql, asynchronous, bigdata, cassendra, data-stream-processing, databases, docker, docker-compose, kafka, python, realtime, spark, spark-master, spark-streaming, stock-market, stocks, zookeeper
- Language: Python
- Homepage:
- Size: 5.36 MB
- Stars: 21
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
README
# RealTime StockStream
RealTime StockStream is a streamlined system for processing live stock market data. It uses Apache Kafka for data input, Apache Spark for data handling, and Apache Cassandra for data storage, making it a powerful yet easy-to-use tool for financial data analysis. 💹🕊️

## Getting Started
This guide walks you through setting up and running RealTime StockStream on your local machine for development and testing.
### Prerequisites
Ensure you have the following software installed:
- Docker
- Python (version 3.11 or higher)
### Todo Features
1. **Live Market Data Integration** ⌛
2. **Advanced Analytics Features** ⌛
3. **Interactive Data Visualization** ⌛
4. **Improved Scalability** ⌛
5. **User Customization Options** ⌛
6. **Stronger Security** ⌛
### Technologies Used

- Apache Kafka
- Apache Cassandra
- Apache ZooKeeper
- Apache Spark
- Python
### Installation
Follow these steps to set up your development environment:
#### Setting Up Kafka
1. **Create a Kafka Topic**:
```bash
kafka-topics.sh --create --topic stocks --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
```
## Supported Data Operations
1. **Grouping Aggregation:** Summarize data by groups.
2. **Pivot Aggregation:** Reshape data, converting rows to columns.
3. **Rollups and Cubes:** Perform hierarchical and combinational aggregations.
4. **Ranking Functions:** Assign ranks within data partitions.
5. **Analytic Functions:** Compute aggregates while maintaining row-level details.
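To make two of these operations concrete without a running Spark cluster, here is a minimal plain-Python sketch of a grouping aggregation and a ranking function. It is not the project's Spark code; the sample rows and the function names `total_quantity_by_stock` and `rank_by_price_within_stock` are hypothetical, and only the field names follow the `stocks` table columns used in this README.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical sample trades; field names follow the stocks table columns.
trades = [
    {"stock": "AAPL", "price": Decimal("190.00"), "quantity": 10},
    {"stock": "AAPL", "price": Decimal("192.00"), "quantity": 30},
    {"stock": "MSFT", "price": Decimal("410.00"), "quantity": 5},
    {"stock": "MSFT", "price": Decimal("410.00"), "quantity": 20},
]


def total_quantity_by_stock(rows):
    """Grouping aggregation: collapse rows into one summary value per stock."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["stock"]] += row["quantity"]
    return dict(totals)


def rank_by_price_within_stock(rows):
    """Ranking function: dense rank by descending price inside each stock partition.

    Every input row is kept and gains a "price_rank" field, so row-level
    detail survives, unlike in the grouping aggregation above.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["stock"]].append(row)

    ranked = []
    for stock_rows in partitions.values():
        # Equal prices share a rank, and ranks have no gaps (dense rank).
        distinct_prices = sorted({r["price"] for r in stock_rows}, reverse=True)
        rank_of = {price: i + 1 for i, price in enumerate(distinct_prices)}
        for r in stock_rows:
            ranked.append({**r, "price_rank": rank_of[r["price"]]})
    return ranked
```

The contrast between the two functions mirrors the difference between items 1 and 4 in the list: grouping returns one row per group, while ranking keeps every row and annotates it.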
## Database Schema

#### Configuring Cassandra
1. **Create a Keyspace and Table**:
Execute the following CQL commands to set up your Cassandra database:
```sql
CREATE KEYSPACE stockdata WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE stockdata.stocks (
    stock text,
    trade_id uuid,
    price decimal,
    quantity int,
    trade_type text,
    trade_date date,
    trade_time time,
    PRIMARY KEY (stock, trade_id)
);
```
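As a rough illustration of how a row for this table could be prepared from Python, the sketch below builds a parameterized `INSERT` and a matching parameter tuple. The names `INSERT_TRADE_CQL` and `trade_params` are hypothetical and not taken from the project. The `%s` placeholders assume the DataStax `cassandra-driver` package, which the project may or may not use.

```python
import datetime as dt
import uuid
from decimal import Decimal

# Parameterized INSERT listing the stockdata.stocks columns in schema order.
# The %s placeholder style is an assumption (cassandra-driver simple statements).
INSERT_TRADE_CQL = (
    "INSERT INTO stockdata.stocks "
    "(stock, trade_id, price, quantity, trade_type, trade_date, trade_time) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s)"
)


def trade_params(stock, price, quantity, trade_type, when):
    """Return a parameter tuple in the same column order as INSERT_TRADE_CQL."""
    return (
        stock,                # text, partition key
        uuid.uuid4(),         # uuid, clustering column
        Decimal(str(price)),  # decimal; str() avoids binary float artifacts
        int(quantity),        # int
        trade_type,           # text
        when.date(),          # date
        when.time(),          # time
    )
```

With an open driver session (assumed, not shown here), the pair would typically be passed as `session.execute(INSERT_TRADE_CQL, trade_params("AAPL", 190.5, 10, "buy", dt.datetime.now()))`. Because `stock` is the partition key, all trades for one symbol land in the same partition.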
## System Architecture

#### Docker Compose
1. **Launch Services**:
The following Compose file defines the ZooKeeper, Kafka, Cassandra, and Spark services, along with the Kafka producer and the Plotly dashboard:
```yaml
version: '3.9'
name: "realtime-stock-market"
services:
  zookeeper:
    image: bitnami/zookeeper:latest
    ports:
      - "2181:2181"
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
    networks:
      stock-net:
        ipv4_address: 172.28.1.1
  kafka:
    image: bitnami/kafka:latest
    ports:
      - "9092:9092"
    environment:
      - KAFKA_BROKER_ID=1
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://172.28.1.2:9092
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
      - ALLOW_PLAINTEXT_LISTENER=yes
    depends_on:
      - zookeeper
    networks:
      stock-net:
        ipv4_address: 172.28.1.2
    volumes:
      - ./scripts/init-kafka.sh:/init-kafka.sh
    # entrypoint: ["/bin/bash", "init-kafka.sh"]
    restart: always
  cassandra:
    image: cassandra:latest
    ports:
      - "9042:9042"
    volumes:
      - ./init-cassandra:/init-cassandra
      - ./scripts/init-cassandra-schema.sh:/init-cassandra-schema.sh
    environment:
      - CASSANDRA_START_RPC=true
    networks:
      stock-net:
        ipv4_address: 172.28.1.3
    # entrypoint: ["/bin/bash", "init-cassandra-schema.sh"]
    restart: always
  spark:
    image: bitnami/spark:latest
    volumes:
      - ./spark:/opt/bitnami/spark/jobs
      - ./scripts/submit-spark-job.sh:/opt/bitnami/spark/submit-spark-job.sh
    ports:
      - "8080:8080"
    depends_on:
      - kafka
    networks:
      stock-net:
        ipv4_address: 172.28.1.4
    # entrypoint: ["sh", "-c", "./submit-spark-job.sh"]
    restart: always
  kafka_producer:
    build:
      context: ./kafka-producer
      dockerfile: kafka_producer.dockerfile
    depends_on:
      - kafka
    networks:
      stock-net:
        ipv4_address: 172.28.1.8
    restart: always
  plotly:
    build:
      context: ./plotly
      dockerfile: plotly.dockerfile
    volumes:
      - ./plotly/dashboard.py:/dashboard.py
    ports:
      - "8050:8050"
    depends_on:
      - cassandra
    networks:
      stock-net:
        ipv4_address: 172.28.1.9
    restart: always
networks:
  stock-net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.28.0.0/16
```
2. **Run Docker Compose**:
```bash
docker-compose up -d
```
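Once the containers are up, a quick way to see which published ports accept connections is a small TCP probe. The sketch below uses only the standard library and is not part of the project; `SERVICE_PORTS`, `port_open`, and `check_services` are hypothetical names, and the port numbers are the host ports from the Compose file above, assuming the stack runs on the local machine.

```python
import socket

# Host ports published by the Compose file (assumes the stack runs locally).
SERVICE_PORTS = {
    "zookeeper": 2181,
    "kafka": 9092,
    "cassandra": 9042,
    "spark": 8080,
    "plotly": 8050,
}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host="localhost"):
    """Map each service name to whether its published port accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICE_PORTS.items()}
```

Calling `check_services()` returns a dictionary such as `{"kafka": True, ...}`. An open port only shows that something is listening; a service such as Cassandra can accept TCP connections before it is ready to serve queries.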
### Usage
1. **Run the Spark Job**:
Use the `spark-submit` command to run your Spark job.
```bash
$ spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.1,com.datastax.spark:spark-cassandra-connector_2.12:3.0.0 spark_job.py stocks
```
2. **Produce and Consume Data**:
Start producing data to the `stocks` topic and monitor the pipeline's output.
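For a sense of what a simulated message on the `stocks` topic might look like, here is a minimal sketch that builds one trade and serializes it to JSON bytes. It is not the project's producer: the symbol list, trade types, price range, and the names `make_trade` and `encode_trade` are all hypothetical, and JSON is an assumed message format. Only the keys are taken from the `stockdata.stocks` columns.

```python
import datetime as dt
import json
import random
import uuid

# Hypothetical symbols and trade types; the real producer may use other values.
SYMBOLS = ["AAPL", "MSFT", "GOOG"]
TRADE_TYPES = ["buy", "sell"]


def make_trade(rng, now):
    """Build one simulated trade whose keys match the stockdata.stocks columns."""
    return {
        "stock": rng.choice(SYMBOLS),
        "trade_id": str(uuid.uuid4()),
        "price": round(rng.uniform(10, 500), 2),
        "quantity": rng.randint(1, 1000),
        "trade_type": rng.choice(TRADE_TYPES),
        "trade_date": now.date().isoformat(),
        "trade_time": now.time().isoformat(timespec="seconds"),
    }


def encode_trade(trade):
    """Serialize a trade to UTF-8 JSON bytes, suitable as a Kafka message value."""
    return json.dumps(trade).encode("utf-8")
```

Passing a seeded `random.Random` instance as `rng` makes the simulated values reproducible, apart from the random `trade_id`. If the `kafka-python` client is used (an assumption), the bytes could be sent with `KafkaProducer(bootstrap_servers="localhost:9092").send("stocks", encode_trade(trade))`.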
## Monitoring and Logging
To monitor and debug the pipeline, check the logs of each service in its respective directory.
## Visualizations
To start the dashboard, run the following command:
```bash
$ cd plotly && python3 dashboard.py
```




## Testing










## Table Results
### Stocks Table

### Analysis Stocks Table

### Pivoted Stocks Table

### Ranked Stocks Table

### Rollup Stocks Table

## Contributing
Contributions to RealTime StockStream are welcome; just open a PR 😊.
## Authors
- [Abdullah 🚀](https://github.com/qahta0)
- [Abdullah 🚀](https://github.com/AbdullahAlzeid)
- [Yaarob 🚀](https://github.com/yaarob988)
## License
This project is licensed under the MIT License.