{"id":25869838,"url":"https://github.com/snehadharne/stockanalyticswithaws","last_synced_at":"2026-04-17T05:02:34.946Z","repository":{"id":244278553,"uuid":"806908504","full_name":"SnehaDharne/StockAnalyticswithAWS","owner":"SnehaDharne","description":"Capstone Project with Chubb. ","archived":false,"fork":false,"pushed_at":"2025-01-11T14:29:59.000Z","size":3313,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-03-02T05:29:03.870Z","etag":null,"topics":["aws","kafka","pyspark","stock-data","streamlit","yfinance-api"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SnehaDharne.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-28T06:18:08.000Z","updated_at":"2025-02-05T22:23:40.000Z","dependencies_parsed_at":"2024-06-13T20:29:31.665Z","dependency_job_id":"11f639a5-5e6d-479f-9302-e007f228d80c","html_url":"https://github.com/SnehaDharne/StockAnalyticswithAWS","commit_stats":null,"previous_names":["snehadharne/stock-analytics-aws","snehadharne/stockanalyticswithaws"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/SnehaDharne/StockAnalyticswithAWS","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SnehaDharne%2FStockAnalyticswithAWS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SnehaDharne%2FStockAnalyticswithAWS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SnehaDharne%2FStockAnalytics
withAWS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SnehaDharne%2FStockAnalyticswithAWS/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SnehaDharne","download_url":"https://codeload.github.com/SnehaDharne/StockAnalyticswithAWS/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SnehaDharne%2FStockAnalyticswithAWS/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278245369,"owners_count":25955013,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-03T02:00:06.070Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","kafka","pyspark","stock-data","streamlit","yfinance-api"],"created_at":"2025-03-02T05:26:56.142Z","updated_at":"2025-10-03T23:41:52.219Z","avatar_url":"https://github.com/SnehaDharne.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Analytics on Stock Market Data Live Stream using AWS\n\n![image](https://github.com/user-attachments/assets/3dc5010a-defa-4445-b38c-423d2301ab0d)\n\n### Data \n\n Live Market Data:\n 1. During Market Hours\n - Quote Data for AAPL\n - Bar Data for AAPL, MSFT, GOOG, SPY, META, DIA, TSLA\n 2. 
Outside Market Hours \n - Quote Data for FAKEPACA\n - Bar Data for FAKEPACA \n\nWebSocket: \n- wss://stream.data.alpaca.markets/v2/iex\n- wss://stream.data.alpaca.markets/v2/test\n\n### AWS\n\n1. Create EMR Cluster\n- Configure Cluster\n- Connect to Primary node using ssh\n\n\n2. Edit EC2 Security Groups - give access to the local machine's IP\n\n\n3. SSH into the EMR cluster from the local machine\n- Navigate to the aws-key.pem directory\n- ssh into the EMR cluster\n\n\n4. Set up Kafka\n- pip install kafka-python\n- pip install boto3\n- pip install websocket-client streamlit watchdog plotly\n- wget https://downloads.apache.org/kafka/3.5.2/kafka_2.13-3.5.2.tgz\n- tar -xzf kafka_2.13-3.5.2.tgz\n- cd kafka_2.13-3.5.2\n- nano config/server.properties\n- find advertised.listeners and set it to the master node's EC2 host name [ ip-10-x-x.ec2.internal ]\n- find zookeeper.connect and set it to the master node's EC2 host name\n- bin/kafka-server-start.sh config/server.properties\n\n\n5. Set up Producer.py\n- vim producer.py\n- update kafka-bootstrap-server with [ip-10-x-x.ec2.internal]\n- bin/kafka-topics.sh --create --bootstrap-server ip-10-0-x-x.ec2.internal:9092 --replication-factor 1 --partitions 1 --topic symbol_topic\n- bin/kafka-topics.sh --create --bootstrap-server ip-10-0-x-x.ec2.internal:9092 --replication-factor 1 --partitions 1 --topic symbol_topic2\n- spark-submit producer.py\n\n\n\n6. 
Set up Consumer.py\n- vim consumer.py\n- update kafka-bootstrap-server with [ip-10-x-x.ec2.internal]\n- bin/kafka-topics.sh --create --bootstrap-server ip-10-0-x-x.ec2.internal:9092 --replication-factor 1 --partitions 1 --topic visual_topic\n- bin/kafka-topics.sh --create --bootstrap-server ip-10-0-x-x.ec2.internal:9092 --replication-factor 1 --partitions 1 --topic visual_topic2\n- mkdir -p /home/hadoop/consumer1\n- nano /home/hadoop/consumer1/log4j.properties\n- spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1 \\\n             --conf \"spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/home/hadoop/consumer1/log4j.properties\" \\\n             --conf \"spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/home/hadoop/consumer1/log4j.properties\" \\\n             consumer1.py\n- vim consumer2.py\n- update kafka-bootstrap-server with [ip-10-x-x.ec2.internal]\n- spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1 consumer2.py\n\n7. Set up Visualizer (Viz.py)\n- ~/.local/bin/streamlit run viz.py --server.port 8501 --server.address 0.0.0.0\n- ec2-\u003e Security Groups -\u003e Edit Inbound Rules -\u003e add 8501, 0.0.0.0/0 -\u003eSave rules\n\n\n8. Set up Log Visualizer (log_viz.py)\n-  sudo sysctl fs.inotify.max_user_watches=524288\n- ~/.local/bin/streamlit run log_viz.py --server.port 8502 --server.address 0.0.0.0\n- ec2-\u003e Security Groups -\u003e Edit Inbound Rules -\u003e add 8502, 0.0.0.0/0 -\u003eSave rules\n\n\n9. 
Ganglia (data monitoring)\n    - look for ganglia application under emr, enable ssh configuration \n    - ssh tunneling to port 8050 [ ssh -i ./aws-master-node.pem -ND 8050 hadoop@ec2-44-201-28-163.compute-1.amazonaws.com]\n    - copy uri into a new tab\n    - create a SOCKS5 proxy on the browser\n    - proxy onto SOCKS5\n    - reload\n\nDemo:\nhttps://stevens.zoom.us/rec/share/y5wKK1-hd9FhHAZ8MyxMaIcaGh0pxRAR-ZCoWAw1HWT0OsA-6SaX33J9n9Lo3TFr.p_vYE-iL_XVfAdoA\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsnehadharne%2Fstockanalyticswithaws","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsnehadharne%2Fstockanalyticswithaws","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsnehadharne%2Fstockanalyticswithaws/lists"}