Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ineerav/eda-spark-elasticsearch
Last synced: 1 day ago
- Host: GitHub
- URL: https://github.com/ineerav/eda-spark-elasticsearch
- Owner: INeerav
- Created: 2024-03-27T14:26:24.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-03-27T14:46:59.000Z (8 months ago)
- Last Synced: 2024-03-27T16:08:56.433Z (8 months ago)
- Language: Jupyter Notebook
- Size: 4.19 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data Analytics Dashboard
This project builds a data analytics dashboard on Elasticsearch, with EDA performed through PySpark, Hadoop, Hive and AWS Glue ETL.

## Getting started
Let me clearly define the yardsticks, so that I do not deviate much:
- Create the data analytics project with either a web or a mobile interface
- Big data: it has to satisfy at least one V out of the 5 Vs (volume, velocity, veracity, value and variety)
- The tech stack should include at least one of these tools:
  - MapReduce, Spark, Storm, Hive, Pig, Flink, etc.

## Project status
Current status: web interface using FastAPI (yet to be done).
For more information on the steps, please see steps.pdf.

## Tech stack
1. Cloud service provider: AWS, GCP
   1. AWS for EMR and ETL operations
   2. GCP-hosted Elasticsearch cloud for the final data visualisations
2. Pre-processing: AWS Athena
3. Processing & ETL: PySpark, Hive, Glue
4. File system: S3, GCP bucket, HDFS
5. Data storage: Hive for big data storage and an Elasticsearch index for the final dashboard
6. Visualisation: Kibana Elastic Cloud Dashboard
7. Backend (BE) API: Python

![Screenshot](final_dashboard.png)
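
As a rough illustration of the data storage step in the tech stack (processed rows landing in an Elasticsearch index for the dashboard), here is a minimal sketch that serializes rows into the newline-delimited JSON body expected by the Elasticsearch Bulk API. It is not taken from the notebooks in this repository: the function `to_bulk_ndjson`, the index name `eda_dashboard` and the sample fields are all hypothetical and chosen only for illustration.

```python
import json


def to_bulk_ndjson(rows, index_name):
    """Serialize rows into a newline-delimited Bulk API body.

    Hypothetical helper for illustration; not part of this repository.
    """
    lines = []
    for row in rows:
        # Action line: index the document on the following line into index_name.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself.
        lines.append(json.dumps(row))
    # Every line, including the last one, is terminated by a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical aggregated rows, e.g. the output of a PySpark or Hive query.
sample_rows = [
    {"category": "A", "total": 10},
    {"category": "B", "total": 25},
]

payload = to_bulk_ndjson(sample_rows, "eda_dashboard")
print(payload, end="")
```

The resulting string could then be sent to the cluster's `_bulk` endpoint with the `Content-Type: application/x-ndjson` header; authentication and the cluster URL depend on the Elastic Cloud deployment and are left out here.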