Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/ineerav/eda-spark-elasticsearch

Data analysis using pyspark, spark streaming, apache hive, AWS Elastic MapReduce cluster and elasticsearch dashboard hosting with Google Cloud Storage service connectors
https://github.com/ineerav/eda-spark-elasticsearch

aws aws-glue cloudformation elasticsearch elasticsearch-client emr ethena gcp hive pig python spark spark-streaming

Last synced: 25 days ago
JSON representation

Data analysis using pyspark, spark streaming, apache hive, AWS Elastic MapReduce cluster and elasticsearch dashboard hosting with Google Cloud Storage service connectors

Awesome Lists containing this project

README

        

# Data Analytics Dashboard
This is to create the data analytics dashboard using elasticsearch, performing EDA thorugh pyspark, hadoop, hive and AWS glue ETL

## Getting started
Let me clearly define the yardsticks, for me not to deviate much;
- Create the data analystics project with either web or the mobile interface
- Big data, have to satisfy at least one V our of 5 Vs (Volume, velocity, veracity, value & variety)
- tech stack should have at least one of these mentioned tools
- Map reduce, spark, storm, hive, pig, flink etc.

## Project status
Current status : web interface, using fastAPI (yet to be done)
for more info on the steps, please follow steps.pdf

## Tech stack
1. Cloud service provider : AWS, GCP
1. AWS for EMR, ETL operations
2. GCP hosted elasticsearch cloud, final data visulisations
2. Pre-processing : AWS Athena
3. Processing & ETL : pyspark, hive, glue
4. File system : S3, GCP bucket, HDFS
5. Data storage : hive for big data storage and elsticsearch index for final dashbaord
6. Visualisation : Kibana Elastic Cloud Dashboard
7. BE API : python

![Screenshot](final_dashboard.png)