https://github.com/rickymiura/slack-posts-eda

In this repository, I perform EDA on a large dataset of Slack posts using Apache Spark and AWS to efficiently uncover trends and insights at scale.
big-data distributed-computing spark


# Overview

This project performs exploratory data analysis (EDA) on a large dataset of Slack posts using **Apache Spark** and **AWS**. The primary objective was to process and analyze the dataset efficiently, uncovering trends, patterns, and insights in Slack messages at scale.

## Key Highlights
- **Big Data Processing**: Leveraged Spark for distributed data processing, enabling efficient handling of large Slack datasets.
- **Cloud Integration**: Utilized AWS services for data storage, processing, and scaling the analysis infrastructure.
- **Insights and Trends**: Explored key metrics such as message frequency, user activity patterns, and common topics across Slack posts.
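
The metrics named above (message frequency, user activity, common topics) all reduce to group-and-count aggregations. The sketch below illustrates that logic in plain Python on a tiny made-up sample. It is not the repository's code: the record fields (`user`, `ts`, `text`), the sample data, and the helper names are all assumptions, since the dataset's actual schema is not described here. In Spark, the same shape would typically be written as a `groupBy` followed by `count` so the work is distributed across the cluster.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical sample records; the real dataset's schema is an assumption here.
posts = [
    {"user": "alice", "ts": 1700000000, "text": "deploy finished for spark job"},
    {"user": "bob", "ts": 1700003600, "text": "spark job failed again"},
    {"user": "alice", "ts": 1700090000, "text": "rerunning the spark job now"},
]


def messages_per_day(records):
    """Message frequency: count posts per UTC calendar day."""
    return Counter(
        datetime.fromtimestamp(r["ts"], tz=timezone.utc).date().isoformat()
        for r in records
    )


def messages_per_user(records):
    """User activity: count posts per user."""
    return Counter(r["user"] for r in records)


def top_words(records, n=3, min_len=4):
    """Rough proxy for common topics: most frequent words of at least min_len characters."""
    words = (
        w
        for r in records
        for w in r["text"].lower().split()
        if len(w) >= min_len
    )
    return Counter(words).most_common(n)


if __name__ == "__main__":
    print(messages_per_day(posts))
    print(messages_per_user(posts))
    print(top_words(posts, n=1))
```

Word counting is only a crude stand-in for topic discovery. A real analysis would likely need tokenization, stop-word removal, or a topic model, none of which is specified in this README.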

This project shows how combining **big data tools** like Spark with **cloud computing** makes it practical to analyze large datasets and generate actionable insights.

# Contributors
1. Ricky Miura
2. Gopi Maguluri
3. Nihal Karim
4. Pooja Baralu Umesh