https://github.com/rickymiura/slack-posts-eda
In this repository, I perform EDA on a large dataset of Slack posts using Apache Spark and AWS to efficiently uncover trends and insights at scale.
- Host: GitHub
- URL: https://github.com/rickymiura/slack-posts-eda
- Owner: RickyMiura
- Created: 2025-01-26T08:18:13.000Z
- Default Branch: main
- Last Pushed: 2025-01-26T08:25:54.000Z
- Topics: big-data, distributed-computing, spark
- Language: Jupyter Notebook
- Size: 447 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
# Overview
This project performs exploratory data analysis (EDA) on a large dataset of Slack posts using **Apache Spark** and **AWS**. The primary objective was to process and analyze the dataset efficiently, uncovering trends, patterns, and insights from Slack messages at scale.
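Spark scales this kind of analysis by computing partial aggregates on each partition of the data and then merging them, the pattern behind `reduceByKey` and `groupBy(...).count()`. The sketch below imitates that pattern in plain Python so it runs without a cluster. It is an illustration, not code from this project's notebook: the partitions are ordinary lists, and the `channel` field and sample rows are hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical Slack posts, pre-split into "partitions" the way Spark would
# distribute rows across executors. The 'channel' field is an assumed schema.
partitions = [
    [{"channel": "general"}, {"channel": "random"}, {"channel": "general"}],
    [{"channel": "dev"}, {"channel": "general"}],
    [{"channel": "random"}, {"channel": "dev"}, {"channel": "dev"}],
]


def count_partition(rows):
    """Map side: count posts per channel within a single partition."""
    return Counter(row["channel"] for row in rows)


def merge_counts(left, right):
    """Combine side: merge two partial counts (associative and commutative)."""
    return left + right


# Each partition is counted independently, then the partial results are merged.
partial_counts = [count_partition(p) for p in partitions]
posts_per_channel = reduce(merge_counts, partial_counts, Counter())
print(posts_per_channel.most_common())
```

Because the merge step is associative and commutative, partitions can be counted in parallel and combined in any order, which is what allows Spark to spread the work across many machines.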
## Key Highlights
- **Big Data Processing**: Leveraged Spark for distributed data processing, enabling efficient handling of large Slack datasets.
- **Cloud Integration**: Utilized AWS services for data storage, processing, and scaling the analysis infrastructure.
- **Insights and Trends**: Explored key metrics such as message frequency, user activity patterns, and common topics across Slack posts.

This project demonstrates the power of combining **big data tools** like Spark with **cloud computing** to analyze large datasets and generate actionable insights.
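The three metrics named above can be illustrated on a toy sample. This is a stdlib-only Python sketch, not the project's notebook code. The post fields (`user`, `ts`, `text`), the sample rows, and the stopword list are all hypothetical. In PySpark the same aggregations would typically be `groupBy(...).count()` calls over a DataFrame. Plain `Counter` objects are used here so the example runs without a cluster.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical sample posts; the field names 'user', 'ts', and 'text'
# are assumptions for illustration, not this project's actual schema.
posts = [
    {"user": "alice", "ts": "2024-03-01T09:15:00", "text": "Deploying the Spark job now"},
    {"user": "bob", "ts": "2024-03-01T09:40:00", "text": "Spark job failed on the S3 read"},
    {"user": "alice", "ts": "2024-03-01T14:05:00", "text": "Fixed the S3 path, job rerunning"},
    {"user": "carol", "ts": "2024-03-02T10:30:00", "text": "Spark dashboard looks good"},
]

# Deliberately tiny stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "on", "now", "and"}


def message_frequency(posts):
    """Message frequency: number of posts per calendar day."""
    return Counter(datetime.fromisoformat(p["ts"]).date().isoformat() for p in posts)


def user_activity(posts):
    """User activity pattern: number of posts per (user, hour-of-day) pair."""
    return Counter((p["user"], datetime.fromisoformat(p["ts"]).hour) for p in posts)


def common_terms(posts, n=3):
    """Most frequent lowercase words across all posts, excluding stopwords."""
    words = (
        w
        for p in posts
        for w in re.findall(r"[a-z0-9]+", p["text"].lower())
        if w not in STOPWORDS
    )
    return Counter(words).most_common(n)


print(message_frequency(posts))
print(user_activity(posts))
print(common_terms(posts))
```

Counting tokens after dropping stopwords is a crude stand-in for topic discovery. It surfaces frequently discussed terms, but a fuller analysis might use TF-IDF weighting or a topic model instead.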
# Contributors
1. Ricky Miura
2. Gopi Maguluri
3. Nihal Karim
4. Pooja Baralu Umesh