https://github.com/dipeshdimi/data-science-systems-project
- Host: GitHub
- URL: https://github.com/dipeshdimi/data-science-systems-project
- Owner: dipeshdimi
- Created: 2023-10-14T14:42:39.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-10-14T15:57:29.000Z (over 2 years ago)
- Last Synced: 2025-03-21T20:46:37.697Z (about 1 year ago)
- Language: Scala
- Size: 9.23 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data-Science-Systems-Project
# Team Members
1. Dipesh Mishra (20bcs043)
2. Mahendra Singh Puniya (20bcs082)
3. Ravikant (20bcs110)
4. Rishabh Gautam (20bcs112)
# How to use this Project
To use this project, follow these steps:
1. Data Preparation: Download the Amazon Product Co-Purchasing Network dataset (amazon0302) from https://snap.stanford.edu/data/amazon0302.txt.gz.
2. Environment Setup: Install Apache Spark (version 3.5.0), Scala (version 2.11.12), and OpenJDK. Note that Spark 3.5.0 is built against Scala 2.12 and 2.13, so if the code fails to compile or run with Scala 2.11.12, switch to a matching 2.12.x release.
3. Code Execution: Run the provided code. It loads the dataset, identifies the connected components, computes the size of each component, sorts the components by size, and prints the results.
4. Interpret Results: Review the output printed to the console. The component sizes indicate how products cluster by co-purchasing behavior in the Amazon Co-Purchasing Network.

Before running, update the dataset path in the program file to point to your downloaded copy.
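
For orientation, the pipeline described in step 3 could look roughly like the sketch below. It is not the repository's actual code. It assumes the job uses Spark's GraphX library (the README does not say which API is used), that `spark-graphx` is on the classpath, and that the dataset has been decompressed to a local file. The names `ConnectedComponentsSketch` and `componentSizes`, and the default path `amazon0302.txt`, are hypothetical.

```scala
import org.apache.spark.graphx.{Graph, GraphLoader, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical sketch; assumes GraphX, which the README does not confirm.
object ConnectedComponentsSketch {

  // Returns (componentId, vertexCount) pairs, largest component first.
  // GraphX labels each component with the smallest vertex ID it contains.
  def componentSizes(graph: Graph[Int, Int]): RDD[(VertexId, Long)] =
    graph.connectedComponents().vertices
      .map { case (_, componentId) => (componentId, 1L) }
      .reduceByKey(_ + _)
      .sortBy(_._2, ascending = false)

  def main(args: Array[String]): Unit = {
    // Assumed default location of the decompressed dataset; pass your own path as the first argument.
    val path = args.headOption.getOrElse("amazon0302.txt")

    val spark = SparkSession.builder()
      .appName("ConnectedComponentsSketch")
      .master("local[*]") // local mode, for experimentation only
      .getOrCreate()

    // edgeListFile reads one "srcId dstId" pair per line and skips lines starting with '#'.
    val graph = GraphLoader.edgeListFile(spark.sparkContext, path)

    // Print the ten largest components.
    componentSizes(graph).take(10).foreach { case (id, size) =>
      println(s"component $id: $size vertices")
    }

    spark.stop()
  }
}
```

Keeping the size computation in its own function makes it possible to try the logic on a small hand-built graph before running it on the full dataset.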