https://github.com/dipeshdimi/data-science-systems-project



# Data-Science-Systems-Project
# Team Members
1. Dipesh Mishra (20bcs043)
2. Mahendra Singh Puniya (20bcs082)
3. Ravikant (20bcs110)
4. Rishabh Gautam (20bcs112)

# How to Use This Project
To use this project, follow these steps:
1. Data Preparation: Download the Amazon Product Co-Purchasing Network dataset from the Stanford SNAP collection: https://snap.stanford.edu/data/amazon0302.txt.gz

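2. Environment Setup: Install Apache Spark (version 3.5.0), Scala (version 2.11.12), and OpenJDK. Note that Spark 3.x is built against Scala 2.12 or 2.13, so code run through Spark uses the Scala version bundled with the Spark distribution rather than a separately installed 2.11.12.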

3. Code Execution: Run the provided code. It loads the dataset, identifies the connected components, computes the size of each component, sorts the components by size, and prints the results.

4. Interpret Results: Review the results printed to the console. The number and sizes of the connected components show how products cluster in the Amazon Co-Purchasing Network, which gives insight into co-purchasing behavior.

Remember to update the path to the downloaded dataset in the program file before running it.
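
The steps above (load the edge list, find connected components, compute their sizes, sort them) can be illustrated with a small sketch. This is not the project's Spark code: it is a plain-Scala approximation that swaps in a union-find structure so it runs without a cluster. It assumes the dataset has been decompressed to a text file, that each data line holds two whitespace-separated node IDs, that lines starting with `#` are comments, and that edges are treated as undirected. The object name `ComponentSizes`, its method names, and the default path are hypothetical.

```scala
import scala.collection.mutable

object ComponentSizes {
  /** Parses edge lines ("src dst"), skipping blank lines and '#' comments. */
  def parseEdges(lines: Iterator[String]): Iterator[(Long, Long)] =
    lines.map(_.trim)
      .filter(l => l.nonEmpty && !l.startsWith("#"))
      .map { l =>
        val parts = l.split("\\s+")
        (parts(0).toLong, parts(1).toLong)
      }

  /** Returns the component sizes, largest first, treating edges as undirected. */
  def componentSizes(edges: Iterator[(Long, Long)]): Seq[Int] = {
    // parent(x) == x marks x as the root (representative) of its component.
    val parent = mutable.Map.empty[Long, Long]

    // Finds the root of x, shortening the path along the way (path halving).
    def find(x: Long): Long = {
      var cur = x
      while (parent(cur) != cur) {
        parent(cur) = parent(parent(cur))
        cur = parent(cur)
      }
      cur
    }

    for ((a, b) <- edges) {
      parent.getOrElseUpdate(a, a)
      parent.getOrElseUpdate(b, b)
      val ra = find(a)
      val rb = find(b)
      if (ra != rb) parent(ra) = rb // merge the two components
    }

    // Group every node by its root, count each group, sort descending.
    parent.keys.toVector
      .groupBy(find)
      .values
      .map(_.size)
      .toVector
      .sorted(Ordering[Int].reverse)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical default; pass the real dataset location as the first argument.
    val path = args.headOption.getOrElse("amazon0302.txt")
    val source = scala.io.Source.fromFile(path)
    try {
      val sizes = componentSizes(parseEdges(source.getLines()))
      println(s"Connected components: ${sizes.size}")
      sizes.take(10).zipWithIndex.foreach { case (size, i) =>
        println(s"#${i + 1}: $size nodes")
      }
    } finally source.close()
  }
}
```

In a Spark setting, a comparable result can likely be obtained with GraphX by loading the file with `GraphLoader.edgeListFile` and calling `connectedComponents()` on the resulting graph. Whether the project's code takes that route is an assumption.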