https://github.com/sproc01/lfn_amazonbookanalysis
Python notebook to explore the power of local clustering coefficient to approximate the Salesrank
- Host: GitHub
- URL: https://github.com/sproc01/lfn_amazonbookanalysis
- Owner: Sproc01
- License: mit
- Created: 2025-01-09T12:44:49.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-15T21:30:11.000Z (over 1 year ago)
- Last Synced: 2025-03-11T07:33:32.159Z (over 1 year ago)
- Topics: graph, ndcg, notebook, python
- Language: Jupyter Notebook
- Homepage:
- Size: 1.43 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Lfn_AmazonBookAnalysis
The dataset is available on the [Stanford SNAP site](http://snap.stanford.edu/data/amazon-meta.html).
The main idea of the project is explained in the [Project Proposal](https://github.com/Sproc01/Lfn_AmazonBookAnalysis/blob/56fd6365e36d19775cea54dac1831de6f2a3db16/Project%20Proposal.pdf).
The mid-term report is available as the [Mid-Term Report](https://github.com/Sproc01/Lfn_AmazonBookAnalysis/blob/56fd6365e36d19775cea54dac1831de6f2a3db16/Mid-term%20report.pdf).
The final report is available as the [Report](https://github.com/Sproc01/Lfn_AmazonBookAnalysis/blob/46c2a29f80b57e2c9a206ae2424a58c41a567cda/LfN_Report.pdf).
## Motivation
This project explores the potential of the local clustering coefficient as a feature of popularity for books and genres.
The clustering coefficient is a strong indicator of how closely books bought together are related, so it can be used to derive a possible ordering of the books by analysing how often each book appears in a triangle.
Given the values of the local clustering coefficient, we investigate whether the salesrank order of the books can be approximated. We also compare the genre orderings obtained from the salesrank with those obtained from the clustering coefficients of the books within each genre.
Finally, we try to classify each book into one of 4 categories based on a new joint definition of popularity that we introduce, with the objective of capturing the nature of each book.
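
To make the idea concrete, here is a minimal sketch (not the notebook's actual code) of how a local clustering coefficient can be computed and how the resulting ordering can be scored against the salesrank with NDCG, the metric named in the repository topics. The toy graph, the `salesrank` values, the treatment of co-purchases as undirected edges, and the mapping from salesrank to relevance are all assumptions made for illustration.

```python
import math
from itertools import combinations


def local_clustering(adj):
    """Local clustering coefficient of every node in an undirected graph.

    adj maps each node to the set of its neighbours.
    """
    coeffs = {}
    for node, neighbours in adj.items():
        k = len(neighbours)
        if k < 2:
            # A node with fewer than two neighbours cannot be in a triangle.
            coeffs[node] = 0.0
            continue
        # Each edge between two neighbours closes one triangle through node.
        links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
        coeffs[node] = 2.0 * links / (k * (k - 1))
    return coeffs


def ndcg(predicted_order, relevance):
    """NDCG of a predicted ordering, given the true relevance of each item."""
    def dcg(order):
        return sum(relevance[item] / math.log2(pos + 2)
                   for pos, item in enumerate(order))

    ideal_order = sorted(relevance, key=relevance.get, reverse=True)
    ideal = dcg(ideal_order)
    return dcg(predicted_order) / ideal if ideal > 0 else 0.0


# Hypothetical co-purchase graph: an edge means "bought together".
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
# Hypothetical salesrank values (lower means more popular).
salesrank = {"A": 1, "B": 2, "C": 3, "D": 4}

coeffs = local_clustering(adj)

# Assumed relevance mapping: the best salesrank gets the highest relevance.
relevance = {book: len(salesrank) - rank + 1 for book, rank in salesrank.items()}

# Order books by decreasing coefficient, breaking ties by name.
predicted = sorted(coeffs, key=lambda book: (-coeffs[book], book))
score = ndcg(predicted, relevance)
print(predicted, round(score, 3))
```

A score of 1 would mean the coefficient-based ordering matches the salesrank ordering exactly; lower values indicate a weaker match. Ordering by decreasing coefficient is only one possible choice, and the reports linked above describe the approach actually taken.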