# 🛍️ HomeDepot Product Search Relevance Prediction

## 📜 Overview
This project builds a **ranking model** to predict the **relevance score** for **query-product pairs** in HomeDepot’s product search. Using **Learning to Rank (LTR)**, we apply a **Pointwise Approach** to train a regression model based on **text similarity features** between the user query and product information.

📌 **Note**:
The product description dataset is very large; please contact me if you would like access to it.

📌 **Dataset**:
- **Train Set (`train_new.csv`)** – Query-product pairs with **ground-truth relevance scores**.
- **Test Set (`test_new.csv`)** – Query-product pairs for **prediction**.
- **Product Descriptions (`product_descriptions_new.csv`)** – Additional product details.
- **Product Attributes (`attributes_new.csv`)** – Additional structured product attributes.
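
The four files can be combined into one frame per split before feature extraction. Below is a minimal loading sketch with `pandas`; the join key `product_uid` is assumed from the original Home Depot data and may differ here.

```python
import pandas as pd

# Load the four dataset files listed above.
train = pd.read_csv("train_new.csv")
test = pd.read_csv("test_new.csv")
descriptions = pd.read_csv("product_descriptions_new.csv")
attributes = pd.read_csv("attributes_new.csv")

# Join product descriptions onto each query-product pair.
# `product_uid` is an assumed join key from the original Kaggle data.
train = train.merge(descriptions, on="product_uid", how="left")
test = test.merge(descriptions, on="product_uid", how="left")
```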

📌 **Goal**:
1. Compute **text similarity** between `search_term` and:
   - `product_title`
   - `product_description`
   - `product_attributes`
2. Generate **feature vectors** for training and testing.
3. Train a **machine learning model** to predict **relevance scores**.
4. Evaluate performance using **Mean Squared Error (MSE) & R² score**.

📌 **Programming Language**: `Python 3`

📌 **Libraries Used**: `pandas`, `scikit-learn`, `nltk`, `numpy`, `scipy`, `XGBoost`

## 🚀 Approach

### **1️⃣ Data Preprocessing**
- **Text Cleaning** (e.g., spelling correction, numerical normalization).
- **Tokenization & Stopword Removal** using `NLTK`.
- **TF-IDF Vectorization** of product details (see the sketch below).
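
A minimal preprocessing sketch, assuming `NLTK` for tokenization and stopword removal and `scikit-learn` for TF-IDF. The `clean_text` helper and the `product_title` column are illustrative, not the project's exact pipeline.

```python
import re

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
STOPWORDS = set(stopwords.words("english"))

def clean_text(text):
    """Lowercase, strip punctuation, tokenize with NLTK, drop stopwords."""
    text = re.sub(r"[^a-z0-9\s]", " ", str(text).lower())
    return [tok for tok in word_tokenize(text) if tok not in STOPWORDS]

# TF-IDF over already-tokenized product text; `train` is the merged
# DataFrame from the loading sketch above.
vectorizer = TfidfVectorizer(analyzer=lambda tokens: tokens)
tfidf_titles = vectorizer.fit_transform(train["product_title"].map(clean_text))
```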

### **2️⃣ Feature Engineering**
- Compute **Cosine Similarity** between `search_term` and:
  - `product_title`
  - `product_description`
  - `product_attributes`
- Compute **additional similarity measures** (e.g., **Jaccard, Dice Coefficient, Overlap**).
- Generate at least **6 similarity features** per query-product pair (see the sketch below).
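
A sketch of per-pair similarity features, assuming the cleaned token lists from the preprocessing step: cosine similarity on TF-IDF vectors, and Jaccard, Dice, and overlap coefficients on token sets.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def tfidf_cosine(query, doc):
    """Cosine similarity between TF-IDF vectors of a query and a document."""
    tfidf = TfidfVectorizer().fit([query, doc])
    mat = tfidf.transform([query, doc])
    return float(cosine_similarity(mat[0], mat[1])[0, 0])

def set_similarities(query_tokens, doc_tokens):
    """Jaccard, Dice, and overlap coefficients on the two token sets."""
    q, d = set(query_tokens), set(doc_tokens)
    if not q or not d:
        return 0.0, 0.0, 0.0
    inter = len(q & d)
    return (inter / len(q | d),             # Jaccard
            2 * inter / (len(q) + len(d)),  # Dice
            inter / min(len(q), len(d)))    # Overlap

# One cosine plus three set-based measures for each field (title,
# description, attributes) already exceeds the six-feature minimum.
```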

### **3️⃣ Model Training & Evaluation**
- Train models using **supervised learning algorithms** (see the sketch after this list):
  - **Linear Regression**
  - **Support Vector Regressor (SVR)**
  - **XGBoost Regressor**
  - **Neural Networks**
- Evaluate model performance using:
  - **Mean Squared Error (MSE)**
  - **R² Score**
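
A minimal training and evaluation sketch with `scikit-learn` and `XGBoost`. Here `X` (the similarity feature matrix) and `y` (relevance scores from `train_new.csv`) are assumed to come from the steps above, and the hyperparameters are illustrative only.

```python
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from xgboost import XGBRegressor

# X: similarity feature matrix, y: ground-truth relevance scores.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "SVR": SVR(kernel="rbf"),
    "XGBoost": XGBRegressor(n_estimators=300, learning_rate=0.05),
    "Neural Network": MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500),
}

# Fit each model and report validation MSE and R².
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_val)
    print(f"{name}: MSE={mean_squared_error(y_val, preds):.4f}, "
          f"R²={r2_score(y_val, preds):.4f}")
```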

### **4️⃣ Predictions on Test Data**
- Generate **predicted relevance scores** for `test_new.csv` (see the sketch below).
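
A short prediction sketch, assuming `X_test` (the feature matrix built for `test_new.csv`) and the models from the evaluation step above; the chosen model, the `id` column, and the output file name are illustrative.

```python
# Predict relevance for the test pairs with the best-performing model.
best_model = models["XGBoost"]
test["relevance"] = best_model.predict(X_test)

# Write the predictions; `id` is an assumed identifier column in the test set.
test[["id", "relevance"]].to_csv("predicted_relevance.csv", index=False)
```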