https://github.com/djo/query-context-aware-learning-to-rank
https://github.com/djo/query-context-aware-learning-to-rank
Last synced: 10 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/djo/query-context-aware-learning-to-rank
- Owner: djo
- Created: 2025-04-08T15:02:46.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-09-05T09:18:08.000Z (10 months ago)
- Last Synced: 2025-09-05T11:24:15.103Z (10 months ago)
- Language: Jupyter Notebook
- Size: 74.2 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Query Context-Aware Sequential Ranking
This project, implemented by the authors for the paper [Efficient and Effective Query Context-Aware Learning-to-Rank Model for Sequential Recommendation](https://arxiv.org/abs/2507.03789), provides the implementation and evaluation of a query-context-aware learning-to-rank model for sequential recommendation. The model enhances recommendation accuracy by incorporating the user's target query context into the ranking with the CLM modeling.
## Folder structure
```
├── datasets/ # Contains preprocessing notebooks
│ ├── {preprocessing notebooks for Taobao and RetailRocket}
├── files/ # Stores processed dataset files
├── model.py # Model definition
├── train_model.py # Script for model training
└── README.md # Documentation
```
## Data
The notebooks in the `datasets` directory contain code to automatically download and preprocess the datasets.
After running these notebooks, the processed training and test data will be saved as Parquet files in the `files` directory.
## Training procedure
The entry point to start training the model is the `train_model.py`.
Parameters:
* `--dataset`: either `taobao` or `retailrocket`, to select the respective dataset
* `--train_path`: path to the local folder, storing the training datasets
* `--test_path`: path to the local folder, storing the training datasets
* `--output_data_path`: path for saving the resulting model artifact
* `--integration`: one of the following options (default: NO_QUERY_CONTEXT):
- `NO_QUERY_CONTEXT` for no query context (the respective input will be ignored, and only the items sequence will be used)
- `OUTSIDE` for query context outside the transformer blocks
- `IN_INPUT` for implementation where query context is concatenated with the preceding item representation in the input
- `LAST_LAYER_AND_OUTSIDE` for query context included in the last layer's query position and outside the transformer blocks
* `--num_epochs`: number of training epochs (default: 30)
* `--num_samples`: number of samples to use from dataset; use -1 for all samples (default: -1)
* `--seq_len`: maximum sequence length for input data (default: 100)
* `--batch_size`: batch size for training (default: 128)
* `--past_query_context_in_test`: whether to use past query context information during evaluation (0=only the current context is used, 1=historical context is used as well, default: 0)
* `--query_context_dropout_rate`: dropout rate for query context information; applicable only for `IN_INPUT` integration type (default: 0.0)
* `--query_context_dropout_in_train`: whether to apply query context dropout during training; applicable only for `IN_INPUT` integration typ (0=disabled, 1=enabled, default: 0)
## Usage
Below are example commands for training the model with different integration types and options. Replace paths as needed.
**Quick experiment with a small subset:**
```bash
python train_model.py --dataset taobao --train_path datasets/files --test_path datasets/files --output_data_path ./model_output --integration IN_INPUT --num_epochs 2 --num_samples 10000 --batch_size 32
```
**No query context (NO_QUERY_CONTEXT):**
```bash
python train_model.py --dataset taobao --train_path datasets/files --test_path datasets/files --output_data_path ./model_output --integration NO_QUERY_CONTEXT
```
**Query context outside transformer blocks (OUTSIDE):**
```bash
python train_model.py --dataset taobao --train_path datasets/files --test_path datasets/files --output_data_path ./model_output --integration OUTSIDE
```
**Query context concatenated in input (IN_INPUT):**
```bash
python train_model.py --dataset taobao --train_path datasets/files --test_path datasets/files --output_data_path ./model_output --integration IN_INPUT
```
**IN_INPUT with query context dropout (rate 0.25, enabled during training):**
```bash
python train_model.py --dataset taobao --train_path datasets/files --test_path datasets/files --output_data_path ./model_output --integration IN_INPUT --query_context_dropout_rate 0.25 --query_context_dropout_in_train 1
```
**IN_INPUT using historical query context during evaluation:**
```bash
python train_model.py --dataset taobao --train_path datasets/files --test_path datasets/files --output_data_path ./model_output --integration IN_INPUT --past_query_context_in_test 1
```
**Query context in last layer and outside transformer blocks (LAST_LAYER_AND_OUTSIDE):**
```bash
python train_model.py --dataset taobao --train_path datasets/files --test_path datasets/files --output_data_path ./model_output --integration LAST_LAYER_AND_OUTSIDE
```