
# Multi-task Recommendation in PyTorch
[![MIT License](https://img.shields.io/badge/license-MIT-green.svg)](https://opensource.org/licenses/MIT) [![Awesome](https://awesome.re/badge.svg)](https://awesome.re)

![MTRec](./mtreclib.png)

-------------------------------------------------------------------------------

## Introduction
MTReclib provides a PyTorch implementation of multi-task recommendation models and common datasets. Currently, we have implemented 7 multi-task recommendation models to enable fair comparison and to boost the development of multi-task recommendation algorithms. The supported algorithms are:
* SingleTask: Train a separate single-task model for each task.
* Shared-Bottom: A traditional multi-task model with a shared bottom network and task-specific towers.
* OMoE: [Adaptive Mixtures of Local Experts](https://ieeexplore.ieee.org/abstract/document/6797059) (Neural Computation 1991)
* MMoE: [Modeling Task Relationships in Multi-task Learning with Multi-Gate Mixture-of-Experts](https://dl.acm.org/doi/pdf/10.1145/3219819.3220007) (KDD 2018)
* PLE: [Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations](https://dl.acm.org/doi/pdf/10.1145/3383313.3412236?casa_token=8fchWD8CHc0AAAAA:2cyP8EwkhIUlSFPRpfCGHahTddki0OEjDxfbUFMkXY5fU0FNtkvRzmYloJtLowFmL1en88FRFY4Q) (RecSys 2020 best paper)
* AITM: [Modeling the Sequential Dependence among Audience Multi-step Conversions with Multi-task Learning in Targeted Display Advertising](https://dl.acm.org/doi/pdf/10.1145/3447548.3467071?casa_token=5YtVOYjJClUAAAAA:eVczwdynmE9dwoyElCG4da9fC5gsRiyX6zKt0_mIJF1K8NkU-SlNkGmpAu0c0EHbM3hBUe3zZc-o) (KDD 2021)
* MetaHeac: [Learning to Expand Audience via Meta Hybrid Experts and Critics for Recommendation and Advertising](https://easezyc.github.io/data/kdd21_metaheac.pdf) (KDD 2021)
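
To make the mixture-of-experts idea concrete, here is a minimal, self-contained MMoE forward pass in PyTorch. It is an illustrative sketch only: the class layout, layer sizes, and parameter names (`input_dim`, `expert_dim`, etc.) do not mirror this repository's actual implementation in `models/mmoe.py`.

```python
import torch
import torch.nn as nn

class MMoESketch(nn.Module):
    """Minimal Multi-gate Mixture-of-Experts sketch (Ma et al., KDD 2018).

    All shapes and names here are illustrative, not this repo's API.
    """
    def __init__(self, input_dim=16, expert_dim=8, expert_num=4, task_num=2):
        super().__init__()
        # Shared pool of experts, each a small MLP.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(input_dim, expert_dim), nn.ReLU())
             for _ in range(expert_num)])
        # One softmax gate per task mixes the shared experts.
        self.gates = nn.ModuleList(
            [nn.Linear(input_dim, expert_num) for _ in range(task_num)])
        # One tower per task maps the mixed representation to a logit.
        self.towers = nn.ModuleList(
            [nn.Linear(expert_dim, 1) for _ in range(task_num)])

    def forward(self, x):
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, D)
        preds = []
        for gate, tower in zip(self.gates, self.towers):
            w = torch.softmax(gate(x), dim=-1).unsqueeze(-1)           # (B, E, 1)
            mixed = (w * expert_out).sum(dim=1)                        # (B, D)
            preds.append(torch.sigmoid(tower(mixed)).squeeze(-1))      # (B,)
        return preds  # one prediction per task, e.g. [CTR, CTCVR]

model = MMoESketch()
preds = model(torch.randn(32, 16))
print(len(preds), tuple(preds[0].shape))
```

OMoE differs from this sketch only in using a single shared gate for all tasks; Shared-Bottom replaces the gated expert mixture with one shared bottom network.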

## Datasets
* AliExpressDataset: A dataset gathered from real-world traffic logs of the search system in AliExpress. It is collected from 5 countries: Russia, Spain, France, the Netherlands, and the United States, and can therefore be used as 5 multi-task datasets. [Original_dataset](https://tianchi.aliyun.com/dataset/dataDetail?dataId=74690) [Processed_dataset Google Drive](https://drive.google.com/drive/folders/1F0TqvMJvv-2pIeOKUw9deEtUxyYqXK6Y?usp=sharing) [Processed_dataset Baidu Netdisk](https://pan.baidu.com/s/1AfXoJSshjW-PILXZ6O19FA?pwd=4u0r)

> For the processed dataset, put it directly in `./data/` and unpack it. For the original dataset, put it in `./data/` and run `python preprocess.py --dataset_name NL`.

## Requirements
* Python 3.6
* PyTorch > 1.10
* pandas
* numpy
* tqdm

## Run

Parameter Configuration:

- dataset_name: the dataset to use, one of ['AliExpress_NL', 'AliExpress_FR', 'AliExpress_ES', 'AliExpress_US']; default: `AliExpress_NL`
- dataset_path: path to the data folder; default: `./data`
- model_name: the model to train, one of ['singletask', 'sharedbottom', 'omoe', 'mmoe', 'ple', 'aitm', 'metaheac']; default: `metaheac`
- epoch: the number of training epochs; default: `50`
- task_num: the number of tasks; default: `2` (CTR & CVR)
- expert_num: the number of experts for ['omoe', 'mmoe', 'ple', 'metaheac']; default: `8`
- learning_rate: default: `0.001`
- batch_size: default: `2048`
- weight_decay: default: `1e-6`
- device: the device used to run the code; default: `cuda:0`
- save_dir: the folder in which to save model parameters; default: `chkpt`
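
The list above can be read as an `argparse` configuration. The sketch below mirrors those names and defaults for illustration; the flag names are taken from this list and may differ slightly from the repository's actual `main.py` (note, for instance, that the run example below uses `--num_expert`).

```python
import argparse

# Illustrative parser mirroring the documented parameters and defaults.
parser = argparse.ArgumentParser()
parser.add_argument('--dataset_name', default='AliExpress_NL')
parser.add_argument('--dataset_path', default='./data')
parser.add_argument('--model_name', default='metaheac')
parser.add_argument('--epoch', type=int, default=50)
parser.add_argument('--task_num', type=int, default=2)
parser.add_argument('--expert_num', type=int, default=8)
parser.add_argument('--learning_rate', type=float, default=0.001)
parser.add_argument('--batch_size', type=int, default=2048)
parser.add_argument('--weight_decay', type=float, default=1e-6)
parser.add_argument('--device', default='cuda:0')
parser.add_argument('--save_dir', default='chkpt')

# Parse a sample command line; unspecified flags fall back to the defaults.
args = parser.parse_args(['--model_name', 'mmoe', '--expert_num', '4'])
print(args.model_name, args.expert_num, args.batch_size)
```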

You can run a model through:

```bash
python main.py --model_name metaheac --num_expert 8 --dataset_name AliExpress_NL
```

## Results
> For fair comparisons, the learning rate is 0.001, the dimension of embeddings is 128, and the mini-batch size is 2048 for all models. We report the mean AUC and Logloss over five random runs. Best results are in boldface.



**AliExpress (Netherlands, NL)**

| Methods | CTR AUC | CTR Logloss | CTCVR AUC | CTCVR Logloss |
|---------|---------|-------------|-----------|---------------|
| SingleTask | 0.7222 | 0.1085 | 0.8590 | 0.00609 |
| Shared-Bottom | 0.7228 | 0.1083 | 0.8511 | 0.00620 |
| OMoE | 0.7254 | 0.1081 | 0.8611 | 0.00614 |
| MMoE | 0.7234 | 0.1080 | 0.8606 | 0.00607 |
| PLE | **0.7292** | 0.1088 | 0.8591 | 0.00631 |
| AITM | 0.7240 | 0.1078 | 0.8577 | 0.00611 |
| MetaHeac | 0.7263 | **0.1077** | **0.8615** | **0.00606** |

**AliExpress (Spain, ES)**

| Methods | CTR AUC | CTR Logloss | CTCVR AUC | CTCVR Logloss |
|---------|---------|-------------|-----------|---------------|
| SingleTask | 0.7266 | 0.1207 | 0.8855 | 0.00456 |
| Shared-Bottom | 0.7287 | 0.1204 | 0.8866 | 0.00452 |
| OMoE | 0.7253 | 0.1209 | 0.8859 | 0.00452 |
| MMoE | 0.7285 | 0.1205 | 0.8898 | **0.00450** |
| PLE | 0.7273 | 0.1223 | **0.8913** | 0.00461 |
| AITM | 0.7290 | **0.1203** | 0.8885 | 0.00451 |
| MetaHeac | **0.7299** | **0.1203** | 0.8883 | **0.00450** |

**AliExpress (France, FR)**

| Methods | CTR AUC | CTR Logloss | CTCVR AUC | CTCVR Logloss |
|---------|---------|-------------|-----------|---------------|
| SingleTask | 0.7259 | **0.1002** | 0.8737 | 0.00435 |
| Shared-Bottom | 0.7245 | 0.1004 | 0.8700 | 0.00439 |
| OMoE | 0.7257 | 0.1006 | 0.8781 | 0.00432 |
| MMoE | 0.7216 | 0.1010 | 0.8811 | 0.00431 |
| PLE | **0.7276** | 0.1014 | 0.8805 | 0.00451 |
| AITM | 0.7236 | 0.1005 | 0.8763 | 0.00431 |
| MetaHeac | 0.7249 | 0.1005 | **0.8813** | **0.00429** |

**AliExpress (United States, US)**

| Methods | CTR AUC | CTR Logloss | CTCVR AUC | CTCVR Logloss |
|---------|---------|-------------|-----------|---------------|
| SingleTask | 0.7061 | 0.1004 | 0.8637 | 0.00381 |
| Shared-Bottom | 0.7029 | 0.1008 | 0.8698 | 0.00381 |
| OMoE | 0.7049 | 0.1007 | 0.8701 | 0.00381 |
| MMoE | 0.7043 | 0.1006 | **0.8758** | **0.00377** |
| PLE | **0.7138** | **0.0992** | 0.8675 | 0.00403 |
| AITM | 0.7048 | 0.1004 | 0.8730 | **0.00377** |
| MetaHeac | 0.7089 | 0.1001 | 0.8743 | 0.00378 |
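
The two reported metrics can be computed from labels and predicted probabilities as follows. This is a plain-Python sketch for clarity; the repository itself may rely on library implementations (e.g. scikit-learn's `roc_auc_score` and `log_loss`).

```python
import math

def auc(labels, scores):
    """AUC via the Mann-Whitney rank statistic: the probability that a random
    positive is scored above a random negative, counting ties as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    hits = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return hits / (len(pos) * len(neg))

def logloss(labels, probs, eps=1e-12):
    """Mean binary cross-entropy, with probabilities clipped away from 0/1."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)

y = [1, 0, 1, 0, 0]
p = [0.9, 0.2, 0.6, 0.4, 0.1]
print(auc(y, p), logloss(y, p))  # every positive outranks every negative -> AUC 1.0
```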

## File Structure

```
.
├── main.py
├── README.md
├── models
│   ├── layers.py
│   ├── aitm.py
│   ├── omoe.py
│   ├── mmoe.py
│   ├── metaheac.py
│   ├── ple.py
│   ├── singletask.py
│   └── sharedbottom.py
└── data
    ├── preprocess.py      # Preprocess the original data
    ├── AliExpress_NL      # AliExpressDataset from the Netherlands
    │   ├── train.csv
    │   └── test.csv
    ├── AliExpress_ES      # AliExpressDataset from Spain
    ├── AliExpress_FR      # AliExpressDataset from France
    └── AliExpress_US      # AliExpressDataset from the United States
```

## Contact
If you have any problems with this library, please create an issue or send us an email at:
* [email protected]

## Reference
If you use this repository, please cite the following papers:

```
@inproceedings{zhu2021learning,
title={Learning to Expand Audience via Meta Hybrid Experts and Critics for Recommendation and Advertising},
author={Zhu, Yongchun and Liu, Yudan and Xie, Ruobing and Zhuang, Fuzhen and Hao, Xiaobo and Ge, Kaikai and Zhang, Xu and Lin, Leyu and Cao, Juan},
booktitle={Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery \& Data Mining},
pages={4005--4013},
year={2021}
}
```

```
@inproceedings{xi2021modeling,
title={Modeling the sequential dependence among audience multi-step conversions with multi-task learning in targeted display advertising},
author={Xi, Dongbo and Chen, Zhen and Yan, Peng and Zhang, Yinger and Zhu, Yongchun and Zhuang, Fuzhen and Chen, Yu},
booktitle={Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery \& Data Mining},
pages={3745--3755},
year={2021}
}
```

Some model implementations and utility functions are adapted from these nice repositories.

- [pytorch-fm](https://github.com/rixwew/pytorch-fm): This package provides a PyTorch implementation of factorization machine models and common datasets in CTR prediction.
- [MetaHeac](https://github.com/easezyc/MetaHeac): This is an official implementation for Learning to Expand Audience via Meta Hybrid Experts and Critics for Recommendation and Advertising.