Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/oywtece/deepmcp
Deep Matching, Correlation and Prediction (DeepMCP) Model
- Host: GitHub
- URL: https://github.com/oywtece/deepmcp
- Owner: oywtece
- License: mit
- Created: 2019-05-17T08:21:56.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-07-22T08:17:24.000Z (over 4 years ago)
- Last Synced: 2024-08-09T13:18:34.930Z (5 months ago)
- Language: Python
- Size: 33.2 KB
- Stars: 125
- Watchers: 6
- Forks: 39
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
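- StarryDivineSky - oywtece/deepmcp - ...model the CTR relationship and suffer from the data sparsity issue. In contrast, DeepMCP models other types of relationships to learn more informative and statistically reliable feature representations, thereby improving CTR prediction performance. DeepMCP contains three parts: a matching subnet, a correlation subnet and a prediction subnet. These subnets model the user-ad, ad-ad and feature-CTR relationships respectively. When these subnets are jointly optimized under the supervision of the target labels, the learned feature representations have both good prediction power and good representation ability. (Other_Recommender Systems / Web Services_Other)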
README
# Deep Matching, Correlation and Prediction (DeepMCP) Model
DeepMCP is a model for click-through rate (CTR) prediction. Most existing methods mainly model the feature-CTR relationship and suffer from the data sparsity issue. In contrast, DeepMCP models other types of relationships to learn more informative and statistically reliable feature representations, and consequently to improve the performance of CTR prediction. In particular, DeepMCP contains three parts: a matching subnet, a correlation subnet and a prediction subnet. These subnets model the user-ad, ad-ad and feature-CTR relationships respectively. When these subnets are jointly optimized under the supervision of the target labels, the learned feature representations have both good prediction power and good representation ability.
If you use this code, please cite the following paper:
* **Representation Learning-Assisted Click-Through Rate Prediction. In IJCAI, 2019.**
  * arXiv: https://arxiv.org/abs/1906.04365 [Extended version]
  * IJCAI: https://www.ijcai.org/proceedings/2019/634
#### Bibtex
```
@inproceedings{ouyang2019representation,
title={Representation Learning-Assisted Click-Through Rate Prediction},
author={Ouyang, Wentao and Zhang, Xiuwu and Ren, Shukui and Qi, Chao and Liu, Zhaojie and Du, Yanlong},
booktitle={IJCAI},
pages={4561--4567},
year={2019}
}
```

#### TensorFlow (TF) version
1.3.0

#### Abbreviation
ft - feature, slot == field

## Data Preparation (DeepMP)
Data is in the "csv" format, where each row contains an instance.\
Assume there are N unique fts. Fts need to be indexed from 1 to N. Use 0 for missing values or for padding.

We categorize fts as i) **one-hot** or **univalent** (e.g., user id, city) and ii) **mul-hot** or **multivalent** (e.g., words in ad title).
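
The indexing convention above (indices 1 to N, with 0 reserved for missing values and padding) could be sketched as follows. This is only an illustration, not code from the repository: the helper names `build_ft_index` and `lookup` are hypothetical, and prefixing each value with its slot name (so that "query:apple" and "title:apple" receive different indices) is an assumption.

```python
def build_ft_index(ft_values):
    """Map each unique ft value to an index in 1..N; 0 stays reserved."""
    index = {}
    for value in ft_values:
        if value not in index:
            # The first unseen value gets index 1, the next gets 2, and so on.
            index[value] = len(index) + 1
    return index


def lookup(index, value):
    """Return the ft index, or 0 for a missing or unseen value."""
    return index.get(value, 0)


# Hypothetical fts, written as slot_name:ft_value (an assumed convention).
fts = ["gender:male", "age:27", "query:apple", "title:apple", "gender:male"]
ft_index = build_ft_index(fts)
```

Reserving 0 means that padded positions and missing values share one index that never collides with a real ft.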
csv data format
* label, one-hot fts, mul-hot fts

We also need to define the max number of features per mul-hot ft slot (through the "max_len_per_slot" parameter) and perform trimming or padding accordingly. Please refer to the following example for more detail.
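
As a rough sketch of the trimming and padding rule, the snippet below builds one csv row from a label, the one-hot fts and the mul-hot ft slots. The function names `pad_or_trim` and `to_csv_row` are hypothetical and do not come from the repository; only the "max_len_per_slot" parameter and the use of 0 for padding are taken from the text.

```python
def pad_or_trim(fts, max_len_per_slot, pad_value=0):
    """Keep at most max_len_per_slot fts, then pad with zeros to that length."""
    kept = list(fts[:max_len_per_slot])
    return kept + [pad_value] * (max_len_per_slot - len(kept))


def to_csv_row(label, one_hot_fts, mul_hot_slots, max_len_per_slot):
    """Lay out one instance as: label, one-hot fts, then each mul-hot slot."""
    row = [label] + list(one_hot_fts)
    for slot_fts in mul_hot_slots:
        row.extend(pad_or_trim(slot_fts, max_len_per_slot))
    return row


# Values not yet converted to ft indices, with max_len_per_slot = 3.
row_1 = to_csv_row(0, ["male", 27], [["apple"], ["apple", "fruit", "fresh"]], 3)
row_2 = to_csv_row(
    1,
    ["female", 35],
    [["shoes", "winter"], ["shoes", "winter", "warm", "sales"]],
    3,
)
print(", ".join(str(v) for v in row_1))
print(", ".join(str(v) for v in row_2))
```

Every row then has the same width, 1 + n_one_hot_slot + n_mul_hot_slot * max_len_per_slot columns, so the file can be read as a fixed-size table.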
### Example
1. original fts (ft_name:ft_value)
* label:0, gender:male, age:27, query:apple, title:apple, title:fruit, title:fresh
* label:1, gender:female, age:35, query:shoes, query:winter, title:shoes, title:winter, title:warm, title:sales

2. csv fts (not converted to ft index yet)
* 0, male, 27, apple, 0, 0, apple, fruit, fresh
* 1, female, 35, shoes, winter, 0, shoes, winter, warm

#### Explanation
csv format settings:\
n_one_hot_slot = 2 # num of one-hot ft slots (gender, age)\
n_mul_hot_slot = 2 # num of mul-hot ft slots (query, title)\
max_len_per_slot = 3 # max num of fts per mul-hot ft slot

For the first instance, the mul-hot ft slot "query" contains only 1 ft "apple". We thus pad (max_len_per_slot - 1) zeros, resulting in "apple, 0, 0".\
For the second instance, the mul-hot ft slot "title" contains 4 fts. We thus only keep the first max_len_per_slot fts.

## Data Preparation (DeepCP/DeepMCP)
DeepCP/DeepMCP needs two datasets as input. Both are in the "csv" format.\
The first dataset is the same as that for DeepMP.\
The second dataset should contain a target ad, a context ad and N negative ads per row.

csv data format
* target ad fts, context ad fts, negative ad 1 fts, ..., negative ad N fts

csv format settings:\
n_one_hot_slot_s = 2 # num of one-hot ft slots per ad in the second dataset\
n_mul_hot_slot_s = 2 # num of mul-hot ft slots per ad in the second dataset\
max_len_per_slot_s = 3 # max num of fts per mul-hot ft slot in the second dataset

## Source Code
1. **DeepMP** achieves the best tradeoff between prediction performance and model complexity. It needs only 1 dataset (the configs for the second dataset are unused). \[**_Recommended_**\]
2. DeepCP needs 2 datasets. Its performance is not as good as that of DeepMP.
3. DeepMCP also needs 2 datasets. It is the most complex and leads to the best performance.

* config_deepmcp.py -- config file
* ctr_funcs.py -- functions
* deepmp.py -- Deep Matching and Prediction (DeepMP) model
* deepcp.py -- Deep Correlation and Prediction (DeepCP) model
* deepmcp.py -- Deep Matching, Correlation and Prediction (DeepMCP) model

## Run the Code
First revise the config file, and then run the code:
```bash
nohup python deepmp.py > [output_file_name] 2>&1 &
```