{"id":13958342,"url":"https://github.com/oywtece/deepmcp","last_synced_at":"2025-07-20T23:31:01.108Z","repository":{"id":186945881,"uuid":"187176781","full_name":"oywtece/deepmcp","owner":"oywtece","description":"Deep Matching, Correlation and Prediction (DeepMCP) Model","archived":false,"fork":false,"pushed_at":"2020-07-22T08:17:24.000Z","size":34,"stargazers_count":125,"open_issues_count":2,"forks_count":39,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-08-09T13:18:34.930Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oywtece.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-05-17T08:21:56.000Z","updated_at":"2024-08-05T06:43:14.000Z","dependencies_parsed_at":"2023-08-08T11:27:35.649Z","dependency_job_id":null,"html_url":"https://github.com/oywtece/deepmcp","commit_stats":null,"previous_names":["oywtece/deepmcp"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oywtece%2Fdeepmcp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oywtece%2Fdeepmcp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oywtece%2Fdeepmcp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oywtece%2Fdeepmcp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oywtece","download_url":"https://codeload.github.com/oywtece/deepmcp/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repo
sitories_count":226844979,"owners_count":17691138,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-08T13:01:29.969Z","updated_at":"2024-11-28T01:31:43.090Z","avatar_url":"https://github.com/oywtece.png","language":"Python","funding_links":[],"categories":["其他_推荐系统"],"sub_categories":["网络服务_其他"],"readme":"# Deep Matching, Correlation and Prediction (DeepMCP) Model\n\nDeepMCP is a model for click-through rate (CTR) prediction. Most existing methods mainly model the feature-CTR relationship and suffer from the data sparsity issue. In contrast, DeepMCP models other types of relationships in order to learn more informative and statistically reliable feature representations, and consequently to improve the performance of CTR prediction. In particular, DeepMCP contains three parts: a matching subnet, a correlation subnet and a prediction subnet. These subnets model the user-ad, ad-ad and feature-CTR relationships, respectively. When these subnets are jointly optimized under the supervision of the target labels, the learned feature representations have both good predictive power and good representation ability.\n\nIf you use this code, please cite the following paper:\n* **Representation Learning-Assisted Click-Through Rate Prediction. 
In IJCAI, 2019.**\n\narXiv: https://arxiv.org/abs/1906.04365 [Extended version]\n\nIJCAI: https://www.ijcai.org/proceedings/2019/634\n\n#### Bibtex\n```\n@inproceedings{ouyang2019representation,\n  title={Representation Learning-Assisted Click-Through Rate Prediction},\n  author={Ouyang, Wentao and Zhang, Xiuwu and Ren, Shukui and Qi, Chao and Liu, Zhaojie and Du, Yanlong},\n  booktitle={IJCAI},\n  pages={4561--4567},\n  year={2019}\n}\n```\n\n#### TensorFlow (TF) version\n1.3.0\n\n#### Abbreviation\nft - feature, slot == field\n\n## Data Preparation (DeepMP)\nData is in the \"csv\" format, where each row contains an instance.\\\nAssume there are N unique fts. Fts need to be indexed from 1 to N. Use 0 for missing values or for padding.\n\nWe categorize fts as i) **one-hot** or **univalent** (e.g., user id, city) and ii) **mul-hot** or **multivalent** (e.g., words in ad title).\n\ncsv data format\n* \\\u003clabel\\\u003e\\\u003cone-hot fts\\\u003e\\\u003cmul-hot fts\\\u003e\n\nWe also need to define the max number of features per mul-hot ft slot (through the \"max_len_per_slot\" parameter) and perform trimming or padding accordingly. Please refer to the following example for more detail.\n\n### Example\n1. original fts (ft_name:ft_value)\n* label:0, gender:male, age:27, query:apple, title:apple, title:fruit, title:fresh\n* label:1, gender:female, age:35, query:shoes, query:winter, title:shoes, title:winter, title:warm, title:sales\n\n2. csv fts (not converted to ft index yet)\n* 0, male, 27, apple, 0, 0, apple, fruit, fresh\n* 1, female, 35, shoes, winter, 0, shoes, winter, warm\n\n#### Explanation\ncsv format settings:\\\nn_one_hot_slot = 2 # num of one-hot ft slots (gender, age)\\\nn_mul_hot_slot = 2 # num of mul-hot ft slots (query, title)\\\nmax_len_per_slot = 3 # max num of fts per mul-hot ft slot\n\nFor the first instance, the mul-hot ft slot \"query\" contains only 1 ft \"apple\". 
We thus pad (max_len_per_slot - 1) = 2 zeros, resulting in \"apple, 0, 0\".\\\nFor the second instance, the mul-hot ft slot \"title\" contains 4 fts. We thus keep only the first max_len_per_slot = 3 fts and drop the rest (\"sales\").\n\n## Data Preparation (DeepCP/DeepMCP)\nDeepCP/DeepMCP needs two datasets as input. Both are in the \"csv\" format.\\\nThe first dataset is the same as that for DeepMP.\\\nThe second dataset should contain a target ad, a context ad and N negative ads per row (here N is the number of negative ads, not the number of unique fts).\n\ncsv data format\n* \\\u003ctarget one-hot fts\\\u003e\\\u003ctarget mul-hot fts\\\u003e\\\u003cctxt one-hot fts\\\u003e\\\u003cctxt mul-hot fts\\\u003e\\\u003cneg1 one-hot fts\\\u003e\\\u003cneg1 mul-hot fts\\\u003e...\\\u003cnegN one-hot fts\\\u003e\\\u003cnegN mul-hot fts\\\u003e\n\ncsv format settings:\\\nn_one_hot_slot_s = 2 # num of one-hot ft slots per ad in the second dataset\\\nn_mul_hot_slot_s = 2 # num of mul-hot ft slots per ad in the second dataset\\\nmax_len_per_slot_s = 3 # max num of fts per mul-hot ft slot in the second dataset\n\n## Source Code\n1. **DeepMP** achieves the best tradeoff between prediction performance and model complexity. It needs only 1 dataset. (the configs for the second dataset are unused) \\[**_Recommended_**\\]\n2. DeepCP needs 2 datasets. Its performance is not as good as that of DeepMP.\n3. DeepMCP also needs 2 datasets. 
It is the most complex model and achieves the best performance.\n\n* config_deepmcp.py -- config file\n* ctr_funcs.py -- functions\n* deepmp.py -- Deep Matching and Prediction (DeepMP) model\n* deepcp.py -- Deep Correlation and Prediction (DeepCP) model\n* deepmcp.py -- Deep Matching, Correlation and Prediction (DeepMCP) model\n\n## Run the Code\nFirst revise the config file (config_deepmcp.py), and then run the code:\n```bash\nnohup python deepmp.py \u003e [output_file_name] 2\u003e\u00261 \u0026\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foywtece%2Fdeepmcp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foywtece%2Fdeepmcp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foywtece%2Fdeepmcp/lists"}