{"id":19020721,"url":"https://github.com/librauee/wbdc","last_synced_at":"2025-04-23T07:04:46.556Z","repository":{"id":107435267,"uuid":"394586911","full_name":"librauee/WBDC","owner":"librauee","description":"2021 微信大数据挑战赛 复赛Rank23","archived":false,"fork":false,"pushed_at":"2021-08-13T14:03:47.000Z","size":21,"stargazers_count":27,"open_issues_count":0,"forks_count":4,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-29T22:41:11.694Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://algo.weixin.qq.com/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/librauee.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-08-10T08:55:37.000Z","updated_at":"2024-07-06T05:45:28.000Z","dependencies_parsed_at":"2023-05-17T11:30:41.726Z","dependency_job_id":null,"html_url":"https://github.com/librauee/WBDC","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/librauee%2FWBDC","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/librauee%2FWBDC/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/librauee%2FWBDC/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/librauee%2FWBDC/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/librauee","download_url":"https://codeload.github.com/librauee/WBDC/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github
.com","kind":"github","repositories_count":249326185,"owners_count":21251735,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T20:18:23.887Z","updated_at":"2025-04-17T08:32:45.774Z","avatar_url":"https://github.com/librauee.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# WX challenge\n\n## **1. Environment dependencies**\n\n\n- Python 3.6.5\n- numba 0.53.1\n- numpy 1.18.5\n- pandas 1.0.5\n- scikit-learn 0.23.1\n- tensorflow-gpu 1.13.1\n- tqdm 4.46.1\n- scipy 1.5.0\n- deepctr 0.8.6\n- gensim 3.8\n\n\n## **2. Directory structure**\n\n```\n./\n├── README.md\n├── requirements.txt, Python package requirements\n├── init.sh, script for installing package requirements\n├── train.sh, script for preparing train/inference data and training models, including pretrained models\n├── inference.sh, main function for inference on test dataset\n├── src\n│   ├── prepare, code for preparing train/inference dataset\n│   │   ├── get_features.py\n│   ├── model, code for model architecture\n│   │   ├── mmoe.py\n│   ├── train, code for training\n│   │   ├── run_submit.py\n│   ├── evaluation.py, main function for evaluation\n│   ├── inference.py\n│   ├── inference1.py\n├── data\n│   ├── wedata\n│   │   ├── wechat_algo_data1, dataset of the competition\n│   │   ├── wechat_algo_data2, dataset of the competition\n│   ├── submission, prediction result after running inference.sh\n│   ├── model, model files\n│   ├── feature, feature files\n```\n\n## **3. 
Workflow**\n\n- Enter the project directory: cd /home/tione/notebook/wbdc2021-semi\n- Install the environment: in the conda_tensorflow_py3 virtual environment, run sh init.sh\n- Prepare data and train models: sh train.sh\n- Run prediction and generate the result file: sh inference.sh /home/tione/notebook/wbdc2021-semi/data/wedata/wechat_algo_data2/test_b.csv\n\n\n## **4. Model and features**\n- Model: [MMOE](https://dl.acm.org/doi/pdf/10.1145/3219819.3220007)\n- Parameters:\n    - batch_size: 4092\n    - emded_dim: 512\n    - num_epochs: 5\n    - learning_rate: 0.01\n- Features:\n    - ID features such as userid, feedid, authorid, bgm_singer_id, bgm_song_id\n    - keyword and tag features\n    - video category and author category\n    - userid sequence embedding\n    - feed clustering, author clustering, user clustering\n\n## **5. Algorithm performance**\n\n- Resources: 2 x P40, 48 GB GPU memory, 14 CPU cores, 112 GB RAM\n- Prediction time\n     - Total prediction time: 1791 s\n     - Average prediction time per 2000 samples for a single target behavior: 120.344 ms\n\n## **6. Code description**\n\nThe model prediction code is located as follows:\n\n| Path | Lines | Content |\n| :--- | :--- | :--- |\n| src/inference.py | 82 - 96 | `pred_ans = train_model.predict(test_model_input, batch_size=batch_size * 100)` |\n| src/inference1.py | 93 - 108 | `pred_ans = train_model.predict(test_model_input, batch_size=batch_size * 100)` |\n\n## **7. References**\n* Ma J, Zhao Z, Yi X, et al. Modeling task relationships in multi-task learning with multi-gate mixture-of-experts[C]//Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery \u0026 Data Mining. 2018: 1930-1939.\n* Weichen Shen. (2017). DeepCTR: Easy-to-use, Modular and Extendible package of deep-learning based CTR models. https://github.com/shenweichen/deepctr.\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flibrauee%2Fwbdc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flibrauee%2Fwbdc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flibrauee%2Fwbdc/lists"}