{"id":20464493,"url":"https://github.com/bytedance/largebatchctr","last_synced_at":"2025-08-21T15:32:17.336Z","repository":{"id":39343734,"uuid":"484282725","full_name":"bytedance/LargeBatchCTR","owner":"bytedance","description":"Large batch training of CTR models based on DeepCTR with CowClip.","archived":false,"fork":false,"pushed_at":"2023-02-08T12:47:26.000Z","size":7511,"stargazers_count":164,"open_issues_count":0,"forks_count":24,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-12-10T07:43:42.825Z","etag":null,"topics":["ctr","deep-learning","recommendation-system"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bytedance.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-04-22T03:27:28.000Z","updated_at":"2024-11-24T14:13:09.000Z","dependencies_parsed_at":"2022-07-16T15:17:05.709Z","dependency_job_id":null,"html_url":"https://github.com/bytedance/LargeBatchCTR","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2FLargeBatchCTR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2FLargeBatchCTR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2FLargeBatchCTR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2FLargeBatchCTR/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bytedance","download_url":"https://codeload.github.com/bytedance/LargeBa
tchCTR/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230520392,"owners_count":18238948,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ctr","deep-learning","recommendation-system"],"created_at":"2024-11-15T13:15:23.065Z","updated_at":"2024-12-20T01:15:30.396Z","avatar_url":"https://github.com/bytedance.png","language":"Python","readme":"# Large Batch Training for CTR Prediction (CowClip)\n\nLargeBatchCTR aims to train CTR prediction models with large batch sizes (up to ~128K). The framework is based on [DeepCTR](https://github.com/shenweichen/DeepCTR). You can run the code on a V100 GPU to experience the fast training speed.\n\nThe Adaptive Column-wise Clipping (CowClip) method from the paper \"CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU\" is implemented in this repo.\n\n## Get Started\n\nFirst, download a dataset to the data folder. 
Use `data_utils.py` to preprocess the data for training.\n\n```sh\npython data_utils.py --dataset criteo_kaggle --split rand\n```\n\nThen, use `train.py` to train the network.\n\n```sh\n# Criteo (baseline)\nCUDA_VISIBLE_DEVICES=0 python train.py --dataset criteo_kaggle --model DeepFM\n# Avazu (baseline)\nCUDA_VISIBLE_DEVICES=0 python train.py --dataset avazu --model DeepFM\n```\n\nFor large batch training with CowClip, run the following:\n\n```sh\n# Criteo (8K)\nCUDA_VISIBLE_DEVICES=0 python train.py --dataset criteo_kaggle --model DeepFM --lr_embed 1e-4 --warmup 1 --init_stddev 1e-2 --clip 1 --bound 1e-5 --bs 8192 --l2 8e-05 --lr 22.6274e-4\n# Criteo (128K)\nCUDA_VISIBLE_DEVICES=0 python train.py --dataset criteo_kaggle --model DeepFM --lr_embed 1e-4 --warmup 1 --init_stddev 1e-2 --clip 1 --bound 1e-5 --bs 131072 --l2 128e-05 --lr 90.5096e-4\n# Avazu (64K)\nCUDA_VISIBLE_DEVICES=0 python train.py --dataset avazu --model DeepFM --lr_embed 1e-4 --warmup 1 --init_stddev 1e-2 --clip 1 --bound 1e-4 --bs 65536 --l2 64e-05 --lr 8e-4\n```\n\n## CowClip Quick Look\n\n![CowClip Algorithm Quick Look](./assets/cowclip.png)\n\n## Dataset List\n\n- [Criteo Kaggle](https://labs.criteo.com/2014/02/kaggle-display-advertising-challenge-dataset): download `train.txt` into `data/criteo_kaggle/`\n- [Avazu](https://www.kaggle.com/c/avazu-ctr-prediction): download `train` into `data/avazu/`\n\n## Hyperparameters\n\nThe meanings of the command-line hyperparameters are as follows:\n\n| params        | name                                        |\n| ------------- | ------------------------------------------- |\n| --bs          | batch size                                  |\n| --lr_embed    | learning rate for the embedding layer       |\n| --lr          | learning rate for the dense weights         |\n| --l2          | L2-regularization weight λ                  |\n| --clip        | CowClip coefficient r                       |\n| --bound       | CowClip bound ζ                             |\n| 
--warmup      | number of epochs to warm up the dense weights |\n| --init_stddev | initialization weight standard deviation    |\n\nThe hyperparameters that need to be scaled with batch size are listed below. For the Criteo dataset:\n\n| bs   | lr       | l2     |   ζ   | DeepFM AUC(%) | Time(min) |\n| :--- | :------- | :----- | :---: | :-----------: | :-------: |\n| 1K   | 8e-4     | 1e-5   | 1e-5  |     80.86     |    768    |\n| 2K   | 11.31e-4 | 2e-5   | 1e-5  |     80.93     |    390    |\n| 4K   | 16e-4    | 4e-5   | 1e-5  |     80.97     |    204    |\n| 8K   | 22.62e-4 | 8e-5   | 1e-5  |     80.97     |    102    |\n| 16K  | 32e-4    | 16e-5  | 1e-5  |     80.94     |    48     |\n| 32K  | 45.25e-4 | 32e-5  | 1e-5  |     80.95     |    27     |\n| 64K  | 64e-4    | 64e-5  | 1e-5  |     80.96     |    15     |\n| 128K | 90.50e-4 | 128e-5 | 1e-5  |     80.90     |     9     |\n\nFor the Avazu dataset:\n\n| bs   | lr      | l2    |   ζ   | DeepFM AUC(%) | Time(min) |\n| :--- | :------ | :---- | :---: | :-----------: | :-------: |\n| 1K   | 1e-4    | 1e-5  | 1e-3  |     78.83     |    210    |\n| 2K   | 1.41e-4 | 2e-5  | 1e-3  |     78.82     |    108    |\n| 4K   | 2e-4    | 4e-5  | 1e-4  |     78.90     |    54     |\n| 8K   | 2.83e-4 | 8e-5  | 1e-4  |     79.06     |    30     |\n| 16K  | 4e-4    | 16e-5 | 1e-4  |     79.01     |    17     |\n| 32K  | 5.66e-4 | 32e-5 | 1e-4  |     78.82     |    10     |\n| 64K  | 8e-4    | 64e-5 | 1e-4  |     78.82     |    6.7    |\n| 128K | 16e-4   | 96e-5 | 1e-4  |     78.80     |    4.8    |\n\n## Model List\n\n|        Model         | Paper                                                                                                                                              |\n| :------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------- |\n|     Wide \u0026 Deep      | [DLRS 2016][Wide \u0026 Deep Learning for Recommender 
Systems](https://arxiv.org/pdf/1606.07792.pdf)                                                    |\n|        DeepFM        | [IJCAI 2017][DeepFM: A Factorization-Machine based Neural Network for CTR Prediction](http://www.ijcai.org/proceedings/2017/0239.pdf)              |\n| Deep \u0026 Cross Network | [ADKDD 2017][Deep \u0026 Cross Network for Ad Click Predictions](https://arxiv.org/abs/1708.05123)                                                      |\n|        DCN V2        | [arXiv 2020][DCN V2: Improved Deep \u0026 Cross Network and Practical Lessons for Web-scale Learning to Rank Systems](https://arxiv.org/abs/2008.13535) |\n\n## Requirements\n\nTensorFlow 2.4.0  \nTensorFlow Addons\n\n```sh\npip install -r requirements.txt\n```\n\n## Citation\n\n```bibtex\n@article{zheng2022cowclip,\n  title={{CowClip}: Reducing {CTR} Prediction Model Training Time from 12 hours to 10 minutes on 1 {GPU}},\n  author={Zangwei Zheng and Pengtai Xu and Xuan Zou and Da Tang and Zhen Li and Chenguang Xi and Peng Wu and Leqi Zou and Yijie Zhu and Ming Chen and Xiangzhuo Ding and Fuzhao Xue and Ziheng Qing and Youlong Cheng and Yang You},\n  journal={arXiv},\n  volume={abs/2204.06240},\n  year={2022}\n}\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytedance%2Flargebatchctr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbytedance%2Flargebatchctr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytedance%2Flargebatchctr/lists"}