{"id":13958462,"url":"https://github.com/metauto-ai/Kaleido-BERT","last_synced_at":"2025-07-21T00:31:00.769Z","repository":{"id":41380492,"uuid":"344720043","full_name":"mczhuge/Kaleido-BERT","owner":"mczhuge","description":"💐Kaleido-BERT: Vision-Language Pre-training on Fashion Domain","archived":false,"fork":false,"pushed_at":"2022-06-29T11:44:02.000Z","size":10467,"stargazers_count":264,"open_issues_count":1,"forks_count":19,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-08-08T13:13:05.224Z","etag":null,"topics":["bert","e-commerce","fashion","multimodal","pre-training","vision-language"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mczhuge.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-03-05T06:45:05.000Z","updated_at":"2024-07-20T19:31:53.000Z","dependencies_parsed_at":"2022-09-16T04:30:21.537Z","dependency_job_id":null,"html_url":"https://github.com/mczhuge/Kaleido-BERT","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mczhuge%2FKaleido-BERT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mczhuge%2FKaleido-BERT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mczhuge%2FKaleido-BERT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mczhuge%2FKaleido-BERT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mczhuge","download_url":"https://codeload.github.com/mczhuge/Kaleido-BERT/tar.gz/refs/heads/main",
"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226850001,"owners_count":17691896,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","e-commerce","fashion","multimodal","pre-training","vision-language"],"created_at":"2024-08-08T13:01:36.900Z","updated_at":"2025-07-21T00:30:54.487Z","avatar_url":"https://github.com/mczhuge.png","language":"Python","funding_links":[],"categories":["其他_机器视觉"],"sub_categories":["网络服务_其他"],
"readme":"\u003cp align=\"center\"\u003e\u003cimg src=\"kaleidobert_logo.png\" width=\"850\"\u003e\u003c/p\u003e\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)\n\n[**[Paper]**](https://arxiv.org/pdf/2103.16110.pdf)[**[Chinese Version]**](https://github.com/mczhuge/Kaleido-BERT/blob/main/CVPR2021_KaleidoBERT_Chinese.pdf)[**[Video]**](http://dpfan.net/wp-content/uploads/Kaleido-BERT.mp4)[**[Poster]**](https://github.com/mczhuge/Kaleido-BERT/blob/main/CVPR2021_KaleidoBERT_poster.pdf)[**[MSRA_Slide]**](http://dpfan.net/wp-content/uploads/MSRA_Oral_KaleidoBERT_高德宏.pdf)[**[News1]**](https://zhuanlan.zhihu.com/p/365497906)[**[News2]**](https://mp.weixin.qq.com/s/yPJZDeHSj8C5jGKGgDQF0Q)[**[MSRA_Talking]**](https://mp.weixin.qq.com/s/PeBk5vDi7lO8ZFo8FwN10w)[**[Synced (机器之心)_Talking]**](https://jmq.h5.xeknow.com/s/2ogm2v)\n\n## Introduction\nWe present a new vision-language (VL) pre-training model dubbed Kaleido-BERT, which introduces a novel kaleido strategy for fashion cross-modality representations from transformers. \nIn contrast to the random masking strategy of recent VL models, we design an alignment-guided masking strategy that jointly focuses more on image-text semantic relations. \nTo this end, we carry out five novel tasks, i.e., rotation, jigsaw, camouflage, grey-to-color, and blank-to-color, for self-supervised VL pre-training on patches of different scales. Kaleido-BERT is conceptually simple and easy to extend to the existing BERT framework. It attains state-of-the-art results by large margins on four downstream tasks: text retrieval (R@1: 4.03% absolute improvement), image retrieval (R@1: 7.13% absolute improvement), category recognition (ACC: 3.28% absolute improvement), and fashion captioning (BLEU-4: 1.2 absolute improvement). We validate the efficiency of Kaleido-BERT on a wide range of e-commerce websites, demonstrating its broader potential in real-world applications.\n![framework](model.png)\n\n## Notes\n1) Code will be released on 2021/4/16.\n2) This is the TensorFlow implementation, built on [Alibaba/EasyTransfer](https://github.com/alibaba/EasyTransfer).\n3) If you find it hard to download all of the datasets, edit `/dataset/get_pretrain_data.sh`, `/dataset/get_finetune_data.sh`, and `/dataset/get_retrieve_data.sh` and comment out any `wget #file_links` lines you do not need. This will not break the following steps.\n\n## Get started\n1. Clone this repository\n```\ngit clone git@github.com:mczhuge/Kaleido-BERT.git\ncd Kaleido-BERT\n```\n2. Environment setup (details can be found in conda_env.info)\n```\nconda create --name kaleidobert --file conda_env.info\nconda activate kaleidobert\nconda install tensorflow-gpu=1.15.0\npip install boto3 tqdm tensorflow_datasets --index-url=https://mirrors.aliyun.com/pypi/simple/\npip install sentencepiece==0.1.92 scikit-learn --index-url=https://mirrors.aliyun.com/pypi/simple/\npip install joblib==0.14.1\npython setup.py develop\n```\n3. Download pretrained dependencies\n```\ncd Kaleido-BERT/scripts/checkpoint\nsh get_checkpoint.sh\n```\n4. Finetune\n```\n# Download finetune datasets\n\ncd Kaleido-BERT/scripts/dataset\nsh get_finetune_data.sh\nsh get_retrieve_data.sh\n\n# Test CAT/SUB (category / subcategory recognition)\n\ncd Kaleido-BERT/scripts\nsh run_cat.sh\nsh run_subcat.sh\n\n# Test TIR/ITR (retrieval)\n\ncd Kaleido-BERT/scripts\nsh run_i2t.sh\nsh run_t2i.sh\n```\n5. Pre-training\n```\n# Download pre-training datasets\n\ncd Kaleido-BERT/scripts/dataset\nsh get_pretrain_data.sh\n\n# Remove the existing checkpoint\nrm -rf Kaleido-BERT/checkpoint/pretrained\n\n# Run pre-training\ncd Kaleido-BERT/scripts/\nsh run_pretrain.sh\n```\n\n## Acknowledgement\nThanks to the Alibaba ICBU Search Team and the Alibaba PAI Team for their technical support.\n\n## Citing Kaleido-BERT\n```\n@inproceedings{zhuge2021kaleido,\n  title={Kaleido-bert: Vision-language pre-training on fashion domain},\n  author={Zhuge, Mingchen and Gao, Dehong and Fan, Deng-Ping and Jin, Linbo and Chen, Ben and Zhou, Haoming and Qiu, Minghui and Shao, Ling},\n  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},\n  pages={12647--12657},\n  year={2021}\n}\n```\n\n## Contact\n* Mingchen Zhuge (mczhuge@gmail.com)\n* Dehong Gao (dehong.gdh@alibaba-inc.com)\n* Deng-Ping Fan (dpfan@gmail.com)\n\nFeel free to contact us if you have additional questions.\n",
"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmetauto-ai%2FKaleido-BERT","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmetauto-ai%2FKaleido-BERT","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmetauto-ai%2FKaleido-BERT/lists"}