{"id":13754249,"url":"https://github.com/nilboy/gaic_track3_pair_sim","last_synced_at":"2025-05-09T22:31:33.584Z","repository":{"id":148061936,"uuid":"353003171","full_name":"nilboy/gaic_track3_pair_sim","owner":"nilboy","description":"全球人工智能技术创新大赛-赛道三-冠军方案","archived":false,"fork":false,"pushed_at":"2021-07-12T08:45:07.000Z","size":163,"stargazers_count":235,"open_issues_count":1,"forks_count":59,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-11-16T07:33:14.455Z","etag":null,"topics":["text-pair"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nilboy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-30T13:02:56.000Z","updated_at":"2024-10-24T06:45:07.000Z","dependencies_parsed_at":"2023-05-19T03:00:11.716Z","dependency_job_id":null,"html_url":"https://github.com/nilboy/gaic_track3_pair_sim","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nilboy%2Fgaic_track3_pair_sim","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nilboy%2Fgaic_track3_pair_sim/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nilboy%2Fgaic_track3_pair_sim/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nilboy%2Fgaic_track3_pair_sim/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nilboy","download_url":"https://codeload.github.com/nilboy/ga
ic_track3_pair_sim/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253335695,"owners_count":21892714,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["text-pair"],"created_at":"2024-08-03T09:01:52.075Z","updated_at":"2025-05-09T22:31:31.848Z","avatar_url":"https://github.com/nilboy.png","language":"Python","funding_links":[],"categories":["文本匹配 文本检索 文本相似度"],"sub_categories":["其他_文本生成、文本对话"],"readme":"# gaic_track3_pair_sim\n[Global AI Technology Innovation Competition, Track 3: Champion Solution](https://yiwise-algo.yuque.com/docs/share/5a1e3b76-4d04-4127-979a-496d7bc8c1b8?#%20%E3%80%8A%E7%9F%AD%E6%96%87%E6%9C%AC%E7%9B%B8%E4%BC%BC%E5%8C%B9%E9%85%8D%E3%80%8B)\n## Competition Homepage\nhttps://tianchi.aliyun.com/competition/entrance/531851/introduction\n\n## Data\nThis project does not include the data. To obtain it, download it from the Tianchi competition homepage.\n\n## Preparing the Pretrained Models\n* Download the pretrained models\n    - nezha-base:\n      \n      https://drive.google.com/file/d/1HmwMG2ldojJRgMVN0ZhxqOukhuOBOKUb/view?usp=sharing\n    - nezha-large:\n      \n      https://drive.google.com/file/d/1EtahNvdjEpugm8juFuPIN_Fs2skFmeMU/view?usp=sharing\n    - uer/bert-base:\n      \n      https://share.weiyun.com/5QOzPqq\n    - uer/bert-large:\n    \n      https://share.weiyun.com/5G90sMJ\n    - macbert, chinese-bert-wwm-ext, chinese-roberta-wwm-ext-large\n    \n      https://huggingface.co/models\n* Open-source repositories for the pretrained models\n    - https://github.com/dbiir/UER-py\n    - https://github.com/huawei-noah/Pretrained-Language-Model\n* Download the models and extract them into the `data` folder. The folder structure is as follows:\n    ```\n    data/\n    └── official_model\n        └── download\n            ├── chinese-bert-wwm-ext\n            │   ├── added_tokens.json\n   
         │   ├── config.json\n            │   ├── pytorch_model.bin\n            │   ├── special_tokens_map.json\n            │   ├── tokenizer_config.json\n            │   └── vocab.txt\n            ├── chinese-roberta-wwm-ext-large\n            │   ├── config.json\n            │   ├── pytorch_model.bin\n            │   ├── special_tokens_map.json\n            │   ├── tokenizer.json\n            │   ├── tokenizer_config.json\n            │   └── vocab.txt\n            ├── macbert-base\n            │   ├── added_tokens.json\n            │   ├── config.json\n            │   ├── pytorch_model.bin\n            │   ├── special_tokens_map.json\n            │   ├── tokenizer.json\n            │   ├── tokenizer_config.json\n            │   └── vocab.txt\n            ├── macbert-large\n            │   ├── added_tokens.json\n            │   ├── config.json\n            │   ├── pytorch_model.bin\n            │   ├── special_tokens_map.json\n            │   ├── tokenizer.json\n            │   ├── tokenizer_config.json\n            │   └── vocab.txt\n            ├── mixed_corpus_bert_base_model.bin\n            ├── mixed_corpus_bert_large_model.bin\n            └── nezha-cn-base\n                ├── bert_config.json\n                ├── pytorch_model.bin\n                └── vocab.txt\n    ```\n* [MD5 checksums](user_data/md5.txt) of the pretrained models\n\n## Environment Setup\n* torch==1.7.0\n* transformers==4.3.0.rc1\n* simpletransformers==0.51.15\n* TensorRT-7.2.1.6\n\n## End-to-End Training Script\n```\ncd code\nbash ./run.sh\n```\n## Solution Variants\n\n* Solution 1: pretraining (multiple models) + fine-tuning for classification (multiple models) + soft-label generation + training a regression model (on soft labels, single model)\n    ```\n    cd code\n    bash ./train.sh\n    ```\n    This solution was used in the preliminary round, where it scored 0.9220.\n\n* Solution 2: pretraining (multiple models) + loading the pretrained parameters to initialize one large model + training a classification model (single model)\n    ```\n    pipeline/pipeline_b.py\n    ```\n    Trains a 144-layer model (6 * 12 + 24 * 3).\n  \n    As a single model, it scored 0.9561 on leaderboard A of the second round, with an average inference time of 15 ms.\n\n* Solution 3: pretraining (multiple models) + fine-tuning for classification (multiple models) + average ensembling\n    ```\n    pipeline/pipeline_d.py\n    ```\n    Ensembles 6 bert-base models and 3 bert-large models.\n    \n    
This model was not tested on leaderboard A of the second round; on leaderboard B it scored 0.9593, with an average inference time of 15 ms.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnilboy%2Fgaic_track3_pair_sim","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnilboy%2Fgaic_track3_pair_sim","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnilboy%2Fgaic_track3_pair_sim/lists"}