{"id":13595416,"url":"https://github.com/terrifyzhao/text_matching","last_synced_at":"2025-04-05T01:09:04.490Z","repository":{"id":41176388,"uuid":"183883262","full_name":"terrifyzhao/text_matching","owner":"terrifyzhao","description":"TensorFlow implementations of common text matching models, using the QA_corpus dataset; continuously updated","archived":false,"fork":false,"pushed_at":"2019-10-12T08:11:59.000Z","size":25363,"stargazers_count":674,"open_issues_count":17,"forks_count":185,"subscribers_count":16,"default_branch":"master","last_synced_at":"2025-03-29T00:12:05.992Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/terrifyzhao.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-04-28T08:58:51.000Z","updated_at":"2025-02-13T10:54:35.000Z","dependencies_parsed_at":"2022-09-03T12:00:28.735Z","dependency_job_id":null,"html_url":"https://github.com/terrifyzhao/text_matching","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/terrifyzhao%2Ftext_matching","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/terrifyzhao%2Ftext_matching/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/terrifyzhao%2Ftext_matching/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/terrifyzhao%2Ftext_matching/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/terrifyzhao","download_url":"https://codeload.github.com/terrifyzhao/text_matching/tar.gz/refs/heads/master","host":{"name":"GitHub","url":
"https://github.com","kind":"github","repositories_count":247271532,"owners_count":20911587,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T16:01:49.617Z","updated_at":"2025-04-05T01:09:04.472Z","avatar_url":"https://github.com/terrifyzhao.png","language":"Python","funding_links":[],"categories":["Python","文本匹配 文本检索 文本相似度"],"sub_categories":["其他_文本生成、文本对话"],"readme":"# text_matching\nText matching models\n\nThis project includes most of the current text matching models and is continuously updated. For walkthroughs of the papers, see [Text similarity and text matching models: a summary](https://blog.csdn.net/u012526436/article/details/90179466)\n\nThe dataset is QA_corpus: 100,000 training examples, with 10,000 examples each in the validation and test sets\n\nThe hyperparameters for each model are defined in the `args.py` file in that model's folder\n\nTraining:\n`python train.py`\n\nTesting:\n`python test.py`\n\nWord embeddings:\nThe models take different inputs. Some use only character embeddings, some use character embeddings plus word embeddings, and some further distinguish static word embeddings (not updated during training) from dynamic word embeddings (updated during training). All of these input forms are already wrapped and can be generated as follows\n\n\nFor static word embeddings, run\n`python word2vec_gensim.py`, which trains the word embeddings with gensim\n\nFor dynamic word embeddings, run\n`python word2vec.py`, which trains the word embeddings with TensorFlow. After training, it saves the embedding matrix, the vocabulary, and an image plotting the relative positions of the word vectors in two dimensions.\nOn systems other than Windows 10, saving the image may fail because of font issues\n\nTest set results:\n\nModel | Loss | Accuracy | Input | Paper |\n:-: | :-: | :-: | :-: | :-: |\nDSSM | 0.7613157 | 0.6864 | character embeddings | [DSSM](https://posenhuang.github.io/papers/cikm2013_DSSM_fullversion.pdf) |\nConvNet | 0.6872447 | 0.6977 | character embeddings | [ConvNet](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.723.6492\u0026rep=rep1\u0026type=pdf) |\nESIM | 0.55444807 | 0.736 | character embeddings | [ESIM](https://arxiv.org/pdf/1609.06038.pdf) |\nABCNN | 0.5771452 | 0.7503 | character embeddings | [ABCNN](https://arxiv.org/pdf/1512.05193.pdf) |\nBiMPM | 0.4852 | 0.764 | character embeddings + static word embeddings | [BiMPM](https://arxiv.org/pdf/1702.03814.pdf) |\nDIIN | 0.48298636 | 0.7694 | character embeddings + dynamic word embeddings | 
[DIIN](https://arxiv.org/pdf/1709.04348.pdf) |\nDRCN | 0.6549849 | 0.7811 | character embeddings + static word embeddings + dynamic word embeddings + shared-word flag | [DRCN](https://arxiv.org/pdf/1805.11360.pdf) |\n\nThe results above may not be the best each model can achieve, and the hyperparameters are not necessarily optimal. If you want to use a model in your own project, tune the hyperparameters yourself\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fterrifyzhao%2Ftext_matching","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fterrifyzhao%2Ftext_matching","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fterrifyzhao%2Ftext_matching/lists"}