{"id":20672967,"url":"https://github.com/juneyaooo/team-learning-nlp","last_synced_at":"2026-03-09T19:43:21.462Z","repository":{"id":176815169,"uuid":"340050566","full_name":"JuneYaooo/team-learning-nlp","owner":"JuneYaooo","description":"this is June's learning notes about EasyTransfer","archived":false,"fork":false,"pushed_at":"2021-02-25T15:24:48.000Z","size":445,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-10T18:09:43.146Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JuneYaooo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-02-18T12:58:21.000Z","updated_at":"2021-02-25T15:24:50.000Z","dependencies_parsed_at":"2023-06-28T11:00:28.221Z","dependency_job_id":null,"html_url":"https://github.com/JuneYaooo/team-learning-nlp","commit_stats":null,"previous_names":["juneyaooo/team-learning-nlp"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/JuneYaooo/team-learning-nlp","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuneYaooo%2Fteam-learning-nlp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuneYaooo%2Fteam-learning-nlp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuneYaooo%2Fteam-learning-nlp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuneYaooo%2Fteam-learning-nlp/manifests","owner_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/owners/JuneYaooo","download_url":"https://codeload.github.com/JuneYaooo/team-learning-nlp/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuneYaooo%2Fteam-learning-nlp/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30309920,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-09T17:35:44.120Z","status":"ssl_error","status_checked_at":"2026-03-09T17:35:43.707Z","response_time":61,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-16T20:39:41.180Z","updated_at":"2026-03-09T19:43:21.454Z","avatar_url":"https://github.com/JuneYaooo.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# NLP Practice: Chinese Pre-trained Model Generalization Challenge (Text Classification, BERT) Study Notes\n\n![avatar](/image/env-flow.png?raw=true)\n\n- [Docker Installation and Usage](#docker------)\n- [Setting Up an Alibaba Cloud Image Registry](#---------)\n- [Getting the Baseline](#--baseline)\n- [Configuring the Code Environment](#------)\n- [Running Locally and Submitting](#-------)\n\n\u003e Setup\n\u003e\n\u003e - OS: Windows 10 Home (Chinese edition)  \n\u003e - GPU: NVIDIA GeForce GTX 1060  \n\u003e - Environment: PyTorch 1.6.0 (GPU build) + CUDA 10.2 + VSCode + Docker for Windows\n\u003e\n\n## Docker 
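Installation and Usage\n\nStart by learning what Docker is, how to install it, and a few common commands. Reference blog: https://www.ruanyifeng.com/blog/2018/02/docker-tutorial.html\n\nAlternatively, read the Datawhale Docker environment setup guide, which covers installing Docker, creating an image registry, and submitting to Tianchi: https://mp.weixin.qq.com/s/JiimSmuD3S5lSS9MmH2GJw\n\nWhen installing on Windows, note that you may need to install the WSL2 update.\n\n![avatar](/image/wsl2.png?raw=true)\n\nIf the prompt appears, open the link and follow the steps:\n\nhttps://docs.microsoft.com/en-us/windows/wsl/install-win10#step-4---download-the-linux-kernel-update-package\n\nOnce Docker is installed, you can drive it from cmd or PowerShell. Note that on Windows the docker commands do not need a sudo prefix; just type docker... directly.\n\nAfter installation, run a quick test to confirm that it works, then continue with the steps below.\n\n## Setting Up an Alibaba Cloud Image Registry\n\nRegister for the Alibaba Cloud Container Registry service  \nReference tutorial: https://tianchi.aliyun.com/competition/entrance/231759/tab/226  \nThe tutorial is quite detailed. After creating your own image repository and logging in again, remember to select the region (address), otherwise the repository will not be visible:\n\n![avatar](/image/alicould.png?raw=true)\n\nAfter opening the repository, you will see common commands for it listed further down, such as login and push. On Windows, just drop the sudo in front of each command.\n\n![avatar](/image/aliregistry.png?raw=true)\n\n## Getting the Baseline\n\nSource: https://github.com/finlay-liu/tianchi-multi-task-nlp\n\nInstall git locally and clone the code from this repository. Here is a workflow of common git bash commands that I summarized earlier:\n![avatar](/image/git-flow.png?raw=true)\n\n## Configuring the Code Environment\n\nFirst check whether CUDA is installed on your machine and which version it is, then install a PyTorch version that supports it. This step matters.\n\n## Running Locally and Submitting\n\nFollowing the baseline instructions, download the files and place them in the corresponding folders, then run generate_data.py -\u003e train.py -\u003e inference.py in that order.\n\nWhile running them I hit a CUDA out of memory error.\n\n![avatar](/image/batchsize.png?raw=true)\n\nTo fix it, lower the batch size argument of the train call (batchSize in the call below). If your machine is not powerful and you want results sooner, you can also reduce the number of epochs, for example to 1 or 2. \n\n```python\ntrain(epochs=2, batchSize=6, device='cuda:0', load_saved=True ,a_step=16, lr=0.0001,  pretrained_model=pretrained_model, tokenizer_model=tokenizer_model, weighted_loss=True)\n```\n\nBecause I had lowered the batch size, a bug showed up during the run:\n\n![avatar](/image/cur_error.png?raw=true)\n\nA closer look at the code shows the cause. In get_next_batch(), when the total amount of remaining data is \u003e 0, the code only handles the cases where each individual dataset still has remaining data \u003e 0; it does not handle a dataset whose remaining data is 0. So when the total length is nonzero but one dataset is already empty, tnews_cur is referenced before it is defined, and an error is raised.\n![avatar](/image/curfuc_raw.png?raw=true)\n\nHere is the revised logic for the elif branch. With this change, the run completes normally.\n\n```python\nelif total_len \u003e batchSize:\n    if ocnli_len \u003e 0:\n        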
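ocnli_tmp_len = int((ocnli_len / total_len) * batchSize)\n        ocnli_cur = self.ocnli_ids[:ocnli_tmp_len]\n        self.ocnli_ids = self.ocnli_ids[ocnli_tmp_len:]\n    elif ocnli_len == 0:\n        ocnli_cur = self.ocnli_ids\n        self.ocnli_ids = []\n    if ocemotion_len \u003e 0:\n        ocemotion_tmp_len = int((ocemotion_len / total_len) * batchSize)\n        ocemotion_cur = self.ocemotion_ids[:ocemotion_tmp_len]\n        self.ocemotion_ids = self.ocemotion_ids[ocemotion_tmp_len:]\n    elif ocemotion_len == 0:\n        ocemotion_cur = self.ocemotion_ids\n        self.ocemotion_ids = []\n    if tnews_len \u003e 0:\n        tnews_tmp_len = batchSize - len(ocnli_cur) - len(ocemotion_cur)\n        tnews_cur = self.tnews_ids[:tnews_tmp_len]\n        self.tnews_ids = self.tnews_ids[tnews_tmp_len:]\n    elif tnews_len == 0:\n        tnews_cur = self.tnews_ids\n        self.tnews_ids = []\n\n```\n\nA few concepts:\nOne full pass over all samples is one round of training, called an epoch. The batch size is the number of samples drawn in a single training step. In this project the three datasets contain about 143443 samples in total, so with a batch size of 4 one epoch takes roughly 143443 / 4 ≈ 35860 batches. By the end of an epoch, the printed batch counter is therefore at around 35000 th.\n\n![avatar](/image/batch_num.png?raw=true)\n\nOnce the results have been generated, package the result JSON files in the folder, named as the competition submission rules require.\n\n![avatar](/image/submission.png?raw=true)\n\nIn a PowerShell or CMD window, change into the submission directory and follow the instructions in the baseline: log in to the Alibaba Cloud image registry (keep the registry web page open, since it lists the commands you will need), build the image, then tag and push it to the remote. The version number is yours to choose, such as 1.0 or 2.0.\n\n![avatar](/image/push.png?raw=true)\n\nFinally, on the competition submission page, enter the image path plus version number, along with the username and password. After a short wait you can see your score.\n\n![avatar](/image/score.png?raw=true)\n\n## Improving the Model\n\nMy machine is too weak for a batch size above 6, and the results were poor at first. They improved a little, mainly from increasing the a_step parameter.\n\n![avatar](/image/improve.png?raw=true)\n\n## Miscellaneous\n\n### Pitfalls When Running on a Server\n\nBecause my machine is not powerful enough, I rented a server, which raised a few issues with running files on Linux. I am recording them here:\n\n1. File transfer\n\nXftp works well. The official site offers a personal/student edition that is free to use and very convenient; download it from there.\n\n2. Running Python scripts on Linux\n\nSee: https://blog.csdn.net/qq_28267025/article/details/60337293\n\nIf you see an error like this:\n\n```python\n/usr/bin/env: ‘python3\\r’: No such file or 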
directory\n```\n\nThe stray `\\r` means the script was saved with Windows (CRLF) line endings; converting it to Unix (LF) line endings resolves the error. See this answer for details:\n\nhttps://askubuntu.com/questions/896860/usr-bin-env-python3-r-no-such-file-or-directory","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuneyaooo%2Fteam-learning-nlp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjuneyaooo%2Fteam-learning-nlp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuneyaooo%2Fteam-learning-nlp/lists"}