{"id":13594712,"url":"https://github.com/FudanNLP/nlp-beginner","last_synced_at":"2025-04-09T07:33:47.944Z","repository":{"id":38206576,"uuid":"86599573","full_name":"FudanNLP/nlp-beginner","owner":"FudanNLP","description":"NLP上手教程","archived":false,"fork":false,"pushed_at":"2021-05-23T08:38:17.000Z","size":172,"stargazers_count":5855,"open_issues_count":3,"forks_count":1313,"subscribers_count":100,"default_branch":"master","last_synced_at":"2024-10-16T03:41:25.297Z","etag":null,"topics":["fudannlp","step-by-step"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FudanNLP.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-03-29T15:34:03.000Z","updated_at":"2024-10-16T03:08:57.000Z","dependencies_parsed_at":"2022-07-12T17:12:26.943Z","dependency_job_id":null,"html_url":"https://github.com/FudanNLP/nlp-beginner","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FudanNLP%2Fnlp-beginner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FudanNLP%2Fnlp-beginner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FudanNLP%2Fnlp-beginner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FudanNLP%2Fnlp-beginner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FudanNLP","download_url":"https://codeload.github.com/FudanNLP/nlp-beginner/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":2
23375413,"owners_count":17135366,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fudannlp","step-by-step"],"created_at":"2024-08-01T16:01:38.145Z","updated_at":"2024-11-06T16:31:46.471Z","avatar_url":"https://github.com/FudanNLP.png","language":null,"funding_links":[],"categories":["Others","其他_NLP自然语言处理"],"sub_categories":["大语言对话模型及数据"],"readme":"# NLP-Beginner: Introductory Exercises in Natural Language Processing\n\nStudents who have newly joined the lab should complete the exercises below as required and submit a report for each.\n\n*After finishing each exercise, upload your report to the “Reports of nlp-beginner” directory in the QQ group's shared folder, naming the file “task 1+your name”.*\n\nReferences:\n\n1. [A Hands-On Guide to Deep Learning](https://github.com/nndl/nndl.github.io/blob/master/md/DeepGuide.md)\n2. [Neural Networks and Deep Learning (神经网络与深度学习)](https://nndl.github.io/)\n3. When in doubt, search Google\n\n### Task 1: Text Classification with Machine Learning\n\nImplement text classification based on logistic/softmax regression.\n\n1. References\n   1. [Text Classification](文本分类.md)\n   2. [Neural Networks and Deep Learning](https://nndl.github.io/), Chapters 2 and 3\n2. Dataset: [Classify the sentiment of sentences from the Rotten Tomatoes dataset](https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews)\n3. Implementation requirement: NumPy\n4. Topics to understand:\n\n   1. Text feature representation: Bag-of-Words, N-gram\n   2. Classifiers: logistic/softmax regression, loss functions, (stochastic) gradient descent, feature selection\n   3. Datasets: splitting into training/validation/test sets\n5. Experiments:\n   1. Analyze how different features, loss functions, and learning rates affect final classification performance\n   2. Shuffling, batch, and mini-batch training\n6. Time: two weeks\n\n### Task 2: Text Classification with Deep Learning\n\nGet familiar with PyTorch and reimplement Task 1 in PyTorch, using CNN and RNN models for text classification.\n\n1. References\n\n   1. https://pytorch.org/\n   2. Convolutional Neural Networks for Sentence Classification \u003chttps://arxiv.org/abs/1408.5882\u003e\n   3. \u003chttps://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/\u003e\n2. Word embedding initialization\n   1. Random embedding initialization\n   2. 
Initialize with pretrained GloVe embeddings: https://nlp.stanford.edu/projects/glove/\n3. Topics:\n\n   1. Feature extraction with CNN/RNN\n   2. Word embeddings\n   3. Dropout\n4. Time: two weeks\n\n### Task 3: Text Matching with Attention\n\nGiven two sentences, determine the relationship between them. Following [ESIM](https://arxiv.org/pdf/1609.06038v3.pdf) (you may use only the LSTM and ignore the Tree-LSTM), implement the model with bidirectional attention.\n\n1. References\n   1. [Neural Networks and Deep Learning](https://nndl.github.io/), Chapter 7\n   2. Reasoning about Entailment with Neural Attention \u003chttps://arxiv.org/pdf/1509.06664v1.pdf\u003e\n   3. Enhanced LSTM for Natural Language Inference \u003chttps://arxiv.org/pdf/1609.06038v3.pdf\u003e\n2. Dataset: https://nlp.stanford.edu/projects/snli/\n3. Implementation requirement: PyTorch\n4. Topics:\n   1. Attention mechanisms\n   2. Token-to-token attention\n5. Time: two weeks\n\n### Task 4: Sequence Labeling with LSTM+CRF\n\nTrain a sequence labeling model with LSTM+CRF, using Named Entity Recognition as the example.\n\n1. References\n   1. [Neural Networks and Deep Learning](https://nndl.github.io/), Chapters 6 and 11\n   2. https://arxiv.org/pdf/1603.01354.pdf\n   3. https://arxiv.org/pdf/1603.01360.pdf\n2. Dataset: CoNLL 2003, https://www.clips.uantwerpen.be/conll2003/ner/\n3. Implementation requirement: PyTorch\n4. Topics:\n   1. Evaluation metrics: precision, recall, F1\n   2. Undirected graphical models, CRF\n5. Time: two weeks\n\n### Task 5: Neural Language Models\n\nTrain a character-level language model with LSTM and GRU, and compute its perplexity.\n\n1. References\n   1. [Neural Networks and Deep Learning](https://nndl.github.io/), Chapters 6 and 15\n2. Dataset: poetryFromTang.txt\n3. Implementation requirement: PyTorch\n4. Topics:\n   1. Language models: perplexity, etc.\n   2. Text generation\n5. Time: two weeks","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FFudanNLP%2Fnlp-beginner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FFudanNLP%2Fnlp-beginner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FFudanNLP%2Fnlp-beginner/lists"}