{"id":18743378,"url":"https://github.com/macanv/mqnlp","last_synced_at":"2025-10-29T02:03:38.698Z","repository":{"id":130006531,"uuid":"113520297","full_name":"macanv/MQNLP","owner":"macanv","description":"自然语言处理相关实验实现  some experiment of natural language processing, Like text classification, named entity recognition, pos-tags, segment, key words extractor, auto summarize etc.","archived":false,"fork":false,"pushed_at":"2018-11-22T01:26:09.000Z","size":42630,"stargazers_count":53,"open_issues_count":0,"forks_count":22,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-03-26T15:54:08.300Z","etag":null,"topics":["fasttext","lstm","ner","pos-tagging","segment","sequence-labeling","textclassification","textcnn","textrnn"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/macanv.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-12-08T02:13:33.000Z","updated_at":"2025-01-17T09:51:26.000Z","dependencies_parsed_at":null,"dependency_job_id":"e3541a65-e387-4cb3-a473-03d154eb47a7","html_url":"https://github.com/macanv/MQNLP","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/macanv%2FMQNLP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/macanv%2FMQNLP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/macanv%2FMQNLP/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/macanv%2FMQNLP/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/macanv","download_url":"https://codeload.github.com/macanv/MQNLP/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248636011,"owners_count":21137354,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fasttext","lstm","ner","pos-tagging","segment","sequence-labeling","textclassification","textcnn","textrnn"],"created_at":"2024-11-07T16:11:19.051Z","updated_at":"2025-10-29T02:03:33.651Z","avatar_url":"https://github.com/macanv.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"## MQNLP: a grab-bag project\n\nI had meant to start this project many times. It actually began in an NLP course in the first half of the year, but the project was later accidentally overwritten by git...\nThis is a fresh start: I want to implement everything I have been learning recently. The planned coverage is:\n#### 1. Text classification\n- 1. LR\n\u003e Logistic regression for text classification. Explores the parameters of the TF-IDF features and how adding different regularization terms to the linear model affects the results, and tunes the logistic regression hyperparameters with grid search.\nhttp://blog.csdn.net/macanv/article/details/78963762\n- 2. naive Bayes\n\u003e The generative naive Bayes model for text classification. Explores how linear parameter tuning and n-gram features affect classification performance, and also uses grid search.\nhttp://blog.csdn.net/macanv/article/details/78964020\n- 3. SVM\n\u003e SVM for text classification. Different kernel functions were evaluated on text, and the linear kernel performed best. Because the feature dimension is huge, the curse of dimensionality makes the SVM computation very expensive, so dimensionality reduction was introduced: the widely used PCA, and topic selection based on LDA (latent Dirichlet allocation).\n- 4. fastText\n\u003e Text classification with Facebook's fasttext API and with a TensorFlow-based fastText.\n- 5. TextCNN\n\u003e Text classification with a convolutional neural network; tuning experiments were carried out following the paper.\n- 6. TextRNN\n\u003e Text classification with an RNN. The code only needs the RNN cell type (LSTM, GRU, bi-LSTM, bi-GRU) set in advance; specifying num_layer makes it easy to build a multi-layer RNN.\n- 7. TextCNNRNN\n\u003e CNN + MaxPooling followed by an RNN\n- 8. ...\n\n####  2. 
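Chinese word segmentation\n\u003e Includes a sequence-labeling BiLSTM-CRF segmenter and a dictionary-free Chinese word segmentation method; HMM-based segmentation code will be implemented later.\n\u003e The dictionary-free method is based on cohesion and boundary entropy. Blog post proposing the method: http://www.matrix67.com/blog/archives/5044\n   Implementation: https://github.com/Moonshile/ChineseWordSegmentation\n\u003e I modified parts of that implementation to support python3.\n##### 2.1 A brief aside:\n\u003e I tend to regard dictionary-free Chinese word segmentation as a dictionary-generation method: the original model is not influenced by any existing dictionary, is shaped directly by the writing style of the training data, and is not domain-specific.\n\n#### 3. Named entity recognition\n\u003e Chinese named entity recognition with added word-segmentation information, modified on top of the referenced implementation. The paper is under review; the latest code will be uploaded once it is accepted.\nReference: https://github.com/zjy-ucas/ChineseNER\n\n#### 4. Part-of-speech tagging\n\u003e This part still uses the same code as NER (the BiLSTM+CRF sequence-labeling model) without segmentation features; results are relatively better on the People's Daily January corpus.\n\n#### 5. Relation extraction\n\n#### 6. Keyword extraction\n\u003e Implements keyword extraction based on tf-idf and textrank\n#### 7. Text summarization\n\u003e Preparing a seq2seq-based implementation, but held back by the lack of a training corpus\n\n#### 8. GAN study\n\u003e 1. Original GAN (GAN.py)\n\n\u003e 2. Conditional GAN (cGAN.py)\n\n\u003e 3. \n#### 9. Topic models (shamelessly listed anyway)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmacanv%2Fmqnlp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmacanv%2Fmqnlp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmacanv%2Fmqnlp/lists"}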