https://github.com/hantang/data-corpus
A collection of corpus data and word lists: Chinese and English stop words, sentiment analysis lexicons, classification dictionaries (thesaurus), and sensitive-word lists (banned words, censored words).
corpus nlp stopwords thesaurus
Last synced: 13 days ago
- Host: GitHub
- URL: https://github.com/hantang/data-corpus
- Owner: hantang
- Created: 2024-07-24T01:21:37.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-12T08:48:59.000Z (5 months ago)
- Last Synced: 2025-05-12T10:00:54.969Z (5 months ago)
- Topics: corpus, nlp, stopwords, thesaurus
- Homepage:
- Size: 103 MB
- Stars: 17
- Watchers: 1
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
README
# corpus
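A collection of NLP word lists (backup):
- stop words for multiple languages,
- sentiment analysis lexicons,
- classification dictionaries (thesaurus),
- censorship word lists (sensitive/banned words, censorship/sensitive words)
- ... for sources, see [CHANGELOG](./CHANGELOG.md)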
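
As a rough illustration of how a stop-word list like the ones collected here might be used, the sketch below parses a list and filters tokens against it. It assumes a plain-text format with one entry per line and `#` for comments; the actual file layout in this repository may differ, and the sample list and the helper names `parse_stopwords` and `remove_stopwords` are hypothetical, not taken from the repository.

```python
from typing import Iterable


def parse_stopwords(text: str) -> set[str]:
    """Parse a stop-word list: one entry per line; blank lines and '#' comments are skipped."""
    words = set()
    for line in text.splitlines():
        entry = line.strip()
        if entry and not entry.startswith("#"):
            words.add(entry.lower())
    return words


def remove_stopwords(tokens: Iterable[str], stopwords: set[str]) -> list[str]:
    """Return the tokens whose lowercase form is not in the stop-word set."""
    return [t for t in tokens if t.lower() not in stopwords]


# Hypothetical sample list (assumption: NOT copied from the repository).
SAMPLE_LIST = "# sample stop words\nthe\nis\na\n的\n了\n"

stopwords = parse_stopwords(SAMPLE_LIST)
filtered = remove_stopwords(["The", "corpus", "is", "a", "backup"], stopwords)
print(filtered)
```

To use a real list, one would read the file contents (for example with `pathlib.Path(...).read_text(encoding="utf-8")`) and pass them to `parse_stopwords`; Chinese text would also need a word segmenter before filtering, which is outside the scope of this sketch.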