https://github.com/garthtb/bccfreqspider
Word-frequency crawler for the BCC corpus
- Host: GitHub
- URL: https://github.com/garthtb/bccfreqspider
- Owner: GarthTB
- License: agpl-3.0
- Created: 2024-06-18T19:37:00.000Z (11 months ago)
- Default Branch: master
- Last Pushed: 2024-06-20T10:37:30.000Z (11 months ago)
- Last Synced: 2025-01-22T08:17:25.112Z (4 months ago)
- Language: C#
- Size: 41 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
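# [BCC Corpus](https://bcc.blcu.edu.cn/) Word-Frequency Crawler
The corpus file must be UTF-8 encoded, with one search term per line; in principle, any query can be searched. The number of results found for a term is taken as its word frequency. The program requires the .NET 6 runtime.
No license for the BCC corpus could be found, so use with caution!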
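The input format described above (UTF-8, one search term per line) can be illustrated with a short sketch. This is Python written for illustration only, not the tool's C# source; the function name `load_terms` is hypothetical, and skipping blank lines and tolerating a byte-order mark are assumptions rather than documented behavior.

```python
from pathlib import Path


def load_terms(path):
    """Read search terms from a UTF-8 text file, one term per line.

    Assumptions (not confirmed by the README): blank lines are skipped,
    surrounding whitespace is trimmed, and a leading BOM is tolerated.
    """
    # "utf-8-sig" decodes plain UTF-8 and also strips a BOM if one is present.
    text = Path(path).read_text(encoding="utf-8-sig")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Each returned string would then be submitted as one query, with the reported number of results recorded as that term's frequency.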
## Console arguments:
1. Path to the corpus file
2. Concurrency (default: 1; no more than 10 is recommended)
3. Web page timeout (default: 30 seconds)
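The three positional arguments and their defaults can be sketched as follows. This is again an illustrative Python sketch, not the tool's C# code: `parse_args`, `count_all`, and `count_fn` are hypothetical names, the validation rules are assumptions, and `count_fn` stands in for whatever function actually queries the corpus and returns a result count.

```python
from concurrent.futures import ThreadPoolExecutor

DEFAULT_CONCURRENCY = 1  # default stated in the README
DEFAULT_TIMEOUT_S = 30   # default stated in the README, in seconds


def parse_args(argv):
    """Parse [path, concurrency, timeout] from a list of positional arguments."""
    if not argv:
        raise SystemExit("usage: <corpus file> [concurrency] [timeout seconds]")
    path = argv[0]
    concurrency = int(argv[1]) if len(argv) > 1 else DEFAULT_CONCURRENCY
    timeout_s = int(argv[2]) if len(argv) > 2 else DEFAULT_TIMEOUT_S
    # Assumption: non-positive values are rejected.
    if concurrency < 1 or timeout_s < 1:
        raise SystemExit("concurrency and timeout must be positive integers")
    return path, concurrency, timeout_s


def count_all(terms, count_fn, concurrency=DEFAULT_CONCURRENCY):
    """Apply count_fn to every term with at most `concurrency` calls in flight."""
    terms = list(terms)
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(terms, pool.map(count_fn, terms)))
```

A worker pool sized by the concurrency argument is one common way to cap the number of simultaneous requests; keeping that number low (the README suggests 10 or fewer) reduces load on the corpus server.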