https://github.com/garthtb/bcc_freq_spider
Word-frequency spider for the BCC corpus
- Host: GitHub
- URL: https://github.com/garthtb/bcc_freq_spider
- Owner: GarthTB
- Created: 2024-07-26T13:12:49.000Z (10 months ago)
- Default Branch: master
- Last Pushed: 2024-08-28T16:31:53.000Z (9 months ago)
- Last Synced: 2025-01-22T08:17:23.573Z (4 months ago)
- Language: Rust
- Homepage:
- Size: 4.88 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# [BCC Corpus](https://bcc.blcu.edu.cn/) Word-Frequency Spider
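The corpus file must be UTF-8 encoded. Each line is one search term, and in principle any query can be used. The number of results returned for a search is taken as its word frequency.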
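A minimal Rust sketch of reading the corpus file as described above: one search term per line, UTF-8 required. The names `parse_terms` and `load_terms` are hypothetical, and trimming whitespace, skipping blank lines, and stripping a UTF-8 BOM are assumptions rather than confirmed behavior of this repository.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Split corpus text into search terms, one per line.
/// Assumption: surrounding whitespace (including the `\r` of Windows
/// line endings) is trimmed and blank lines are skipped.
fn parse_terms(text: &str) -> Vec<String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}

/// Read the corpus file. `fs::read_to_string` fails with
/// `io::ErrorKind::InvalidData` if the file is not valid UTF-8.
fn load_terms(path: &Path) -> io::Result<Vec<String>> {
    let text = fs::read_to_string(path)?;
    // Assumption: drop a leading UTF-8 BOM so it does not stick to the first term.
    let body = text.strip_prefix('\u{feff}').unwrap_or(&text);
    Ok(parse_terms(body))
}

fn main() {
    // Hypothetical usage: corpus path as the first command-line argument
    // (the real tool asks the user for it interactively).
    match std::env::args().nth(1) {
        Some(path) => match load_terms(Path::new(&path)) {
            Ok(terms) => println!("{} search terms loaded", terms.len()),
            Err(e) => eprintln!("failed to read {}: {}", path, e),
        },
        None => eprintln!("usage: <program> <corpus file>"),
    }
}
```

Because `fs::read_to_string` rejects non-UTF-8 input, the encoding requirement surfaces as a load-time error instead of silently producing garbled queries.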
No license was found for the BCC corpus, so use this tool with caution!
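## Parameters entered by the user:
1. Path to the corpus file
2. Concurrency (default 8; no more than 10 is recommended)
3. Page timeout (default 30 seconds)

### [C# version with the same functionality (requires the .NET 6 runtime)](https://github.com/GarthTB/BCCFreqSpider)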
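For the Rust spider in this repository, the concurrency and page-timeout parameters listed under "Parameters entered by the user" could be applied roughly as follows: a fixed pool of worker threads claims terms from a shared counter, so at most `concurrency` queries are in flight at once. This is a standard-library-only sketch under assumptions: `Settings` and `count_all` are hypothetical names, and `fetch` is a hypothetical stand-in for the real HTTP request to BCC, whose URL and result parsing this README does not document.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;
use std::time::Duration;

/// Hypothetical settings holding the README's documented defaults.
struct Settings {
    concurrency: usize, // default 8; the README advises at most 10
    timeout: Duration,  // per-page timeout, default 30 s
}

impl Default for Settings {
    fn default() -> Self {
        Settings { concurrency: 8, timeout: Duration::from_secs(30) }
    }
}

/// Query every term with at most `settings.concurrency` requests in flight.
/// `fetch` (hypothetical) returns the result count, or `None` on failure/timeout.
/// Results come back in input order.
fn count_all<F>(terms: &[String], settings: &Settings, fetch: F) -> Vec<Option<u64>>
where
    F: Fn(&str, Duration) -> Option<u64> + Sync,
{
    let next = AtomicUsize::new(0); // index of the next unclaimed term
    let results: Mutex<Vec<Option<u64>>> = Mutex::new(vec![None; terms.len()]);
    thread::scope(|s| {
        // A fixed number of workers is what bounds concurrency.
        for _ in 0..settings.concurrency.max(1) {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= terms.len() {
                    break;
                }
                // The timeout is only passed through; enforcing it is the HTTP client's job.
                let count = fetch(&terms[i], settings.timeout);
                results.lock().unwrap()[i] = count;
            });
        }
    }); // all workers are joined when the scope ends
    results.into_inner().unwrap()
}

fn main() {
    let terms: Vec<String> = ["你好", "世界"].iter().map(|t| t.to_string()).collect();
    // Hypothetical fetcher for demonstration: counts characters instead of querying BCC.
    let counts = count_all(&terms, &Settings::default(), |term, _timeout| {
        Some(term.chars().count() as u64)
    });
    for (term, count) in terms.iter().zip(&counts) {
        match count {
            Some(n) => println!("{term}\t{n}"),
            None => println!("{term}\tfailed"),
        }
    }
}
```

Reporting a failed or timed-out query as `None` keeps it distinguishable from a genuine frequency of 0.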