Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/lonexw/rust-crawler
A simple crawler, built with Rust lang.
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/lonexw/rust-crawler
- Owner: lonexw
- Created: 2022-07-15T06:56:55.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-04-02T12:56:45.000Z (11 months ago)
- Last Synced: 2024-09-25T16:10:16.919Z (5 months ago)
- Language: Rust
- Size: 30.3 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-rust-list - lonexw/rust-crawler: A simple crawler, built with Rust lang. (Web Crawler)
README
# rust-crawler
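> Note: I built this project while I was first learning how to structure projects in Rust. Much of it imitates or copies existing work, and it is intended for learning purposes only.

Reference projects and learning resources: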
- https://kerkour.com/rust-crawler-implementation
- https://github.com/mattsse/voyager
- https://kaisery.github.io/trpl-zh-cn/title-page.html
- https://course.rs/about-book.html
- https://rusty.rs/about.html
- https://github.com/rustlang-cn/async-book

## Crawler design
Why use Rust?
1. Async I/O Model, best performance possible when making requests.
2. Memory-related performance.
3. Safety when parsing.
4. Associated types.
5. Concurrency support.
6. Learning how to build a Rust project.

Business architecture diagram:
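- Crawler: manages the list of URLs to visit and fetches the page responses;
- Control loop: queues URLs.

1) Build a CrawlerConfig to configure and initialize the Crawler:
- default values and convenience methods
- allow or disallow domain
- concurrent_requests

2) Build a Crawler struct to implement the core functionality.

Dependencies:
- HTTP request library: reqwest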
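
The two design steps above (a CrawlerConfig with defaults and convenience methods, and a Crawler that owns the control loop) could look roughly like the following minimal sketch. It is not taken from the repository: every field and method name, the default of 4 for `concurrent_requests`, and the naive `host_of` helper are assumptions made for illustration. To keep the sketch dependency-free, fetching is passed in as a closure; in a real crawler that closure would issue the HTTP request, for example with reqwest. The loop here runs sequentially, so `concurrent_requests` is only stored, not enforced.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical crawler configuration; field names are assumptions.
#[derive(Debug, Clone)]
pub struct CrawlerConfig {
    pub allowed_domains: HashSet<String>,
    pub disallowed_domains: HashSet<String>,
    pub concurrent_requests: usize,
}

impl Default for CrawlerConfig {
    /// Default values: no domain restrictions, 4 concurrent requests (assumed).
    fn default() -> Self {
        Self {
            allowed_domains: HashSet::new(),
            disallowed_domains: HashSet::new(),
            concurrent_requests: 4,
        }
    }
}

impl CrawlerConfig {
    /// Convenience method: add a domain to the allow list.
    pub fn allow_domain(mut self, domain: &str) -> Self {
        self.allowed_domains.insert(domain.to_string());
        self
    }

    /// Convenience method: add a domain to the deny list.
    pub fn disallow_domain(mut self, domain: &str) -> Self {
        self.disallowed_domains.insert(domain.to_string());
        self
    }

    /// Convenience method: set the request limit, clamped to at least 1.
    pub fn concurrent_requests(mut self, n: usize) -> Self {
        self.concurrent_requests = n.max(1);
        self
    }

    /// An empty allow list means "everything that is not disallowed".
    pub fn is_allowed(&self, domain: &str) -> bool {
        !self.disallowed_domains.contains(domain)
            && (self.allowed_domains.is_empty() || self.allowed_domains.contains(domain))
    }
}

/// Naive host extraction (no port or userinfo handling); a stand-in for a real URL parser.
fn host_of(url: &str) -> Option<&str> {
    let rest = url.split_once("://")?.1;
    let host = rest.split(|c: char| c == '/' || c == '?' || c == '#').next()?;
    if host.is_empty() { None } else { Some(host) }
}

/// Hypothetical crawler: owns the URL queue and the set of already seen URLs.
pub struct Crawler {
    config: CrawlerConfig,
    queue: VecDeque<String>,
    visited: HashSet<String>,
}

impl Crawler {
    pub fn new(config: CrawlerConfig) -> Self {
        Self { config, queue: VecDeque::new(), visited: HashSet::new() }
    }

    /// Queues a URL if its domain is allowed and it has not been seen before.
    pub fn enqueue(&mut self, url: &str) -> bool {
        let allowed = host_of(url).map_or(false, |h| self.config.is_allowed(h));
        if allowed && self.visited.insert(url.to_string()) {
            self.queue.push_back(url.to_string());
            true
        } else {
            false
        }
    }

    /// Control loop: pop a URL, fetch it, queue the links it yields.
    /// `fetch` returns the links found on a page; the result lists URLs in visit order.
    pub fn run<F>(&mut self, mut fetch: F) -> Vec<String>
    where
        F: FnMut(&str) -> Vec<String>,
    {
        let mut crawled = Vec::new();
        while let Some(url) = self.queue.pop_front() {
            for link in fetch(&url) {
                self.enqueue(&link);
            }
            crawled.push(url);
        }
        crawled
    }
}
```

A caller would build the configuration with the chained convenience methods, for example `CrawlerConfig::default().allow_domain("example.com")`, seed the crawler with `enqueue`, and then call `run` with a fetch closure. Keeping the fetch step behind a closure makes the control loop testable without network access.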