https://github.com/007gzs/crawl
Rust crawl
- Host: GitHub
- URL: https://github.com/007gzs/crawl
- Owner: 007gzs
- License: lgpl-2.1
- Created: 2024-01-05T03:57:09.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-01-10T08:26:50.000Z (about 1 year ago)
- Last Synced: 2024-12-09T23:42:23.339Z (about 1 month ago)
- Language: Rust
- Size: 31.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Crawl
Rust crawl

### demo
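```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;

// `Downloader`, `Manager`, `Flag`, `get_res_thread_arg`, `res_run`, and
// `start_crawl` come from this crate and must also be in scope.

fn main() -> anyhow::Result<()> {
    let download = Arc::new(Downloader::new(
        String::from(r"data/book1"),
        String::from("https://doc.rust-lang.org/book/"),
    ));

    // Seed URL; it is pre-registered in `added_urls`.
    let url = String::from("https://doc.rust-lang.org/book/index.html");
    let manager = Arc::new(Mutex::new(Manager {
        datas: Vec::new(),
        added_urls: HashSet::from([url.clone()]),
    }));
    download.start_url(url, false, Arc::new(Flag::Page))?;

    // Spawn 16 threads running `res_run`, all sharing the same manager.
    for _ in 0..16 {
        let res_arg = get_res_thread_arg(&download);
        let r = Arc::clone(&manager);
        thread::spawn(move || res_run(res_arg, r));
    }

    start_crawl(&download, 16);
    download.wait_finish();

    // Print the collected results.
    {
        let m = manager.lock().unwrap();
        println!("finish {}", m.datas.len());
        for item in m.datas.iter() {
            println!("{} {}", item.url, item.title);
        }
    }
    Ok(())
}
```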
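
The demo shares one `Manager` between its threads through `Arc<Mutex<...>>` and keeps a set of URLs that were already added. The sketch below isolates that pattern using only the standard library. It is not the crate's implementation: `Item`, this `Manager`, `try_add_url`, and `collect_unique` are hypothetical stand-ins that only borrow the field names seen in the demo, and the `title` value is a placeholder rather than a fetched page title.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical result item; field names follow the demo's `item.url` / `item.title`.
#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub url: String,
    pub title: String,
}

// Hypothetical stand-in for the demo's `Manager` (same field names, assumed types).
#[derive(Default)]
pub struct Manager {
    pub datas: Vec<Item>,
    pub added_urls: HashSet<String>,
}

impl Manager {
    /// Returns `true` only the first time `url` is seen.
    pub fn try_add_url(&mut self, url: &str) -> bool {
        self.added_urls.insert(url.to_string())
    }
}

/// Spawns `workers` threads that all walk the same URL list.
/// Only the first thread to claim a URL records an item for it.
pub fn collect_unique(urls: Vec<String>, workers: usize) -> Vec<Item> {
    let manager = Arc::new(Mutex::new(Manager::default()));
    let urls = Arc::new(urls);

    let mut handles = Vec::with_capacity(workers);
    for _ in 0..workers {
        let manager = Arc::clone(&manager);
        let urls = Arc::clone(&urls);
        handles.push(thread::spawn(move || {
            for url in urls.iter() {
                // The check and the push happen under one lock acquisition,
                // so two threads can never both record the same URL.
                let mut m = manager.lock().unwrap();
                if m.try_add_url(url) {
                    m.datas.push(Item {
                        url: url.clone(),
                        title: format!("page at {}", url), // placeholder title
                    });
                }
            }
        }));
    }

    // Join every worker before reading the results.
    for handle in handles {
        handle.join().unwrap();
    }

    let mut items = manager.lock().unwrap().datas.clone();
    items.sort_by(|a, b| a.url.cmp(&b.url)); // deterministic order for callers
    items
}
```

Calling `collect_unique` with a list that contains the same URL twice yields a single item for that URL, whatever the number of workers. Unlike the demo, this sketch keeps the `JoinHandle`s and joins them, so it does not need a separate wait step before reading `datas`.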