https://github.com/floscodes/rust-sitescraper
Scraping Websites in Rust!
- Host: GitHub
- URL: https://github.com/floscodes/rust-sitescraper
- Owner: floscodes
- Created: 2021-11-01T21:54:21.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2023-12-04T18:08:56.000Z (over 2 years ago)
- Last Synced: 2026-01-02T12:07:05.724Z (6 months ago)
- Language: Rust
- Homepage:
- Size: 34.2 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Scraping Websites! [crates.io](https://crates.io/crates/sitescraper)
## Examples:
### Get InnerHTML:
```
let html = "<html><body>Hello World!</body></html>";
let dom = sitescraper::parse_html(html).unwrap();
let filtered_dom = dom.filter("body");
println!("{}", filtered_dom.get_inner_html());
//Output: Hello World!
```
### Get Text:
```
let html = "<html><body>Hello World!</body></html>";
let dom = sitescraper::parse_html(html).unwrap();
let filtered_dom = dom.filter("body");
println!("{}", filtered_dom.get_text());
//Output: Hello World!
```
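The two getters above give the same output here because the body contains no nested tags. Presumably they differ once markup is nested: inner HTML keeps the child tags, while text keeps only the characters between them. The standalone sketch below illustrates that distinction with a hypothetical helper, `strip_tags`. It is not part of sitescraper and makes no claim about how the crate extracts text.

```rust
/// Hypothetical helper (not part of sitescraper): returns the text of an
/// HTML fragment by dropping everything between '<' and '>'.
/// Illustration only: it ignores comments, scripts, entities, and '>'
/// characters inside attribute values.
fn strip_tags(fragment: &str) -> String {
    let mut text = String::new();
    let mut inside_tag = false;
    for ch in fragment.chars() {
        match ch {
            '<' => inside_tag = true,          // a tag starts: stop copying
            '>' => inside_tag = false,         // the tag ends: resume copying
            _ if !inside_tag => text.push(ch), // plain text between tags
            _ => {}                            // characters inside a tag are skipped
        }
    }
    text
}
```

For the fragment `<div>Hello <b>World</b>!</div>`, `strip_tags` returns `Hello World!`, while the fragment itself is what an inner-HTML view of the enclosing element would look like.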
### Get Text from single Tags:
```
use sitescraper;
let html = "<html><body><div>Hello World!</div></body></html>";
let dom = sitescraper::parse_html(html).unwrap();
let filtered_dom = dom.filter("div");
println!("{}", filtered_dom.tag[0].get_text());
//Output: Hello World!
```
**Also works with:**
```
get_inner_html()
```
### Filter by tag-name, attribute-name and attribute-value using a tuple:
```
use sitescraper;
let html = "<html><body><div id=\"hello\">Hello World!</div></body></html>";
let dom = sitescraper::parse_html(html).unwrap();
let filtered_dom = dom.filter(("div", "id", "hello"));
println!("{}", filtered_dom.tag[0].get_text());
//Output: Hello World!
```
**Also works with a tuple of two string literals (tag name and attribute name):**
```
let filtered_dom = dom.filter(("div", "id"));
```
### You can also leave arguments out by passing "*" or "":
```
use sitescraper;
let html = "<html><body><div id=\"hello\">Hello World!</div></body></html>";
let dom = sitescraper::parse_html(html).unwrap();
let filtered_dom = dom.filter(("*", "id", "hello"));
println!("{}", filtered_dom.tag[0].get_text());
//Output: Hello World!
```
or
```
use sitescraper;
let html = "<html><body><div id=\"hello\">Hello World!</div></body></html>";
let dom = sitescraper::parse_html(html).unwrap();
let filtered_dom = dom.filter(("", "", "hello"));
println!("{}", filtered_dom.tag[0].get_text());
//Output: Hello World!
```
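The wildcard rule above can be read as: each of the three tuple positions (tag name, attribute name, attribute value) either has to match exactly or is skipped when it is `"*"` or `""`. The sketch below models that reading with a hypothetical `Tag` struct and a hypothetical `tag_matches` function. Both are assumptions made for illustration and are not sitescraper's types or its actual matching code.

```rust
/// Simplified stand-in for a parsed tag (hypothetical, not sitescraper's type).
struct Tag<'a> {
    name: &'a str,
    attrs: Vec<(&'a str, &'a str)>,
}

/// "*" and "" are both treated as "match anything".
fn is_wildcard(pattern: &str) -> bool {
    pattern.is_empty() || pattern == "*"
}

/// Returns true when `tag` satisfies the
/// (tag-name, attribute-name, attribute-value) pattern.
fn tag_matches(tag: &Tag, pattern: (&str, &str, &str)) -> bool {
    let (name, attr_name, attr_value) = pattern;
    if !is_wildcard(name) && tag.name != name {
        return false;
    }
    // With both attribute parts left out, the tag name alone decides.
    if is_wildcard(attr_name) && is_wildcard(attr_value) {
        return true;
    }
    // Otherwise at least one attribute must satisfy the non-wildcard parts.
    tag.attrs.iter().any(|&(k, v)| {
        (is_wildcard(attr_name) || k == attr_name)
            && (is_wildcard(attr_value) || v == attr_value)
    })
}
```

Under this model, a `div` with `id="hello"` matches `("div", "id", "hello")`, `("*", "id", "hello")`, and `("", "", "hello")`, but not `("span", "id", "hello")`.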
### Get Website-Content:
```
use sitescraper;
// `.await` is only valid inside an async context (an async fn or block driven by a runtime).
let html = sitescraper::http::get("http://example.com/").await.unwrap();
let dom = sitescraper::parse_html(html).unwrap();
let filtered_dom = sitescraper::filter!(dom, "div");
println!("{}", filtered_dom.get_inner_html());
```