https://github.com/floscodes/rust-sitescraper

Scraping Websites in Rust!

[![forthebadge made-with-rust](http://ForTheBadge.com/images/badges/made-with-rust.svg)](https://www.rust-lang.org/)
# Scraping Websites! [![crates.io](https://img.shields.io/crates/v/sitescraper.svg)](https://crates.io/crates/sitescraper)

## Examples:

### Get InnerHTML:

```
let html = "<html><body><div>Hello World!</div></body></html>";

let dom = sitescraper::parse_html(html).unwrap();

let filtered_dom = dom.filter("body");

println!("{}", filtered_dom.get_inner_html());
//Output: <div>Hello World!</div>
```

### Get Text:
```
let html = "<html><body><div>Hello World!</div></body></html>";

let dom = sitescraper::parse_html(html).unwrap();

let filtered_dom = dom.filter("body");

println!("{}", filtered_dom.get_text());
//Output: Hello World!
```
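
To illustrate how the text differs from the inner HTML, here is a small standard-library-only sketch. `strip_tags` is a hypothetical helper written for this illustration; it is not part of sitescraper and is not how the crate implements `get_text()`. It only shows the idea of keeping the characters that sit outside of tags.

```rust
/// Naive tag stripper (hypothetical helper, not a sitescraper function):
/// drops everything between '<' and '>' and keeps the rest.
fn strip_tags(html: &str) -> String {
    let mut text = String::new();
    let mut inside_tag = false;
    for c in html.chars() {
        match c {
            '<' => inside_tag = true,
            '>' => inside_tag = false,
            _ if !inside_tag => text.push(c),
            _ => {}
        }
    }
    text
}

fn main() {
    // Inner HTML of a body element, as in the example above.
    let inner_html = "<div>Hello World!</div>";
    println!("{}", strip_tags(inner_html));
}
```

A real parser also has to deal with comments, scripts and entities, which this sketch ignores.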

### Get Text from single Tags:

```
use sitescraper;

let html = "<html><body><div>Hello World!</div></body></html>";

let dom = sitescraper::parse_html(html).unwrap();

let filtered_dom = dom.filter("div");

println!("{}", filtered_dom.tag[0].get_text());
//Output: Hello World!
```

**This also works with**
```
get_inner_html()
```

### Filter by tag-name, attribute-name and attribute-value using a tuple:

```
use sitescraper;

let html = r#"<html><body><div id="hello">Hello World!</div></body></html>"#;

let dom = sitescraper::parse_html(html).unwrap();

let filtered_dom = dom.filter(("div", "id", "hello"));

println!("{}", filtered_dom.tag[0].get_text());
//Output: Hello World!
```

**This also works with a tuple of two string literals (tag name and attribute name)**
```
let filtered_dom = dom.filter(("div", "id"));
```

### You can also leave arguments out by passing "*" or "":

```
use sitescraper;

let html = r#"<html><body><div id="hello">Hello World!</div></body></html>"#;

let dom = sitescraper::parse_html(html).unwrap();

let filtered_dom = dom.filter(("*", "id", "hello"));

println!("{}", filtered_dom.tag[0].get_text());
//Output: Hello World!
```

or

```
use sitescraper;

let html = r#"<html><body><div id="hello">Hello World!</div></body></html>"#;

let dom = sitescraper::parse_html(html).unwrap();

let filtered_dom = dom.filter(("", "", "hello"));

println!("{}", filtered_dom.tag[0].get_text());
//Output: Hello World!
```
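
The wildcard behaviour described above can be pictured with a small standalone model. This is only a sketch under assumptions: the `Tag` struct and the `is_wildcard` and `matches` functions are hypothetical names made up for illustration, they are not sitescraper types, and the crate's actual matching logic may differ. The sketch treats `"*"` and `""` as "match anything" for each position of the tuple.

```rust
/// Hypothetical stand-in for a parsed tag; not a sitescraper type.
struct Tag<'a> {
    name: &'a str,
    attrs: Vec<(&'a str, &'a str)>,
}

/// "*" and "" both mean "this part of the filter is left out".
fn is_wildcard(s: &str) -> bool {
    s.is_empty() || s == "*"
}

/// Returns true if `tag` satisfies a (tag-name, attribute-name, attribute-value) filter.
fn matches(tag: &Tag, filter: (&str, &str, &str)) -> bool {
    let (name, attr_name, attr_value) = filter;
    if !is_wildcard(name) && tag.name != name {
        return false;
    }
    // With both attribute parts left out, the tag name alone decides.
    if is_wildcard(attr_name) && is_wildcard(attr_value) {
        return true;
    }
    // Otherwise at least one attribute must satisfy the parts that were given.
    tag.attrs.iter().any(|(k, v)| {
        (is_wildcard(attr_name) || *k == attr_name)
            && (is_wildcard(attr_value) || *v == attr_value)
    })
}

fn main() {
    let tag = Tag { name: "div", attrs: vec![("id", "hello")] };
    println!("{}", matches(&tag, ("div", "id", "hello"))); // true
    println!("{}", matches(&tag, ("*", "id", "hello")));   // true
    println!("{}", matches(&tag, ("", "", "hello")));      // true
    println!("{}", matches(&tag, ("span", "id", "hello"))); // false
}
```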

### Get Website-Content:

```
use sitescraper;

// `.await` requires this code to run inside an async function or block.
let html = sitescraper::http::get("http://example.com/").await.unwrap();

let dom = sitescraper::parse_html(html).unwrap();

let filtered_dom = sitescraper::filter!(dom, "div");

println!("{}", filtered_dom.get_inner_html());

```