https://github.com/metalwarrior665/actor-rust-pagefunction
https://github.com/metalwarrior665/actor-rust-pagefunction
Last synced: 2 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/metalwarrior665/actor-rust-pagefunction
- Owner: metalwarrior665
- Created: 2023-12-08T14:43:01.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2023-12-08T23:16:55.000Z (over 1 year ago)
- Last Synced: 2024-12-30T18:32:59.983Z (4 months ago)
- Language: Rust
- Size: 37.1 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
Example actor showcasing running a user-provided function in a static-typed compiled language.
## How does it work
1. Reads the input from disk or via Apify API
2. Extracts the `page_function` string from the input
3. Stores the `page_function` string to the disk
4. Spawns a system process using `cargo` to compile the `page_function` into a dynamic library
5. Dynamically links the library and converts the `page_function` into a regular Rust function. It must adhere to predefined input/output types.
6. The example code gets HTML from the input provided `url` and parses it into a `document` using the [Scraper](https://docs.rs/scraper/latest/scraper/) library
7. The user-provided `page_function` gets the `document` as an input parameter and returns a JSON [Value](https://docs.rs/serde_json/latest/serde_json/enum.Value.html) type using the `json` macro## Page function
Page function can use a predefined set of Rust libraries, currently only the [Scraper](https://docs.rs/scraper/latest/scraper/) library and [serde_json](https://docs.rs/serde_json/latest/serde_json/) for JSON `Value` type are provided.### TODO
But technically, thanks to dynamic compiling, we can enable users to provide a list of libraries to be used in the `page_function`.### Example page_function
```rust
use serde_json::{Value,json};
use scraper::{Html, Selector};fn selector_to_text(document: &Html, selector: &str) -> Option {
document
.select(&Selector::parse(selector).unwrap())
.next()
.map(|el| el.text().next().unwrap().into() )
}#[no_mangle]
pub fn page_function (document: &Html) -> Value {
println!("page_function starting");let title = selector_to_text(&document, "title");
println!("extracted title: {:?}", title);let header = selector_to_text(&document, "h1");
println!("extracted header: {:?}", header);let companies_using_apify = document
.select(&Selector::parse(".Logos__container").unwrap())
.next().unwrap()
.select(&Selector::parse("img").unwrap())
.map(|el| el.value().attr("alt").unwrap().to_string())
.collect::>();println!("extracted companies_using_apify: {:?}", companies_using_apify);
let output = json!({
"title": title,
"header": header,
"companies_using_apify": companies_using_apify,
});
println!("inside pageFunction output: {:?}", output);
output
}
```