https://github.com/ztgx/llmweb
Webpage to structured data in Rust & LLM
- Host: GitHub
- URL: https://github.com/ztgx/llmweb
- Owner: zTgx
- License: mit
- Created: 2025-07-06T10:53:17.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-07-17T11:15:54.000Z (6 months ago)
- Last Synced: 2025-08-28T21:25:40.445Z (5 months ago)
- Topics: automation, browser, data-analysis, gemini, headless, headless-chrome, llama, llm, openai, rust, scraper, twitter, web, web-scraping, web3
- Language: Rust
- Homepage:
- Size: 183 KB
- Stars: 14
- Watchers: 0
- Forks: 2
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# llmweb
**Extract structured data from any webpage with Rust & LLMs**
[crates.io](https://crates.io/crates/llmweb) · [docs.rs](https://docs.rs/llmweb) · [License](LICENSE)
> [!IMPORTANT]
> ***This project is under active development and APIs may change.***
## ✨ Key Features
- **🤖 Schema-Driven Extraction**: describe the data you want as a JSON schema and get back typed, structured results
- **Multi-Provider LLM Support**: works with OpenAI, Claude, Gemini, Cohere, Groq, xAI, DeepSeek, and local Ollama models
- **⚡ High-Performance & Async**: built on async Rust with Tokio
- **💻 Simple & Powerful CLI**: run extractions from the command line with a schema file
- **🦀 Rust-Powered Reliability**
- **Streaming**: a `stream` variant of the extraction API (see the Streaming section below)
## Installation
Add to your `Cargo.toml`:
```toml
[dependencies]
llmweb = "0.1"
```
1. Configure an API key (choose the variable for your provider):
```bash
export OPENAI_API_KEY="sk-your-key-here" # OpenAI
export ANTHROPIC_API_KEY="sk-ant-your-key" # Claude
export GEMINI_API_KEY="your-google-key" # Gemini
export COHERE_API_KEY="your-cohere-key" # Cohere
export GROQ_API_KEY="gsk-your-key" # Groq
export XAI_API_KEY="xai-your-key" # xAI
export DEEPSEEK_API_KEY="your-deepseek-key" # DeepSeek
# Ollama typically requires no API key for local usage
```
2. Pick the model you want to use:
```rust
let model = "gemini-2.0-flash";
```
3. Create an `LlmWeb` instance with the model:
```rust
let llmweb = LlmWeb::new(model);
```
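Putting the three steps together, a minimal setup sketch (assuming, as the key list above implies, that llmweb reads the provider key from the environment):
```rust
use llmweb::LlmWeb;

fn main() {
    // Assumption: llmweb picks up the provider key (here Gemini) from the
    // environment variable configured in step 1.
    if std::env::var("GEMINI_API_KEY").is_err() {
        eprintln!("GEMINI_API_KEY is not set; see step 1 above");
        std::process::exit(1);
    }

    // Step 2: pick a model. Step 3: create the client.
    let model = "gemini-2.0-flash";
    let _llmweb = LlmWeb::new(model);
}
```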
## Example - V2EX
```rust
use llmweb::LlmWeb;
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct VXNA {
    pub username: String,
    pub avatar_url: String,
    pub profile_url: String,
    pub title: String,
    pub topic_url: String,
    pub topic_id: u64,
    pub relative_time: String,
    pub reply_count: u32,
    pub last_replier: Option<String>,
}

#[tokio::main]
async fn main() {
    let schema_str = include_str!("../schemas/v2ex_schema.json");
    let llmweb = LlmWeb::new("gemini-2.0-flash");

    let structured_value: Vec<VXNA> = llmweb
        .exec_from_schema_str("https://v2ex.com/go/vxna", schema_str)
        .await
        .unwrap();

    println!("{:#?}", structured_value);
}
```
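The contents of `schemas/v2ex_schema.json` are not shown here. As a rough illustration only, a JSON-Schema-style definition mirroring the `VXNA` struct above could be built inline with `serde_json::json!` and passed to `exec_from_schema_str`; the real schema file may use a different layout:
```rust
use serde_json::json;

// Hypothetical schema mirroring the VXNA struct; not the actual
// schemas/v2ex_schema.json shipped with the repository.
fn v2ex_schema_string() -> String {
    json!({
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "username":      { "type": "string" },
                "avatar_url":    { "type": "string" },
                "profile_url":   { "type": "string" },
                "title":         { "type": "string" },
                "topic_url":     { "type": "string" },
                "topic_id":      { "type": "integer" },
                "relative_time": { "type": "string" },
                "reply_count":   { "type": "integer" },
                "last_replier":  { "type": "string" }
            },
            "required": ["username", "title", "topic_url", "topic_id"]
        }
    })
    .to_string()
}
```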
## Streaming
```rust
use llmweb::LlmWeb;
use serde_json::Value;

#[tokio::main]
async fn main() {
    // Load the schema from an external file as a string.
    let schema_str = include_str!("../schemas/v2ex_schema.json");
    let schema: Value = serde_json::from_str(schema_str).unwrap();

    let structured_value: Vec<Value> = LlmWeb::new("gemini-2.0-flash")
        .stream("https://v2ex.com/go/vxna", schema)
        .await
        .unwrap();

    println!("{:#?}", structured_value);
}
```
## Example - HN
```rust
use llmweb::LlmWeb;
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
struct Story {
    title: String,
    points: f32,
    by: Option<String>,
    comments_url: Option<String>,
}

#[tokio::main]
async fn main() {
    // Load the schema from an external file as a string.
    let schema_str = include_str!("../schemas/hn_schema.json");
    let llmweb = LlmWeb::new("gemini-2.0-flash");

    eprintln!("Fetching from Hacker News and extracting stories...");

    // Use the convenience method `exec_from_schema_str`, which handles
    // parsing the schema string internally.
    let structured_value: Vec<Story> = llmweb
        .exec_from_schema_str("https://news.ycombinator.com", schema_str)
        .await
        .unwrap();

    println!("{:#?}", structured_value);
}
```
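The examples call `unwrap()` for brevity. In application code you would usually propagate failures instead; a minimal sketch of the same Hacker News extraction, assuming llmweb's error type can be boxed as a standard error:
```rust
use llmweb::LlmWeb;
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
struct Story {
    title: String,
    points: f32,
    by: Option<String>,
    comments_url: Option<String>,
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let schema_str = include_str!("../schemas/hn_schema.json");

    // Propagate errors with `?` instead of panicking on failure.
    let stories: Vec<Story> = LlmWeb::new("gemini-2.0-flash")
        .exec_from_schema_str("https://news.ycombinator.com", schema_str)
        .await?;

    println!("{:#?}", stories);
    Ok(())
}
```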
## CLI
```bash
# Build the project, then run the CLI
cargo build
./target/debug/llmweb-cli --schema-file schemas/hn_schema.json https://news.ycombinator.com
```
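The CLI loads the schema file at runtime; a rough library-side equivalent (a sketch only: it assumes `exec_from_schema_str` accepts any schema string and that results can be deserialized into plain `serde_json::Value`s):
```rust
use llmweb::LlmWeb;
use serde_json::Value;

#[tokio::main]
async fn main() {
    // Read the schema from disk at runtime, as the CLI does, rather than
    // embedding it at compile time with include_str!.
    let schema_str = std::fs::read_to_string("schemas/hn_schema.json")
        .expect("schema file should be readable");

    // Untyped results: each extracted item is a serde_json::Value.
    let stories: Vec<Value> = LlmWeb::new("gemini-2.0-flash")
        .exec_from_schema_str("https://news.ycombinator.com", &schema_str)
        .await
        .unwrap();

    println!("{:#?}", stories);
}
```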
## Output
```json
[
  {
    "by": "sandslash",
    "comments_url": "item?id=44455175",
    "points": 43.0,
    "title": "François Chollet: The Arc Prize and How We Get to AGI [video]"
  },
  {
    "by": "bravomartin",
    "comments_url": "item?id=44479502",
    "points": 24.0,
    "title": "When Figma starts designing us"
  },
  {
    "by": "tejohnso",
    "comments_url": "item?id=44489797",
    "points": 15.0,
    "title": "New Quantum Paradox Clarifies Where Our Views of Reality Go Wrong"
  },
  {
    "by": "ananddtyagi",
    "comments_url": "item?id=44485342",
    "points": 480.0,
    "title": "Bitchat – A decentralized messaging app that works over Bluetooth mesh networks"
  },
  {
    "by": "PaulHoule",
    "comments_url": "item?id=44489690",
    "points": 5.0,
    "title": "Mercury: Ultra-Fast Language Models Based on Diffusion"
  }
]
```
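The output above matches the `Story` struct from the Hacker News example, so the same JSON round-trips through `serde_json` independently of llmweb; a standalone check:
```rust
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
struct Story {
    title: String,
    points: f32,
    by: Option<String>,
    comments_url: Option<String>,
}

fn main() {
    // One entry from the CLI output above.
    let output = r#"[
        {
            "by": "sandslash",
            "comments_url": "item?id=44455175",
            "points": 43.0,
            "title": "François Chollet: The Arc Prize and How We Get to AGI [video]"
        }
    ]"#;

    let stories: Vec<Story> = serde_json::from_str(output).expect("output should match Story");
    println!("{:#?}", stories);
}
```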
## Examples
More examples can be found in the [Examples](./examples/) directory.
## Schemas
More schemas can be found in the [Schemas](./schemas/) directory.
## Star History
[Star History Chart](https://www.star-history.com/#zTgx/llmweb&Date)
## Contributing
We welcome contributions! Please see our CONTRIBUTING.md for more details on how to get started.
## License
This project is licensed under the MIT License - see the `LICENSE` file for details.