An open API service indexing awesome lists of open source software.

https://github.com/ztgx/llmweb

Webpage to structured data in Rust & LLM
https://github.com/ztgx/llmweb

automation browser data-analysis gemini headless headless-chrome llama llm openai rust scraper twitter web web-scraping web3

Last synced: 5 months ago
JSON representation

Webpage to structured data in Rust & LLM

Awesome Lists containing this project

README

          

# llmweb
**Extract any webpage to structured data in Rust & LLM**

[![Version](https://img.shields.io/crates/v/llmweb)](https://crates.io/crates/llmweb)
[![Downloads](https://img.shields.io/crates/d/llmweb?logo=rust)](https://crates.io/crates/llmweb)
[![License](https://img.shields.io/crates/l/llmweb)](LICENSE)
[![Documentation](https://img.shields.io/docsrs/llmweb)](https://docs.rs/llmweb)

> [!IMPORTANT]
> ***This project is under active development and APIs may change.***

## ✨ Key Features

- **πŸ€– Schema-Driven Extraction**
- **🌐 Multi-Provider LLM Support**
- **⚑ High-Performance & Async**
- **πŸ’» Simple & Powerful CLI**
- **πŸ¦€ Rust-Powered Reliability**
- **πŸ“„ Streaming**

## Installation
Add to your `Cargo.toml`:
```toml
[dependencies]
llmweb = "0.1"
```

1. Configure API Key(different providers choose one):
```bash
export OPENAI_API_KEY="sk-your-key-here" # OpenAI
export ANTHROPIC_API_KEY="sk-ant-your-key" # Claude
export GEMINI_API_KEY="your-google-key" # Gemini
export COHERE_API_KEY="your-cohere-key" # Cohere
export GROQ_API_KEY="gsk-your-key" # Groq
export XAI_API_KEY="xai-your-key" # xAI
export DEEPSEEK_API_KEY="your-deepseek-key" # DeepSeek
# Ollama typically requires no API key for local usage
```

2. Pick the model you want to use:
```rust
let model = "gemini-2.0-flash";
```

3. Create `LlmWeb` instance with the model:
```rust
let llmweb = LlmWeb::new(model);
```

## Example - V2EX
```rust
use llmweb::LlmWeb;
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct VXNA {
pub username: String,
pub avatar_url: String,
pub profile_url: String,
pub title: String,
pub topic_url: String,
pub topic_id: u64,
pub relative_time: String,
pub reply_count: u32,
pub last_replier: Option,
}

#[tokio::main]
async fn main() {
let schema_str = include_str!("../schemas/v2ex_schema.json");

let llmweb = LlmWeb::new("gemini-2.0-flash");
let structed_value: Vec = llmweb
.exec_from_schema_str("https://v2ex.com/go/vxna", schema_str)
.await
.unwrap();
println!("{:#?}", structed_value);
}
```

## Streaming
```rust
#[tokio::main]
async fn main() {
// Load the schema from an external file as a string.
let schema_str = include_str!("../schemas/v2ex_schema.json");
let schema: Value = serde_json::from_str(schema_str).unwrap();

let structed_value: Vec = LlmWeb::new("gemini-2.0-flash")
.stream("https://v2ex.com/go/vxna", schema)
.await
.unwrap();
println!("{:#?}", structed_value);
}
```

## Example - HN
```rust
use llmweb::LlmWeb;
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
struct Story {
title: String,
points: f32,
by: Option,
comments_url: Option,
}

#[tokio::main]
async fn main() {
// Load the schema from an external file as a string.
let schema_str = include_str!("../schemas/hn_schema.json");

let llmweb = LlmWeb::new("gemini-2.0-flash");
eprintln!("Fetching from Hacker News and extracting stories...");

// Use the convenience method `exec_from_schema_str` which handles
// parsing the schema string internally.
let structed_value: Vec = llmweb
.exec_from_schema_str("https://news.ycombinator.com", schema_str)
.await
.unwrap();
println!("{:#?}", structed_value);
}
```

## Cli
```bash
# Run the CLI
./target/debug/llmweb-cli --schema-file schemas/hn_schema.json https://news.ycombinator.com
```

## Output
```json
[
{
"by": "sandslash",
"comments_url": "item?id=44455175",
"points": 43.0,
"title": "FranΓ§ois Chollet: The Arc Prize and How We Get to AGI [video]"
},
{
"by": "bravomartin",
"comments_url": "item?id=44479502",
"points": 24.0,
"title": "When Figma starts designing us"
},
{
"by": "tejohnso",
"comments_url": "item?id=44489797",
"points": 15.0,
"title": "New Quantum Paradox Clarifies Where Our Views of Reality Go Wrong"
},
{
"by": "ananddtyagi",
"comments_url": "item?id=44485342",
"points": 480.0,
"title": "Bitchat – A decentralized messaging app that works over Bluetooth mesh networks"
},
{
"by": "PaulHoule",
"comments_url": "item?id=44489690",
"points": 5.0,
"title": "Mercury: Ultra-Fast Language Models Based on Diffusion"
}
]
```

## Examples
More examples can be found in the [Examples](./examples/) directory.

## Schemas
More schemas can be found in the [Schemas](./schemas/) directory.

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=zTgx/llmweb&type=Date)](https://www.star-history.com/#zTgx/llmweb&Date)

## Contributing

We welcome contributions! Please see our CONTRIBUTING.md for more details on how to get started.

## License

This project is licensed under the MIT License - see the `LICENSE` file for details.