https://github.com/atinux/llm-scraper-nuxt
Demo of a LLM Scraper with Nuxt & Cloudflare AI & Browser rendering to return structured output.
https://github.com/atinux/llm-scraper-nuxt
ai cloudflare llm-scraper nuxt nuxthub
Last synced: 7 months ago
JSON representation
Demo of a LLM Scraper with Nuxt & Cloudflare AI & Browser rendering to return structured output.
- Host: GitHub
- URL: https://github.com/atinux/llm-scraper-nuxt
- Owner: atinux
- Created: 2025-01-09T17:19:46.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-01-09T18:32:51.000Z (9 months ago)
- Last Synced: 2025-01-09T18:35:21.234Z (9 months ago)
- Topics: ai, cloudflare, llm-scraper, nuxt, nuxthub
- Language: TypeScript
- Homepage: ttps://llm-scraper.nuxt.dev
- Size: 128 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# LLM Scraper with Nuxt & Cloudflare AI/Browser
This is a demo of a LLM Scraper with Nuxt & Cloudflare AI/Browser based on [llm-scraper-worker](https://github.com/threepointone/llm-scraper-worker) and [workers-ai-provider](https://github.com/threepointone/workers-ai-provider).
## Features
- Leverage Cloudflare AI with `@cf/meta/llama-3.1-70b-instruct`
- Use the Cloudflare Browser to scrape the web
- Cache the result for 10 minutes to avoid abuse
- One click deploy on 275+ locations for free on your Cloudflare account with [NuxtHub](https://hub.nuxt.com)## Top 5 stories on Hacker News
This is a simple API endpoint that returns the top 5 stories on Hacker News as JSON on `/api/top-5-hn` & cached for 10 minutes using Cloudflare KV:
```ts
import { z } from 'zod'
import LLMScraper from 'llm-scraper-worker'
import { createWorkersAI } from 'workers-ai-provider'export default defineCachedEventHandler(async () => {
const { page } = await hubBrowser()
const workersAI = createWorkersAI({ binding: hubAI() })
const llm = workersAI('@cf/meta/llama-3.1-70b-instruct')
const scraper = new LLMScraper(llm)await page.goto('https://news.ycombinator.com')
// Define schema to extract contents into
const schema = z.object({
top: z
.array(
z.object({
title: z.string(),
points: z.number(),
by: z.string(),
commentsURL: z.string(),
}),
)
.length(5)
.describe('Top 5 stories on Hacker News'),
})const { data } = await scraper.run(page, schema, {
format: 'html',
})
return data
}, {
maxAge: 60 * 10,
})
```Visiting http://localhost:3000/api/top-5-hn will return the top 5 stories on Hacker News as JSON:
```json
{
"top": [
{
"title": "Visualizing All ISBNs",
"points": 107,
"by": "RyanShook",
"commentsURL": "https://news.ycombinator.com/item?id=42652577"
},
{
"title": "Samsung Display to Begin Mass Production of Rollable OLED for Laptops",
"points": 21,
"by": "gnabgib",
"commentsURL": "https://news.ycombinator.com/item?id=42653352"
},
{
"title": "Predictions Scorecard, 2025 January 01",
"points": 158,
"by": "timr",
"commentsURL": "https://news.ycombinator.com/item?id=42651275"
},
{
"title": "Show HN: Tetris in a PDF",
"points": 947,
"by": "ThomasRinsma",
"commentsURL": "https://news.ycombinator.com/item?id=42645218"
},
{
"title": "Bird-inspired drone uses legs to walk and jump into the air",
"points": 40,
"by": "bookofjoe",
"commentsURL": "https://news.ycombinator.com/item?id=42617825"
}
]
}
```## Setup
Make sure to install the dependencies with [pnpm](https://pnpm.io/installation#using-corepack):
```bash
pnpm install
```Make sure to link your project with your Cloudflare account & NuxtHub account to use Workers AI locally:
```bash
npx nuxthub link
```## Development Server
Start the development server on `http://localhost:3000`:
```bash
pnpm dev
```## Production
Build the application for production:
```bash
pnpm build
```## Deploy
Deploy the application on the Edge with [NuxtHub](https://hub.nuxt.com) on your Cloudflare account:
```bash
npx nuxthub deploy
```Then checkout your server logs, analaytics and more in the [NuxtHub Admin](https://admin.hub.nuxt.com).