Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jenslys/skrape-py
Python SDK to easily interact with the skrape.ai API
https://github.com/jenslys/skrape-py
ai llm-scraper python-scraper scraper scraping skrape
Last synced: 3 days ago
JSON representation
Python SDK to easily interact with the skrape.ai API
- Host: GitHub
- URL: https://github.com/jenslys/skrape-py
- Owner: jenslys
- Created: 2024-12-08T12:24:07.000Z (about 1 month ago)
- Default Branch: master
- Last Pushed: 2024-12-08T12:35:49.000Z (about 1 month ago)
- Last Synced: 2024-12-08T13:26:07.987Z (about 1 month ago)
- Topics: ai, llm-scraper, python-scraper, scraper, scraping, skrape
- Language: Python
- Homepage: https://skrape.ai
- Size: 38.1 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# skrape-py
A Python library for easily interacting with Skrape.ai API. Define your scraping schema using Pydantic and get type-safe results.
## Features
- 🛡️ **Type-safe**: Define your schemas using Pydantic and get fully typed results
- 🚀 **Simple API**: Just define a schema and get your data
- 🔄 **Async Support**: Built with async/await for efficient scraping
- 🧩 **Minimal Dependencies**: Built on top of proven libraries like Pydantic and httpx## Installation
```bash
pip install skrape-py
```Or with Poetry:
```bash
poetry add skrape-py
```## Environment Setup
Setup your API key in `.env`:
```env
SKRAPE_API_KEY="your_api_key_here"
```Get your API key on [Skrape.ai](https://skrape.ai)
## Quick Start
```python
from skrape import Skrape
from pydantic import BaseModel
from typing import List
import os
import asyncio# Define your schema using Pydantic
class ProductSchema(BaseModel):
title: str
price: float
description: str
rating: floatasync def main():
# Use the client as an async context manager
async with Skrape(api_key=os.getenv("SKRAPE_API_KEY")) as skrape:
# Extract data
response = await skrape.extract(
"https://example.com/product",
ProductSchema,
{"render_js": True} # Enable JavaScript rendering if needed
)
# Access the extracted data
product = response.result
print(f"Product: {product.title}")
print(f"Price: ${product.price}")
# Access rate limit information
usage = response.usage
print(f"Remaining credits: {usage.remaining}")
print(f"Rate limit reset: {usage.rateLimit.reset}")# Run the async function
asyncio.run(main())
```## Schema Definition
We leverage Pydantic for defining schemas. Here's a more complex example:
```python
from typing import List, Optional
from pydantic import BaseModel, Fieldclass Review(BaseModel):
author: str
rating: float = Field(ge=0, le=5) # Rating between 0 and 5
text: str
date: str
helpful_votes: Optional[int] = Noneclass ProductDetails(BaseModel):
title: str
price: float
description: str
rating: float
reviews: List[Review]
in_stock: bool
shipping_info: Optional[str] = None
```For a comprehensive understanding of all available options and advanced schema configurations, we recommend exploring [Pydantic's documentation](https://docs.pydantic.dev/).
## API Options
When calling `extract()`, you can pass additional options:
```python
response = await skrape.extract(
url,
schema,
{
"render_js": True, # Enable JavaScript rendering
# Add other options as needed
}
)
```## Error Handling
The library provides typed exceptions for better error handling:
```python
from skrape import Skrape, SkrapeValidationError, SkrapeAPIErrorasync with Skrape(api_key=os.getenv("SKRAPE_API_KEY")) as skrape:
try:
response = await skrape.extract(url, schema)
except SkrapeValidationError as e:
# Schema validation failed
print(f"Data doesn't match schema: {e}")
except SkrapeAPIError as e:
# API request failed (e.g., rate limit exceeded, invalid API key)
print(f"API error: {e}")
```## Rate Limiting
The API response includes rate limit information that you can use to manage your requests:
```python
response = await skrape.extract(url, schema)
usage = response.usageprint(f"Remaining credits: {usage.remaining}")
print(f"Rate limit info:")
print(f" - Remaining: {usage.rateLimit.remaining}")
print(f" - Base limit: {usage.rateLimit.baseLimit}")
print(f" - Burst limit: {usage.rateLimit.burstLimit}")
print(f" - Reset at: {usage.rateLimit.reset}")
```