https://github.com/ozanmakes/scrapedown
A simple worker for extracting page content for a given URL
- Host: GitHub
- URL: https://github.com/ozanmakes/scrapedown
- Owner: ozanmakes
- License: mit
- Created: 2023-11-17T10:32:53.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-01-18T09:36:50.000Z (about 1 year ago)
- Last Synced: 2024-08-13T07:03:27.133Z (6 months ago)
- Language: JavaScript
- Size: 42 KB
- Stars: 88
- Watchers: 4
- Forks: 36
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - ozanmakes/scrapedown - A simple worker for extracting page content for a given URL (JavaScript)
README
# scrapedown
[Deploy to Cloudflare Workers](https://deploy.workers.cloudflare.com/?url=https://github.com/osener/scrapedown)
This project is a Cloudflare Worker that scrapes web pages and extracts useful information, including a markdown-formatted version of the content. It accepts a request for a given URL and returns structured data about the page.
## Features
- Fetch and scrape content from any given URL.
- Extract metadata such as title, byline, excerpt, and more.
- Convert HTML content to clean markdown format.
- Handle requests with optional markdown formatting.
- Remove everything but the content (Reader Mode)
## Usage
To use this worker, send a GET request to the worker's endpoint with the `url` query parameter specifying the page to be scraped. Optionally, you can include the `markdown` query parameter to specify whether the content should be returned in markdown format (default: `true`).
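Because the page to scrape is itself a URL passed inside a query string, it is safest to percent-encode it. The sketch below shows one way to build the request URL; `buildScrapeUrl` and the `my-worker.example.workers.dev` host are hypothetical names used only for illustration, not part of this project.

```javascript
// Hypothetical helper: builds the request URL for a deployed worker.
// `workerBase` is a placeholder; substitute your own worker's endpoint.
function buildScrapeUrl(workerBase, pageUrl, markdown = true) {
  const endpoint = new URL(workerBase);
  // searchParams.set percent-encodes the value, so a target URL that has
  // its own query string is passed through intact.
  endpoint.searchParams.set("url", pageUrl);
  endpoint.searchParams.set("markdown", String(markdown));
  return endpoint.toString();
}

// Example: request the page without markdown conversion.
console.log(
  buildScrapeUrl(
    "https://my-worker.example.workers.dev/", // assumed endpoint
    "https://example.com/?a=1&b=2",
    false
  )
);
```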
### Example Request
```
GET https://.workers.dev/?url=https://example.com&markdown=true
```
### Example Response
```json
{
  "page": {
    "byline": "Author Name",
    "content": "... stripped html content ...",
    "dir": null,
    "excerpt": "...",
    "lang": null,
    "length": 191,
    "siteName": null,
    "textContent": "... markdown content ...",
    "title": "Example Domain"
  }
}
```
## Deployment
To deploy this Cloudflare worker, you have two options:
1. Use Wrangler CLI:
```sh
npx wrangler deploy
```
2. Click the "Deploy to Cloudflare Workers" button at the top of this README.
## Deployment with Docker
To run the worker in a Docker container, first build the image and then start the container.
Run the commands below from the project root.
```sh
docker compose -f docker-compose-dev.yaml build
docker compose -f docker-compose-dev.yaml up -d
```
To change how the container runs, edit `docker-compose-dev.yaml`.
### Example usage with Docker
```
GET http://:8787/?url=https://example.com&markdown=true
```
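A client can consume the response with a few lines of JavaScript. This is a minimal sketch: it assumes the response shape shown in the example response (a `page` object whose `textContent` holds the markdown), and `scrapePage`, the injectable `fetchImpl` parameter, and the `localhost` host in the usage comment are hypothetical choices made for illustration.

```javascript
// Hypothetical client: sends the GET request and picks out a few fields.
// `fetchImpl` defaults to the global fetch (Node 18+ or a browser) and can
// be replaced with a stub when testing.
async function scrapePage(requestUrl, fetchImpl = fetch) {
  const response = await fetchImpl(requestUrl);
  if (!response.ok) {
    throw new Error(`Worker responded with HTTP ${response.status}`);
  }
  // Assumes the documented shape: { page: { title, textContent, ... } }
  const { page } = await response.json();
  return { title: page.title, markdown: page.textContent };
}

// Usage (assumes the container is reachable on localhost):
// scrapePage("http://localhost:8787/?url=https://example.com&markdown=true")
//   .then(({ title, markdown }) => console.log(title, markdown));
```

Passing the fetch function as a parameter keeps the helper easy to exercise without a running worker.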