
https://github.com/jaypyles/Scraperr

Self-hosted webscraper.

**A powerful self-hosted web scraping solution**


**Built with:** MongoDB · FastAPI · Next.js · TailwindCSS

## 📋 Overview

Scrape websites without writing a single line of code.

> 📚 **[Check out the docs](https://scraperr-docs.pages.dev)** for a comprehensive quickstart guide and detailed information.



## ✨ Key Features

- **XPath-Based Extraction**: Precisely target page elements
- **Queue Management**: Submit and manage multiple scraping jobs
- **Domain Spidering**: Option to scrape all pages within the same domain
- **Custom Headers**: Add custom HTTP headers, supplied as JSON, to your scraping requests
- **Media Downloads**: Automatically download images, videos, and other media
- **Results Visualization**: View scraped data in a structured table format
- **Data Export**: Export your results in Markdown and CSV formats
- **Notification Channels**: Send job-completion notifications through various channels
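Scraperr is configured through its web UI, so you never write this yourself. The following is a minimal, hypothetical sketch (not Scraperr's code) of what the XPath-based extraction and same-domain spidering features mean. It uses Python's standard-library `xml.etree.ElementTree`, which supports only a subset of XPath and requires well-formed markup. Real-world HTML usually needs a proper HTML parser. The page content and the function names `extract_text` and `same_domain_links` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin, urlparse

# Hypothetical, well-formed XHTML snippet standing in for a fetched page.
SAMPLE_PAGE = """
<html>
  <body>
    <h2 class="title">First post</h2>
    <h2 class="title">Second post</h2>
    <a href="/about">About</a>
    <a href="https://example.com/blog/1">Blog</a>
    <a href="https://other.org/">Elsewhere</a>
  </body>
</html>
"""


def extract_text(html: str, xpath: str) -> list[str]:
    """Return the stripped text of every element matched by an (ElementTree-subset) XPath."""
    root = ET.fromstring(html.strip())
    return [(el.text or "").strip() for el in root.findall(xpath)]


def same_domain_links(html: str, base_url: str) -> list[str]:
    """Collect absolute link targets that stay on base_url's host, in page order, without duplicates."""
    root = ET.fromstring(html.strip())
    base_host = urlparse(base_url).netloc
    links: list[str] = []
    for anchor in root.findall(".//a[@href]"):
        # Resolve relative hrefs such as "/about" against the page URL.
        absolute = urljoin(base_url, anchor.get("href"))
        if urlparse(absolute).netloc == base_host and absolute not in links:
            links.append(absolute)
    return links


if __name__ == "__main__":
    print(extract_text(SAMPLE_PAGE, ".//h2[@class='title']"))
    print(same_domain_links(SAMPLE_PAGE, "https://example.com/"))
```

A spider built on this idea would queue the links returned by `same_domain_links` and repeat the extraction on each page, skipping URLs it has already visited.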

## 🚀 Getting Started

### Docker

```bash
make up
```

### Helm

> Refer to the docs for Helm deployment: https://scraperr-docs.pages.dev/guides/helm-deployment

## ⚖️ Legal and Ethical Guidelines

When using Scraperr, please remember to:

1. **Respect `robots.txt`**: Always check a website's `robots.txt` file to verify which pages permit scraping
2. **Terms of Service**: Adhere to each website's Terms of Service regarding data extraction
3. **Rate Limiting**: Implement reasonable delays between requests to avoid overloading servers
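If you run your own scripts alongside Scraperr, the first and third guidelines can be checked programmatically. The following is a minimal sketch using Python's standard-library `urllib.robotparser`. It assumes a hypothetical `robots.txt` body, a hypothetical user-agent string, and a hypothetical `PoliteScheduler` helper that enforces a minimum gap between requests. It does not describe how Scraperr itself applies these rules.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body. In practice, fetch it from the site with
# RobotFileParser.set_url("https://<site>/robots.txt") followed by read().
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "example-scraper"  # hypothetical user-agent string


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class PoliteScheduler:
    """Hypothetical helper: enforce at least min_delay seconds between consecutive requests."""

    def __init__(self, min_delay: float, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self) -> float:
        """Sleep until min_delay has passed since the previous call; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.min_delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept


def plan_requests(urls, robots_txt, default_delay=1.0):
    """Keep only URLs robots.txt allows; use its Crawl-delay if present, else default_delay."""
    rules = load_rules(robots_txt)
    delay = rules.crawl_delay(USER_AGENT)
    if delay is None:
        delay = default_delay
    allowed = [u for u in urls if rules.can_fetch(USER_AGENT, u)]
    return allowed, float(delay)


if __name__ == "__main__":
    allowed, delay = plan_requests(
        ["https://example.com/", "https://example.com/private/data"], SAMPLE_ROBOTS
    )
    print(allowed, delay)
```

In a real crawler you would create `PoliteScheduler(delay)` and call its `wait()` method before fetching each URL in `allowed`. Injecting `clock` and `sleep` keeps the delay logic testable without actually sleeping.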

> **Disclaimer**: Scraperr is intended for use only on websites that explicitly permit scraping. The creator accepts no responsibility for misuse of this tool.

## 💬 Join the Community

Get support, report bugs, and chat with other users and contributors.

👉 [Join the Scraperr Discord](https://discord.gg/89q7scsGEK)

## 📄 License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## 👏 Contributions

Development made easier with the [webapp template](https://github.com/jaypyles/webapp-template).

To get started, simply run `make build up-dev`.