https://github.com/peterbencze/silene
Silene is an open source web crawler framework built upon Pyppeteer.
- Host: GitHub
- URL: https://github.com/peterbencze/silene
- Owner: peterbencze
- Created: 2020-12-09T22:27:47.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-12-18T20:02:06.000Z (over 5 years ago)
- Last Synced: 2025-10-26T17:29:06.119Z (5 months ago)
- Topics: crawler, framework, pypp, python, scraper, webcrawler
- Language: Python
- Homepage:
- Size: 65.4 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Silene
Silene is an open source web crawler framework built upon [Pyppeteer](https://github.com/pyppeteer/pyppeteer).
## Requirements
Silene requires [Python 3.7](https://www.python.org/downloads/) or later.
## Installation
To install the latest release, run `pip install silene`.
## Quickstart guide
Each crawler must subclass the `Crawler` class and implement its abstract `configure` method, which returns a
`CrawlerConfiguration`. The configuration specifies the initial requests to make and other properties of the crawler.
Once a request has been processed, the appropriate callback is invoked. By default, a successful request triggers the
`on_response_success` callback, which is where you can interact with the page content. You can also
specify custom callbacks for your requests.
Below you can find a very simple implementation.
### Example code snippet
```python
from silene.crawl_request import CrawlRequest
from silene.crawl_response import CrawlResponse
from silene.crawler import Crawler
from silene.crawler_configuration import CrawlerConfiguration


class MyCrawler(Crawler):
    def configure(self) -> CrawlerConfiguration:
        # Define the initial requests the crawler starts with.
        return CrawlerConfiguration([CrawlRequest('https://example.com')])

    def on_response_success(self, response: CrawlResponse) -> None:
        # Do something with the response...
        pass
```
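The callback flow described in the quickstart can be pictured with a small, self-contained sketch. It deliberately does not import Silene: `MiniRequest`, `MiniResponse`, `MiniCrawler`, its `fetch` and `run` methods, and the `success_callback` parameter are all hypothetical names invented for illustration, and the framework's real signatures and internals may differ. The sketch only shows the general idea of falling back to a default callback when a request has no custom one.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class MiniResponse:
    """Hypothetical stand-in for a crawl response."""
    url: str
    status: int


@dataclass
class MiniRequest:
    """Hypothetical stand-in for a crawl request."""
    url: str
    # Optional per-request callback; the default callback is used when unset.
    success_callback: Optional[Callable[[MiniResponse], None]] = None


class MiniCrawler(ABC):
    """Hypothetical stand-in illustrating callback dispatch (not Silene's code)."""

    @abstractmethod
    def configure(self) -> List[MiniRequest]:
        """Return the initial requests to process."""

    def on_response_success(self, response: MiniResponse) -> None:
        """Default callback for successful requests."""

    def fetch(self, request: MiniRequest) -> MiniResponse:
        # Fake fetch: no network access, every request "succeeds".
        return MiniResponse(request.url, 200)

    def run(self) -> None:
        for request in self.configure():
            response = self.fetch(request)
            if response.status < 400:
                # Prefer the request's custom callback, else the default one.
                callback = request.success_callback or self.on_response_success
                callback(response)


class DemoCrawler(MiniCrawler):
    def __init__(self) -> None:
        self.seen: List[Tuple[str, str]] = []

    def configure(self) -> List[MiniRequest]:
        return [
            MiniRequest('https://example.com'),
            MiniRequest('https://example.com/about', success_callback=self.on_about),
        ]

    def on_response_success(self, response: MiniResponse) -> None:
        self.seen.append(('default', response.url))

    def on_about(self, response: MiniResponse) -> None:
        self.seen.append(('custom', response.url))
```

Calling `DemoCrawler().run()` in this sketch routes the first request to the default callback and the second to the custom `on_about` callback, recording both in `seen`.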
## Development instructions
### Prerequisite
This project requires [Pipenv](https://docs.pipenv.org/) to be installed.
### Create environment
Run `pipenv install --dev` to create a new virtual environment and install the necessary packages.
### Run tests
Run `pytest` in the project root folder.
### Run tests with coverage
Run `pytest --cov=silene` in the project root folder.
## License
The source code of Silene is made available under
the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).