https://github.com/gerapy/gerapyrabbitmq
Distribution Support for Scrapy & Gerapy using RabbitMQ
- Host: GitHub
- URL: https://github.com/gerapy/gerapyrabbitmq
- Owner: Gerapy
- Created: 2020-07-25T12:33:59.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-11-27T15:58:16.000Z (over 2 years ago)
- Last Synced: 2025-03-22T14:34:48.641Z (12 months ago)
- Language: Python
- Size: 15.6 KB
- Stars: 9
- Watchers: 1
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
README
# Gerapy RabbitMQ
This package adds distributed crawling support to Scrapy, using RabbitMQ as the request queue. It is also a module of [Gerapy](https://github.com/Gerapy/Gerapy).
## Installation
Install the package with pip:
```shell script
pip3 install gerapy-rabbitmq
```
## Usage
Required configuration:
```python
# Use RabbitMQ for queue
SCHEDULER = "gerapy_rabbitmq.scheduler.Scheduler"
SCHEDULER_QUEUE_KEY = '%(spider)s_requests'
# RabbitMQ Connection Parameters, see https://pika.readthedocs.io/en/stable/modules/parameters.html
RABBITMQ_CONNECTION_PARAMETERS = {
    'host': 'localhost'
}
# Use Redis for dupefilter
DUPEFILTER_CLASS = "gerapy_redis.dupefilter.RFPDupeFilter"
SCHEDULER_DUPEFILTER_KEY = '%(spider)s:dupefilter'
```
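The `%(spider)s` placeholders in the two key settings are ordinary Python `%`-style templates. The following is a minimal sketch, assuming the scheduler and dupefilter fill them in with the spider's name. The helper `expand_key` and the spider name `quotes` are hypothetical and not part of the package. It shows the queue and dupefilter names a spider would end up with:

```python
# Key templates copied from the required configuration above.
SCHEDULER_QUEUE_KEY = '%(spider)s_requests'
SCHEDULER_DUPEFILTER_KEY = '%(spider)s:dupefilter'


def expand_key(template, spider_name):
    """Fill the %(spider)s placeholder using Python %-formatting.

    Assumption: the library substitutes the spider name this way.
    This helper is illustrative and not part of gerapy-rabbitmq.
    """
    return template % {'spider': spider_name}


# 'quotes' is a hypothetical spider name.
queue_name = expand_key(SCHEDULER_QUEUE_KEY, 'quotes')
dupefilter_key = expand_key(SCHEDULER_DUPEFILTER_KEY, 'quotes')

print(queue_name)
print(dupefilter_key)
```

Under that assumption, each spider gets its own RabbitMQ queue and its own Redis dupefilter key, so several spiders can share one broker without mixing requests.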
Optional configuration:
```python
# RabbitMQ Queue Configuration
SCHEDULER_QUEUE_DURABLE = True
SCHEDULER_QUEUE_MAX_PRIORITY = 100
SCHEDULER_QUEUE_PRIORITY_OFFSET = 30
SCHEDULER_QUEUE_FORCE_FLUSH = True
SCHEDULER_PERSIST = False
SCHEDULER_IDLE_BEFORE_CLOSE = 0
SCHEDULER_FLUSH_ON_START = False
SCHEDULER_PRE_ENQUEUE_ALL_START_REQUESTS = True
```
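The README does not describe how these options behave. Scrapy request priorities may be negative, while RabbitMQ message priorities are non-negative integers capped by the queue's maximum priority. One plausible reading of `SCHEDULER_QUEUE_PRIORITY_OFFSET` and `SCHEDULER_QUEUE_MAX_PRIORITY` is therefore an offset-and-clamp mapping. The sketch below illustrates that reading only. The function `to_queue_priority` is hypothetical, and the package's actual behavior may differ:

```python
# Values copied from the optional configuration above.
SCHEDULER_QUEUE_MAX_PRIORITY = 100
SCHEDULER_QUEUE_PRIORITY_OFFSET = 30


def to_queue_priority(request_priority,
                      offset=SCHEDULER_QUEUE_PRIORITY_OFFSET,
                      max_priority=SCHEDULER_QUEUE_MAX_PRIORITY):
    """Map a Scrapy request priority to a RabbitMQ message priority.

    Assumption (not confirmed by the README): the offset is added to
    the request priority and the result is clamped to [0, max_priority].
    """
    return max(0, min(max_priority, request_priority + offset))


# A few hypothetical request priorities and their mapped values.
for p in (-50, -10, 0, 200):
    print(p, to_queue_priority(p))
```

If the mapping works this way, an offset of 30 leaves room for request priorities down to -30 before they collapse to the lowest queue priority.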
## More
For more details, see the [example](./example) directory.
## RabbitMQ Preview
