Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Scrapy Middleware to set a random User-Agent for every Request.
https://github.com/cnu/scrapy-random-useragent
- Host: GitHub
- URL: https://github.com/cnu/scrapy-random-useragent
- Owner: cnu
- License: mit
- Created: 2014-12-25T12:29:23.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2019-08-16T21:29:30.000Z (over 5 years ago)
- Last Synced: 2024-09-27T04:49:16.515Z (4 months ago)
- Language: Python
- Size: 12.7 KB
- Stars: 201
- Watchers: 10
- Forks: 48
- Open Issues: 9
Metadata Files:
- Readme: README.rst
- License: LICENSE
Awesome Lists containing this project
- awesome - scrapy-random-useragent - Scrapy Middleware to set a random User-Agent for every Request. (Scrapy Middleware)
- awesome-scrapy - scrapy-random-useragent - Scrapy Middleware to set a random User-Agent for every Request. (Apps / Avoid Ban)
README
Scrapy Random User-Agent
========================

Does your Scrapy spider get identified and blocked by servers because
you use the default user-agent or a generic one?

Use this ``random_useragent`` module to set a random user-agent for
every request. You are limited only by the number of different
user-agents you list in a text file.

Installing
----------

Installing it is pretty simple.
.. code-block:: bash

    pip install scrapy-random-useragent
Usage
-----

In your ``settings.py`` file, update the ``DOWNLOADER_MIDDLEWARES``
variable like this.

.. code-block:: python
    DOWNLOADER_MIDDLEWARES = {
        'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
        'random_useragent.RandomUserAgentMiddleware': 400
    }

This disables the default ``UserAgentMiddleware`` and enables the
``RandomUserAgentMiddleware``.
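Note that on Scrapy 1.0 and later the built-in middleware was moved out of
``scrapy.contrib``; if the path above fails to import on your version, the
equivalent setting should look roughly like this (adjust to your Scrapy
release):

.. code-block:: python

    DOWNLOADER_MIDDLEWARES = {
        'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
        'random_useragent.RandomUserAgentMiddleware': 400
    }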
Then, create a new variable ``USER_AGENT_LIST`` with the path to your
text file which has the list of all user-agents
(one user-agent per line).

.. code-block:: python
    USER_AGENT_LIST = "/path/to/useragents.txt"
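For illustration, the referenced text file could contain entries such as the
following (any real browser user-agent strings will do; these are only
examples):

.. code-block:: text

    Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0 Safari/537.36
    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15
    Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0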
Now all the requests from your crawler will have a random user-agent
picked from the text file.
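For readers curious how a middleware like this works under the hood, here is a
minimal sketch of the general pattern (an illustrative approximation, not the
package's actual source): a downloader middleware loads the user-agent list
once from the ``USER_AGENT_LIST`` setting and overrides the ``User-Agent``
header in ``process_request``.

.. code-block:: python

    # Illustrative sketch only -- not the actual random_useragent source.
    import random

    class RandomUserAgentSketchMiddleware:
        """Pick a random user-agent string for each outgoing request."""

        def __init__(self, user_agents):
            self.user_agents = user_agents

        @classmethod
        def from_crawler(cls, crawler):
            # USER_AGENT_LIST points at a text file, one user-agent per line.
            path = crawler.settings.get('USER_AGENT_LIST')
            with open(path) as f:
                user_agents = [line.strip() for line in f if line.strip()]
            return cls(user_agents)

        def process_request(self, request, spider):
            # Overwrite the User-Agent header before the request is sent.
            request.headers['User-Agent'] = random.choice(self.user_agents)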