{"id":13414805,"url":"https://github.com/clemfromspace/scrapy-selenium","last_synced_at":"2025-05-14T23:06:16.425Z","repository":{"id":29279660,"uuid":"121104365","full_name":"clemfromspace/scrapy-selenium","owner":"clemfromspace","description":"Scrapy middleware to handle javascript pages using selenium","archived":false,"fork":false,"pushed_at":"2024-07-08T09:45:54.000Z","size":30,"stargazers_count":942,"open_issues_count":78,"forks_count":361,"subscribers_count":20,"default_branch":"develop","last_synced_at":"2025-05-12T01:48:16.734Z","etag":null,"topics":["crawling","scrapy","selenium"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"wtfpl","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/clemfromspace.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-02-11T08:53:10.000Z","updated_at":"2025-04-24T08:30:10.000Z","dependencies_parsed_at":"2024-01-15T23:26:52.486Z","dependency_job_id":"4f0c3f69-baad-4331-ad18-f0e5c8128cfe","html_url":"https://github.com/clemfromspace/scrapy-selenium","commit_stats":{"total_commits":21,"total_committers":9,"mean_commits":"2.3333333333333335","dds":0.6666666666666667,"last_synced_commit":"2e557f6f2221f2609b5b6c6cf3b609a28387d918"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clemfromspace%2Fscrapy-selenium","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clemfromspace%2Fscrapy-selenium/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clemfromspace%2Fscrapy-selenium/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clemfromspace%2Fscrapy-selenium/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/clemfromspace","download_url":"https://codeload.github.com/clemfromspace/scrapy-selenium/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254243360,"owners_count":22038046,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawling","scrapy","selenium"],"created_at":"2024-07-30T21:00:37.019Z","updated_at":"2025-05-14T23:06:11.414Z","avatar_url":"https://github.com/clemfromspace.png","language":"Python","readme":"# Scrapy with selenium\n[![PyPI](https://img.shields.io/pypi/v/scrapy-selenium.svg)](https://pypi.python.org/pypi/scrapy-selenium) [![Build Status](https://travis-ci.org/clemfromspace/scrapy-selenium.svg?branch=master)](https://travis-ci.org/clemfromspace/scrapy-selenium) [![Test Coverage](https://api.codeclimate.com/v1/badges/5c737098dc38a835ff96/test_coverage)](https://codeclimate.com/github/clemfromspace/scrapy-selenium/test_coverage) [![Maintainability](https://api.codeclimate.com/v1/badges/5c737098dc38a835ff96/maintainability)](https://codeclimate.com/github/clemfromspace/scrapy-selenium/maintainability)\n\nScrapy middleware to handle javascript pages using selenium.\n\n## Installation\n```\n$ pip install scrapy-selenium\n```\nYou should use **python\u003e=3.6**. \nYou will also need one of the Selenium [compatible browsers](http://www.seleniumhq.org/about/platforms.jsp).\n\n## Configuration\n1. Add the browser to use, the path to the driver executable, and the arguments to pass to the executable to the scrapy settings:\n    ```python\n    from shutil import which\n\n    SELENIUM_DRIVER_NAME = 'firefox'\n    SELENIUM_DRIVER_EXECUTABLE_PATH = which('geckodriver')\n    SELENIUM_DRIVER_ARGUMENTS=['-headless']  # '--headless' if using chrome instead of firefox\n    ```\n\nOptionally, set the path to the browser executable:\n    ```python\n    SELENIUM_BROWSER_EXECUTABLE_PATH = which('firefox')\n    ```\n\nIn order to use a remote Selenium driver, specify `SELENIUM_COMMAND_EXECUTOR` instead of `SELENIUM_DRIVER_EXECUTABLE_PATH`:\n    ```python\n    SELENIUM_COMMAND_EXECUTOR = 'http://localhost:4444/wd/hub'\n    ```\n\n2. Add the `SeleniumMiddleware` to the downloader middlewares:\n    ```python\n    DOWNLOADER_MIDDLEWARES = {\n        'scrapy_selenium.SeleniumMiddleware': 800\n    }\n    ```\n## Usage\nUse the `scrapy_selenium.SeleniumRequest` instead of the scrapy built-in `Request` like below:\n```python\nfrom scrapy_selenium import SeleniumRequest\n\nyield SeleniumRequest(url=url, callback=self.parse_result)\n```\nThe request will be handled by selenium, and the request will have an additional `meta` key, named `driver` containing the selenium driver with the request processed.\n```python\ndef parse_result(self, response):\n    print(response.request.meta['driver'].title)\n```\nFor more information about the available driver methods and attributes, refer to the [selenium python documentation](http://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.remote.webdriver)\n\nThe `selector` response attribute work as usual (but contains the html processed by the selenium driver).\n```python\ndef parse_result(self, response):\n    print(response.selector.xpath('//title/@text'))\n```\n\n### Additional arguments\nThe `scrapy_selenium.SeleniumRequest` accept 4 additional arguments:\n\n#### `wait_time` / `wait_until`\n\nWhen used, selenium will perform an [Explicit wait](http://selenium-python.readthedocs.io/waits.html#explicit-waits) before returning the response to the spider.\n```python\nfrom selenium.webdriver.common.by import By\nfrom selenium.webdriver.support import expected_conditions as EC\n\nyield SeleniumRequest(\n    url=url,\n    callback=self.parse_result,\n    wait_time=10,\n    wait_until=EC.element_to_be_clickable((By.ID, 'someid'))\n)\n```\n\n#### `screenshot`\nWhen used, selenium will take a screenshot of the page and the binary data of the .png captured will be added to the response `meta`:\n```python\nyield SeleniumRequest(\n    url=url,\n    callback=self.parse_result,\n    screenshot=True\n)\n\ndef parse_result(self, response):\n    with open('image.png', 'wb') as image_file:\n        image_file.write(response.meta['screenshot'])\n```\n\n#### `script`\nWhen used, selenium will execute custom JavaScript code.\n```python\nyield SeleniumRequest(\n    url=url,\n    callback=self.parse_result,\n    script='window.scrollTo(0, document.body.scrollHeight);',\n)\n```\n","funding_links":[],"categories":["Scrapy Middleware"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclemfromspace%2Fscrapy-selenium","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fclemfromspace%2Fscrapy-selenium","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclemfromspace%2Fscrapy-selenium/lists"}