{"id":18710775,"url":"https://github.com/apify/actor-scrapy-executor","last_synced_at":"2025-11-03T16:30:08.080Z","repository":{"id":44081404,"uuid":"203380779","full_name":"apify/actor-scrapy-executor","owner":"apify","description":"Apify actor to run web spiders written in Python in the Scrapy library","archived":false,"fork":false,"pushed_at":"2022-12-11T02:38:54.000Z","size":324,"stargazers_count":10,"open_issues_count":13,"forks_count":5,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-11T22:11:25.699Z","etag":null,"topics":["apify","scrapy","scrapy-spiders"],"latest_commit_sha":null,"homepage":"https://apify.com/apify/scrapy-executor","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apify.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-08-20T13:21:20.000Z","updated_at":"2025-02-28T18:54:46.000Z","dependencies_parsed_at":"2023-01-26T13:45:13.376Z","dependency_job_id":null,"html_url":"https://github.com/apify/actor-scrapy-executor","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apify%2Factor-scrapy-executor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apify%2Factor-scrapy-executor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apify%2Factor-scrapy-executor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apify%2Factor-scrapy-executor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/apify","download_url":"https://codeload.github
.com/apify/actor-scrapy-executor/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248560118,"owners_count":21124594,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apify","scrapy","scrapy-spiders"],"created_at":"2024-11-07T12:35:37.333Z","updated_at":"2025-11-03T16:30:08.049Z","avatar_url":"https://github.com/apify.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Scrapy Executor\n\nThis actor allows you to run web spiders written in Python\nwith the [Scrapy framework](https://scrapy.org) on the [Apify](https://apify.com/) platform.\nExecuting a spider is as simple as copy-pasting your Scrapy code into the actor's input.\nFor multi-file Scrapy spiders, see the bottom of this readme.\n\nPlease note that the actor is experimental and it might change in the future.\n\n## Input configuration\n\nThe actor has the following input options:\n\n- **Scrapy code** - Paste your Python source code with Scrapy into this field.\n- **Proxy** - Optionally, select a proxy to be used by the actor,\n  in order to avoid IP address-based blocking by the target website.\n  The actor automatically executes all of Scrapy's HTTP(S) requests through the proxy.\n\n## Storing data on Apify cloud\n\nTo store your Scrapy items in Apify's [Dataset](https://apify.com/docs/storage#dataset)\nor [Key-value store](https://apify.com/docs/storage#key-value-store) cloud storages,\nyou can use the [`apify`](https://pypi.org/project/apify/) Python package.\nAll the methods are available for actors running 
both locally and on the Apify platform.\n\nFirst, import the package by adding the following command to the top of your source file:\n\n```python\nimport apify\n```\n\nTo push your scraped data to the Dataset associated with the actor run, use the `pushData()` method:\n\n```python\napify.pushData(item)\n```\n\nNote that Datasets are useful for storing large tabular results, such as a list of products from an e-commerce site.\n\nTo interact with the default Key-value store associated with the actor run,\nuse the `setValue()`, `getValue()`, and `deleteValue()` methods:\n\n```python\napify.setValue('foo.txt', 'bar')\napify.getValue('foo.txt')\napify.deleteValue('foo.txt')\n```\n\nKey-value stores are useful for storing files, e.g. screenshots, PDFs, or crawler state.\n\n\n## Multi-file Scrapy spiders\n\nIf your Scrapy spider contains multiple source code or configuration files,\nor you want to configure Scrapy settings, pipelines or middlewares,\nyou can download the source code of this actor, import your files into it\nand push it to the Apify cloud for execution.\n\nBefore you start, make sure you have a Python development environment set up, and [NPM](https://www.npmjs.com/package/npm)\nand [Apify CLI](https://apify.com/docs/cli) installed on your computer.\n\nHere are the instructions:\n\n1. Clone the [GitHub repository](https://github.com/apifytech/actor-scrapy-executor) with the source code of this actor:\n   ```\n   git clone https://github.com/apifytech/actor-scrapy-executor\n   ```\n2. Go to the repository directory and install NPM packages:\n   ```\n   cd actor-scrapy-executor\n   npm install\n   ```\n3. Copy your spider(s) into the `actor/spiders/` directory.\n4. Make any necessary changes to files in the `actor/` directory, including `items.py`, `middlewares.py`, `pipelines.py` or `settings.py`.\n5. Run the actor locally on your computer and test that it works:\n   ```\n   apify run\n   ```\n6. 
If everything works fine, upload the actor to the Apify platform, so that you can run it in the cloud:\n   ```\n   apify push\n   ```\n\nAnd that's it!\n\nIf you have any problem or anything does not work,\nplease file an [issue on GitHub](https://github.com/apifytech/actor-scrapy-executor/issues).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapify%2Factor-scrapy-executor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapify%2Factor-scrapy-executor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapify%2Factor-scrapy-executor/lists"}