{"id":28167665,"url":"https://github.com/vonsteer/spiderchef","last_synced_at":"2026-02-28T10:32:33.550Z","repository":{"id":291990825,"uuid":"979390568","full_name":"vonsteer/spiderchef","owner":"vonsteer","description":"Low Code Recipe-based Web Scraping Framework","archived":false,"fork":false,"pushed_at":"2025-05-17T18:19:23.000Z","size":522,"stargazers_count":0,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-25T06:46:21.698Z","etag":null,"topics":["asyncio","framework","low-code","no-code","python","recipes","scraping","yaml"],"latest_commit_sha":null,"homepage":"https://spiderchef.readthedocs.io/en/latest/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vonsteer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-07T12:46:53.000Z","updated_at":"2025-05-17T18:23:01.000Z","dependencies_parsed_at":"2025-05-14T23:22:57.057Z","dependency_job_id":null,"html_url":"https://github.com/vonsteer/spiderchef","commit_stats":null,"previous_names":["vonsteer/spiderchef"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/vonsteer/spiderchef","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vonsteer%2Fspiderchef","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vonsteer%2Fspiderchef/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vonsteer%2Fspiderchef/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/vonsteer%2Fspiderchef/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vonsteer","download_url":"https://codeload.github.com/vonsteer/spiderchef/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vonsteer%2Fspiderchef/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29930344,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-28T09:58:13.507Z","status":"ssl_error","status_checked_at":"2026-02-28T09:57:57.047Z","response_time":90,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asyncio","framework","low-code","no-code","python","recipes","scraping","yaml"],"created_at":"2025-05-15T14:13:13.727Z","updated_at":"2026-02-28T10:32:33.533Z","avatar_url":"https://github.com/vonsteer.png","language":"Python","readme":"# Spider Chef 🕷️👨‍🍳\n[![PyPI](https://img.shields.io/pypi/v/spiderchef)](https://pypi.org/project/spiderchef/)\n[![Python 
Versions](https://img.shields.io/pypi/pyversions/spiderchef)](https://pypi.org/project/spiderchef/)\n[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)\n[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit\u0026logoColor=white)](https://github.com/pre-commit/pre-commit)\n[![Coverage Status](./coverage-badge.svg?dummy=8484744)](./coverage.xml)\n[![Documentation Status](https://readthedocs.org/projects/spiderchef/badge/?version=latest)](https://spiderchef.readthedocs.io/en/latest/?badge=latest)\n```\n                   /\\\n                  /  \\\n                 |  _ \\                   _\n                 | / \\ \\   .--,--        / \\\n                 |/   \\ \\  `.  ,.'      /   \\\n                 /     \\ |  |___|  /\\  /     \\\n                /|      \\|  ~  ~  /  \\/       \\\n        _______/_|_______\\ (o)(o)/___/\\_____   \\\n       /      /  |        (______)     \\    \\   \\_\n      /      /   |                      \\    \\\n     /      /    |                       \\    \\\n    /      /     |                        \\    \\\n   /     _/      |                         \\    \\\n  /             _|                          \\    \\_\n_/                                           \\_                                               \n```\n\nSpiderChef is a powerful, recipe-based web scraping tool that makes data extraction systematic and reproducible. 
By defining scraping procedures as \"recipes\" with sequential \"steps,\" SpiderChef allows you to craft elegant, maintainable data extraction workflows.\n\n## Features\n\n- Recipe-Based Architecture: Define extraction workflows as YAML recipes\n- Modular Step System: Build complex scraping logic from reusable components\n- Async Support: Handle both synchronous and asynchronous extraction steps\n- Type Safety: Fully typed for better development experience\n- Extensible Design: Easily create custom steps for specialized extraction needs\n\n## Installation\n\n```bash\n# If you want to use the CLI\npip install \"spiderchef[cli]\"\n\n# If you only need the library\npip install spiderchef\n```\n\n## CLI Usage\n\n```bash\n# Run a recipe\nspiderchef cook recipes/example.yaml\n\n# Create a new recipe template\nspiderchef recipe new my_extraction\n```\n\n## Library Usage\n\n### Basic Usage\nBasic usage involves loading a local recipe and \"cooking\" it to get the output data:\n```python\nimport asyncio\nfrom spiderchef import Recipe\n\n# Load a recipe from a local YAML file\nrecipe = Recipe.from_yaml('recipe_example.yaml')\n# Run a recipe\nasyncio.run(recipe.cook())\n```\n\n## Example Recipe\n\n```yaml\nbase_url: https://example.com\nname: ProductExtractor\nsteps:\n  - type: fetch\n    name: fetch_product_page\n    page_type: text\n    path: /products\n    params:\n      category: electronics\n      sort: price_asc\n  \n  - type: regex\n    name: extract_product_urls\n    expression: '\"(\\/product\\/[^\"]+)\"'\n  \n  - type: join_base_url\n    name: format_urls\n```\n\n### Custom Usage\nIf you want to extend the available steps with your own custom ones, you can do so like this:\n```python\nimport asyncio\nfrom typing import Any\n\nfrom spiderchef import STEP_REGISTRY, AsyncStep, Recipe, SyncStep\n\n\n# You can define your own custom steps like so:\nclass HelloStep(SyncStep):\n    # .name is a reserved keyword for steps\n    person_name: str\n\n    def _execute(self, recipe: Recipe, previous_output: Any = None) -\u003e str:\n        return f\"Hello There {self.person_name}\"\n\n\n# Steps can be sync or async.\nclass SleepStep(AsyncStep):\n    sleep_time: int = 5\n\n    async def _execute(self, recipe: Recipe, previous_output: Any = None) -\u003e Any:\n        await asyncio.sleep(self.sleep_time)\n        return previous_output\n\n\nCUSTOM_STEP_REGISTRY = {**STEP_REGISTRY, \"hello\": HelloStep, \"sleep\": SleepStep}\n\n# Overrides the global step registry with your own\nRecipe.step_registry = CUSTOM_STEP_REGISTRY\n\n# You can manually initialise a recipe like so, or just use the yaml recipe.\nrecipe = Recipe(\n    base_url=\"https://example.com\",\n    name=\"Example\",\n    steps=[\n        HelloStep(name=\"Saying Hello\", person_name=\"George\"),\n        SleepStep(\n            name=\"Sleeping\",\n        ),\n    ],\n)\n\n# Run a recipe\nasyncio.run(recipe.cook())\n\n\"\"\"Output:\n2025-05-07 16:33:01 [info     ] 🥣🥄🔥 Cooking 'Example' recipe!\n2025-05-07 16:33:01 [info     ] ➡️  1. Saying Hello...         step_class=HelloStep\n2025-05-07 16:33:01 [info     ] ➡️  2. Sleeping...             step_class=SleepStep\n2025-05-07 16:33:06 [info     ] 🍞 'Example' recipe finished output='Hello There George'\n\"\"\"\n```\n\n## Variable Replacement\nSpiderChef supports variable replacement in your steps using the `${variable}` syntax. 
Variables can be defined in the Recipe and will be automatically replaced when the step is executed:\n\n```python\nfrom spiderchef import FetchStep, Recipe\n\nrecipe = Recipe(\n    name=\"Variable Example\",\n    base_url=\"https://example.com\",\n    # Default variables\n    variables={\n        \"sort_order\": \"price_asc\",\n        \"category\": \"smartphones\",\n    },\n    steps=[\n        # Variables are replaced automatically before execution\n        FetchStep(\n            name=\"Search Products\",\n            path=\"/products\",\n            params={\n                \"category\": \"${category}\",\n                \"sort\": \"${sort_order}\",\n            },\n        )\n    ],\n)\n\n# Uses the default variables (run inside an async context)\nawait recipe.cook()\n\n# Override a specific variable, making any recipe reusable\nawait recipe.cook(category=\"books\")\n```\nIn YAML recipes, you can use the same syntax:\n```yaml\nname: ProductExtractor\nbase_url: https://example.com\nvariables: # these are defaults\n  category: electronics\n  sort_order: price_asc\nsteps:\n  - type: fetch\n    name: fetch_product_page\n    page_type: text\n    path: /products\n    params:\n      category: ${category}\n      sort: ${sort_order}\n```\nYou can even save a step's output as a recipe variable for later use via the `save` step:\n```yaml\nname: ProductExtractor\nbase_url: https://example.com\nvariables:\n  category: electronics\n  sort_order: price_asc\nsteps:\n  - type: fetch\n    name: fetch_product_page\n    path: /products\n    params:\n      category: ${category}\n      sort: ${sort_order}\n  - type: xpath\n    name: extract_title\n    expression: //h1\n  - type: save\n    variable: title\n```\n\n## Why SpiderChef?\nTraditional web scraping often involves writing complex, difficult-to-maintain code that mixes HTTP requests, parsing, and business logic. 
SpiderChef separates these concerns by:\n\n- Breaking extraction into discrete, reusable steps\n- Defining workflows as declarative recipes\n- Handling common extraction patterns with built-in steps\n- Making scraping procedures reproducible and maintainable\n\nWhether you're scraping product data, monitoring prices, or extracting research information, SpiderChef helps you build structured, reliable data extraction pipelines.\n\n\n## Documentation\nFor full documentation, visit [spiderchef.readthedocs.io](https://spiderchef.readthedocs.io).\n\nThe documentation includes:\n- Getting started guide\n- User guides for basic and advanced usage\n- API reference\n- Tutorials and examples\n- Contributing guidelines\n\nTo build the documentation locally:\n```bash\nmake docs\n```\n\n## License\n\n[MIT License](LICENSE)\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvonsteer%2Fspiderchef","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvonsteer%2Fspiderchef","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvonsteer%2Fspiderchef/lists"}