{"id":29995121,"url":"https://github.com/python-testing-crawler/python-testing-crawler","last_synced_at":"2025-08-05T01:11:22.075Z","repository":{"id":41914346,"uuid":"265591257","full_name":"python-testing-crawler/python-testing-crawler","owner":"python-testing-crawler","description":"A crawler for automated functional testing of a web application","archived":false,"fork":false,"pushed_at":"2023-05-01T21:40:44.000Z","size":82,"stargazers_count":73,"open_issues_count":7,"forks_count":5,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-07-01T01:23:43.715Z","etag":null,"topics":["crawler","django","flask","python","testing"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/python-testing-crawler.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-05-20T14:28:31.000Z","updated_at":"2025-01-08T17:23:25.000Z","dependencies_parsed_at":"2022-08-11T21:20:47.937Z","dependency_job_id":null,"html_url":"https://github.com/python-testing-crawler/python-testing-crawler","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/python-testing-crawler/python-testing-crawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/python-testing-crawler%2Fpython-testing-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/python-testing-crawler%2Fpython-testing-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/python-testing-crawler%2Fpython-testing-crawler/releases","manifests_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/repositories/python-testing-crawler%2Fpython-testing-crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/python-testing-crawler","download_url":"https://codeload.github.com/python-testing-crawler/python-testing-crawler/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/python-testing-crawler%2Fpython-testing-crawler/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268815429,"owners_count":24311568,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-04T02:00:09.867Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","django","flask","python","testing"],"created_at":"2025-08-05T01:04:39.446Z","updated_at":"2025-08-05T01:11:22.057Z","avatar_url":"https://github.com/python-testing-crawler.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Python Testing Crawler   :snake: :stethoscope: :spider:\n[![PyPI version](https://badge.fury.io/py/python-testing-crawler.svg)](https://badge.fury.io/py/python-testing-crawler)\n[![PyPI Supported Python Versions](https://img.shields.io/pypi/pyversions/python-testing-crawler.svg)](https://pypi.python.org/pypi/python-testing-crawler/)\n[![GitHub 
license](https://img.shields.io/github/license/python-testing-crawler/python-testing-crawler)](https://github.com/python-testing-crawler/python-testing-crawler/blob/master/LICENSE.txt)\n[![GitHub Actions (Tests)](https://github.com/python-testing-crawler/python-testing-crawler/workflows/Tests/badge.svg)](https://github.com/python-testing-crawler/python-testing-crawler)\n\n_A crawler for automated functional testing of a web application_\n\nCrawling a server-side-rendered web application is a _low cost_ way to get _low quality_ test coverage of your JavaScript-light web application.\n\nIf you have only partial test coverage of your routes, but still want to protect against silly mistakes, then this is for you. \n\nFeatures:\n\n* Selectively spider pages and resources, or just request them\n* Submit forms, and control what values to send\n* Ignore links by source using CSS selectors\n* Fail fast or collect many errors\n* Configurable using straightforward rules\n\nWorks with the test clients for [Flask](https://flask.palletsprojects.com/en/1.1.x/testing/) (inc [Flask-WebTest](https://flask-webtest.readthedocs.io/en/latest/)), [Django](https://docs.djangoproject.com/en/3.0/topics/testing/tools/) and [WebTest](https://docs.pylonsproject.org/projects/webtest/en/latest/).\n\n## Why should I use this?\n\nHere's an example: [_Flaskr_](https://flask.palletsprojects.com/en/1.1.x/tutorial/), the Flask tutorial application has [166 lines of test code](https://github.com/pallets/flask/tree/master/examples/tutorial/tests) to achieve 100% test coverage.\n\n[Using Python Testing Crawler](https://github.com/python-testing-crawler/flaskr/blob/master/tests/test_crawl.py) in a similar way to the Usage example below, we can hit 73% with very little effort. Disclaimer: Of course! It's not the same quality or utility of testing! 
But it is better than no tests, a complement to hand-written unit or functional tests and a useful stopgap.\n\n## Installation\n\n```\n$ pip install python-testing-crawler\n```\n\n## Usage\n\nCreate a crawler using your framework's existing test client, tell it where to start and what rules to obey, then set it off:\n\n```python\nfrom python_testing_crawler import Crawler\nfrom python_testing_crawler import Rule, Request, Ignore, Allow\n\ndef test_crawl_all():\n    client = ...  # your framework's existing test client\n    # ... any setup ...\n    crawler = Crawler(\n        client=client,\n        initial_paths=['/'],\n        rules=[\n            Rule(\"a\", '/.*', \"GET\", Request()),\n        ]\n    )\n    crawler.crawl()\n```\n\nThis will crawl all anchor links to relative addresses beginning \"/\". Any exceptions encountered will be collected and presented at the end of the crawl. For **more power** see the Rules section below.\n\nIf you need to authorise the client's session, e.g. log in, then you should do that before creating the Crawler.\n\nIt is also a good idea to create enough data, via fixtures or otherwise, to expose enough endpoints.\n\n### How do I set up a test client?\n\nIt depends on your framework:\n\n* Flask: https://flask.palletsprojects.com/en/1.1.x/testing/\n* Django: https://docs.djangoproject.com/en/3.0/topics/testing/tools/\n\n## Crawler Options\n\n| Param | Description |\n| --- | --- |\n| `initial_paths` |  list of paths/URLs to start from\n| `rules` | list of Rules to control the crawler; see below\n| `path_attrs` | list of attribute names to extract paths/URLs from; defaults to \"href\" -- include \"src\" if you want to check e.g. `\u003clink\u003e`, `\u003cscript\u003e` or even `\u003cimg\u003e`\n| `ignore_css_selectors` | any elements matching this list of CSS selectors will be ignored when extracting links\n| `ignore_form_fields` | list of form input names to ignore when determining the identity/uniqueness of a form. 
Include CSRF token field names here.\n| `max_requests` | Crawler will raise an exception if this limit is exceeded\n| `capture_exceptions` | upon encountering an exception, keep going and fail at the end of the crawl instead of during (default `True`)\n| `output_summary` | print summary statistics and any captured exceptions and tracebacks at the end of the crawl (default `True`)\n| `should_process_handlers` | list of \"should process\" handlers; see Handlers section\n| `check_response_handlers` | list of \"check response\" handlers; see Handlers section\n\n## Rules\n\nThe crawler has to be told what URLs to follow, what forms to post and what to ignore, using Rules.\n\nRules are made of four parameters:\n\n```Rule(\u003csource element regex\u003e, \u003ctarget URL/path regex\u003e, \u003cHTTP method\u003e, \u003caction to take\u003e)```\n\nThese are matched against every HTML element that the crawler encounters, with the last matching rule winning.\n\nActions must be one of the following objects:\n\n1. `Request(only=False, params=None)` -- follow a link or submit a form\n    - `only=True` will retrieve a page/resource but _not_ spider its links.\n    -  the dict `params` allows you to specify _overrides_ for a form's default values\n1. `Ignore()` -- do nothing / skip\n1. `Allow(status_codes)` -- allow a HTTP status in the supplied list, i.e. do not consider it an error.\n\n\n### Example Rules\n\n#### Follow all local/relative links\n\n```python\nHYPERLINKS_ONLY_RULE_SET = [\n    Rule('a', '/.*', 'GET', Request()),\n    Rule('area', '/.*', 'GET', Request()),\n]\n```\n\n#### Request but do not spider all links\n\n```python\nREQUEST_ONLY_EXTERNAL_RULE_SET = [\n    Rule('a', '.*', 'GET', Request(only=True)),\n    Rule('area', '.*', 'GET', Request(only=True)),\n]\n```\n\nThis is useful for finding broken links.  
You can also check `\u003clink\u003e` tags from the `\u003chead\u003e` if you include the following rule _plus_ set the Crawler's `path_attrs` to `(\"HREF\", \"SRC\")`.\n\n```Rule('link', '.*', 'GET', Request())```\n\n#### Submit forms with GET or POST\n\n```python\nSUBMIT_GET_FORMS_RULE_SET = [\n    Rule('form', '.*', 'GET', Request())\n]\n\nSUBMIT_POST_FORMS_RULE_SET = [\n    Rule('form', '.*', 'POST', Request())\n]\n```\n\nForms are submitted with their default values, unless overridden using `Request(params={...})` for a specific form target or excluded globally using the `ignore_form_fields` parameter to `Crawler` (necessary for e.g. CSRF token fields).\n\n#### Allow some routes to fail\n\n```python\nPERMISSIVE_RULE_SET = [\n    Rule('.*', '.*', 'GET', Allow([*range(400, 600)])),\n    Rule('.*', '.*', 'POST', Allow([*range(400, 600)]))\n]\n```\n\nIf any HTTP error (400-599) is encountered for any request, allow it; do not treat it as a failure.\n\n## Crawl Graph\n\nThe crawler builds up a graph of your web application. It can be interrogated via `crawler.graph` when the crawl is finished.\n\nSee [the graph module](python_testing_crawler/graph.py) for the definition of `Node` objects.\n\n## Handlers\n\nTwo hook points are provided. 
These operate on `Node` objects (see above).\n\n### Whether to process a Node\n\nUsing `should_process_handlers`, you can register functions that take a `Node` and return a `bool` of whether the Crawler should \"process\" -- follow a link or submit a form -- or not.\n\n### Whether a response is acceptable\n\nUsing `check_response_handlers`, you can register functions that take a `Node` and response object (specific to your test client) and return a bool of whether the response should constitute an error.\n\nIf your function returns `True`, the Crawler will throw an exception.\n\n## Examples\n\nThere are currently Flask and Django examples in [the tests](tests/).\n\nSee https://github.com/python-testing-crawler/flaskr for an example of integrating into an existing application, using Flaskr, the Flask tutorial application.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpython-testing-crawler%2Fpython-testing-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpython-testing-crawler%2Fpython-testing-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpython-testing-crawler%2Fpython-testing-crawler/lists"}