{"id":22697864,"url":"https://github.com/alexmili/reachable","last_synced_at":"2025-08-14T08:06:46.739Z","repository":{"id":253259112,"uuid":"843000221","full_name":"AlexMili/Reachable","owner":"AlexMili","description":"Check if a URL exists and is reachable","archived":false,"fork":false,"pushed_at":"2025-08-12T08:28:35.000Z","size":101,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-12T10:24:07.641Z","etag":null,"topics":["crawler","health-check","monitoring","reachability","webscraping"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/reachable/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AlexMili.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-08-15T15:06:07.000Z","updated_at":"2025-08-12T08:28:33.000Z","dependencies_parsed_at":"2024-08-15T15:37:39.223Z","dependency_job_id":"abc9d369-2b0c-444e-be65-78df8919a83f","html_url":"https://github.com/AlexMili/Reachable","commit_stats":null,"previous_names":["alexmili/reachable"],"tags_count":20,"template":false,"template_full_name":null,"purl":"pkg:github/AlexMili/Reachable","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlexMili%2FReachable","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlexMili%2FReachable/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlexMili%2FReachable/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/
AlexMili%2FReachable/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AlexMili","download_url":"https://codeload.github.com/AlexMili/Reachable/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlexMili%2FReachable/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270245904,"owners_count":24551652,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-13T02:00:09.904Z","response_time":66,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","health-check","monitoring","reachability","webscraping"],"created_at":"2024-12-10T05:15:58.432Z","updated_at":"2025-08-14T08:06:46.706Z","avatar_url":"https://github.com/AlexMili.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"**Reachable** checks if a URL exists and is reachable.\n\n# Features\n- Use `HEAD` requests instead of `GET` to save bandwidth\n- Follow redirects\n- Handle local redirects (without a full URL in the `location` header)\n- Record all the URLs of the redirection chain\n- Check if the redirected URL matches the TLD of the source URL\n- Detect Cloudflare protection\n- Avoid basic bot detectors\n    - Use a random Chrome user agent\n    - Wait between consecutive requests to the same host\n    - Include the `Host` header\n    - Can use Playwright to make the request\n- Use HTTP/2\n- Detect parking 
domains\n\n# Installation\nYou can install it with pip :\n```bash\npip install reachable\n```\nIf you want to use playwright:\n```bash\npip install reachable[playwright]\n```\nOr clone this repository and simply run :\n```bash\ncd reachable/\npip install -e .\n```\n\n# Usage\n\n## Simple URL\n```python\nfrom reachable import is_reachable\nresult = is_reachable(\"https://google.com\")\n```\n\nThe output will look like this:\n```json\n{\n    \"original_url\": \"https://google.com\",\n    \"final_url\": \"https://www.google.com/\",\n    \"response\": null, \n    \"status_code\": 200,\n    \"success\": true,\n    \"error_name\": null,\n    \"cloudflare_protection\": false,\n    \"redirect\": {\n        \"chain\": [\"https://www.google.com/\"],\n        \"final_url\": \"https://www.google.com/\",\n        \"tld_match\": true\n    }\n}\n```\n\n## Multiple URLs\n```python\nfrom reachable import is_reachable\nresult = is_reachable([\"https://google.com\", \"http://bing.com\"])\n```\n\nThe output will look like this:\n```json\n[\n    {\n        \"original_url\": \"https://google.com\",\n        \"final_url\": \"https://www.google.com/\",\n        \"response\": null, \n        \"status_code\": 200,\n        \"success\": true,\n        \"error_name\": null,\n        \"cloudflare_protection\": false,\n        \"redirect\": {\n            \"chain\": [\"https://www.google.com/\"],\n            \"final_url\": \"https://www.google.com/\",\n            \"tld_match\": true\n        }\n    },\n    {\n        \"original_url\": \"http://bing.com\",\n        \"final_url\": \"https://www.bing.com/?toWww=1\u0026redig=16A78C94\",\n        \"response\": null,\n        \"status_code\": 200,\n        \"success\": true,\n        \"error_name\": null,\n        \"cloudflare_protection\": false,\n        \"redirect\": {\n            \"chain\": [\"https://www.bing.com:443/?toWww=1\u0026redig=16A78C94\"],\n            \"final_url\": \"https://www.bing.com/?toWww=1\u0026redig=16A78C94\",\n           
 \"tld_match\": true\n        }\n    }\n]\n```\n\n## Async\n```python\nimport asyncio\nfrom reachable import is_reachable_async\n\nresult = asyncio.run(is_reachable_async(\"https://google.com\"))\n```\nor\n```python\nimport asyncio\nfrom reachable import is_reachable_async\n\nurls = [\"https://google.com\", \"https://bing.com\"]\n\ntry:\n    loop = asyncio.get_running_loop()\nexcept RuntimeError:\n    # No loop is running, so we create one\n    loop = asyncio.new_event_loop()\n    asyncio.set_event_loop(loop)\ntry:\n    result = loop.run_until_complete(asyncio.gather(*[is_reachable_async(url) for url in urls]))\nfinally:\n    loop.close()\n```\n\n### Handling high volumes with TaskPool\n\nIf you want to process a large number of URLs (\u003e 500), you will quickly hit the limits of your hardware and/or OS, because only a limited number of connections can be open at the same time.\n\nTo work around this problem, you can use the `TaskPool` class. It uses asyncio semaphores to limit the number of asyncio tasks running at the same time. It works by acquiring a lock when a worker starts and releasing it when the worker is done. 
This keeps a bounded number of asyncio workers running without overwhelming the OS.\n\n```python\nimport asyncio\n\nfrom reachable import is_reachable_async\nfrom reachable.client import AsyncClient\nfrom reachable.pool import TaskPool\n\n\nurls = [\"https://google.com\", \"https://bing.com\"]\n\n\nasync def worker(url, client):\n    result = await is_reachable_async(url, client=client)\n    return result\n\n\nasync def workers_builder(urls, pool_size: int = 100):\n    async with AsyncClient() as client:\n        tasks = TaskPool(workers=pool_size)\n\n        for url in urls:\n            await tasks.put(worker(url, client=client))\n\n        await tasks.join()\n\n    return tasks._results\n\n\ntry:\n    loop = asyncio.get_running_loop()\nexcept RuntimeError:\n    # No loop is running, so we create one\n    loop = asyncio.new_event_loop()\n    asyncio.set_event_loop(loop)\n\ntry:\n    result = loop.run_until_complete(workers_builder(urls))\n    print(result)\nfinally:\n    loop.close()\n\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexmili%2Freachable","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falexmili%2Freachable","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexmili%2Freachable/lists"}