{"id":18301269,"url":"https://github.com/spider-rs/spider-py","last_synced_at":"2025-04-05T03:02:52.662Z","repository":{"id":211443936,"uuid":"729103082","full_name":"spider-rs/spider-py","owner":"spider-rs","description":"Spider ported to Python","archived":false,"fork":false,"pushed_at":"2025-01-28T14:49:37.000Z","size":1423,"stargazers_count":71,"open_issues_count":0,"forks_count":11,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-29T02:02:25.291Z","etag":null,"topics":["crawler","headless-chrome","python","scraper","spider","web-crawler"],"latest_commit_sha":null,"homepage":"https://spider-rs.github.io/spider-py/","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/spider-rs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-08T12:22:24.000Z","updated_at":"2025-03-26T21:47:53.000Z","dependencies_parsed_at":"2023-12-31T18:27:32.590Z","dependency_job_id":"638d8a92-4966-446a-80bc-65e1a88d3606","html_url":"https://github.com/spider-rs/spider-py","commit_stats":null,"previous_names":["spider-rs/spider-py"],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spider-rs%2Fspider-py","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spider-rs%2Fspider-py/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spider-rs%2Fspider-py/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spider-rs%2Fspider-py/manifests","owner_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/spider-rs","download_url":"https://codeload.github.com/spider-rs/spider-py/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247280190,"owners_count":20912967,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","headless-chrome","python","scraper","spider","web-crawler"],"created_at":"2024-11-05T15:14:59.743Z","updated_at":"2025-04-05T03:02:47.651Z","avatar_url":"https://github.com/spider-rs.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# spider-py\n\nThe [spider](https://github.com/spider-rs/spider) project ported to Python.\n\n## Getting Started\n\n1. `pip install spider_rs`\n\n\n```python\nimport asyncio\n\nfrom spider_rs import Website\n\nasync def main():\n    website = Website(\"https://choosealicense.com\")\n    website.crawl()\n    print(website.get_links())\n\nasyncio.run(main())\n```\n\nView the [examples](./examples/) to learn more.\n\n## Development\n\nInstall maturin `pipx install maturin` and python.\n\n1. 
`maturin develop`\n\n## Benchmarks\n\nView the [benchmarks](./bench/README.md) to see a breakdown across libraries and platforms.\n\nTest URL: `https://espn.com`\n\n| `libraries`                  | `pages`   | `speed` |\n| :--------------------------- | :-------- | :------ |\n| **`spider(rust): crawl`**    | `150,387` | `1m`    |\n| **`spider(nodejs): crawl`**  | `150,387` | `153s`  |\n| **`spider(python): crawl`**  | `150,387` | `186s`  |\n| **`scrapy(python): crawl`**  | `49,598`  | `1h`    |\n| **`crawlee(nodejs): crawl`** | `18,779`  | `30m`   |\n\nThe benchmarks above were run on a Mac M1; spider on Linux ARM machines performs about 2-10x faster.\n\n## Issues\n\nPlease submit a GitHub issue for any problems found.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspider-rs%2Fspider-py","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fspider-rs%2Fspider-py","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspider-rs%2Fspider-py/lists"}