{"id":37071085,"url":"https://github.com/ihandmine/aioscpy","last_synced_at":"2026-01-14T08:17:59.513Z","repository":{"id":37712975,"uuid":"473068041","full_name":"ihandmine/aioscpy","owner":"ihandmine","description":"An asyncio + aiolibs crawler  imitate scrapy framework","archived":false,"fork":false,"pushed_at":"2025-04-18T08:31:40.000Z","size":1770,"stargazers_count":119,"open_issues_count":4,"forks_count":10,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-10-05T19:46:26.129Z","etag":null,"topics":["aiohttp","asyncio","crawling","framework","loguru","python3","scrapy","scrapy-redis"],"latest_commit_sha":null,"homepage":"https://ihandmine.github.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ihandmine.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-03-23T06:47:35.000Z","updated_at":"2025-09-14T20:02:45.000Z","dependencies_parsed_at":"2023-10-23T13:31:53.937Z","dependency_job_id":"b81eae5f-e140-439f-9b38-4d0a9bbbb541","html_url":"https://github.com/ihandmine/aioscpy","commit_stats":{"total_commits":313,"total_committers":2,"mean_commits":156.5,"dds":"0.0031948881789137795","last_synced_commit":"121d8ceaa58eb343de3c3def3e35171aebb8799b"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/ihandmine/aioscpy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ihandmine%2Faioscpy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ihandmine%2Faioscpy/tags","releases_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ihandmine%2Faioscpy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ihandmine%2Faioscpy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ihandmine","download_url":"https://codeload.github.com/ihandmine/aioscpy/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ihandmine%2Faioscpy/sbom","scorecard":{"id":483014,"data":{"date":"2025-08-11","repo":{"name":"github.com/ihandmine/aioscpy","commit":"018c78c809f292766e77f43dc59123711dd88566"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":1.7,"checks":[{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous 
patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to 
analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'main'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Vulnerabilities","score":0,"reason":"13 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: PYSEC-2023-120 / GHSA-45c4-8wx5-qw6w","Warn: Project is vulnerable to: PYSEC-2024-24 / GHSA-5h86-8mv2-jq9f","Warn: Project is vulnerable to: GHSA-5m98-qgg9-wh84","Warn: Project is vulnerable to: GHSA-7gpw-8wmc-pm8g","Warn: 
Project is vulnerable to: GHSA-8495-4g3g-x7pr","Warn: Project is vulnerable to: PYSEC-2024-26 / GHSA-8qpw-xqxj-h4r2","Warn: Project is vulnerable to: GHSA-9548-qrrj-x5pj","Warn: Project is vulnerable to: PYSEC-2023-246 / GHSA-gfw2-4jvh-wgfg","Warn: Project is vulnerable to: GHSA-pjjw-qhg8-p2p9","Warn: Project is vulnerable to: PYSEC-2023-250 / GHSA-q3qx-c6g2-7pw2","Warn: Project is vulnerable to: PYSEC-2023-251 / GHSA-qvrw-v9rv-5rjx","Warn: Project is vulnerable to: PYSEC-2023-45 / GHSA-24wv-mv5m-xv4h","Warn: Project is vulnerable to: PYSEC-2023-46 / GHSA-8fww-64cx-x8p5"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-19T17:04:58.020Z","repository_id":37712975,"created_at":"2025-08-19T17:04:58.020Z","updated_at":"2025-08-19T17:04:58.020Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28413748,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T05:26:33.345Z","status":"ssl_error","status_checked_at":"2026-01-14T05:21:57.251Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aiohttp","asyncio","crawling","framework","loguru","python3","scrapy","scrapy-redis"],"created_at":"2026-01-14T08:17:58.854Z","updated_at":"2026-01-14T08:17:59.508Z","avatar_url":"https://github.com/ihandmine.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\n![aioscpy](./doc/images/aioscpy.png)\n\n# Aioscpy\n\nA powerful, high-performance asynchronous web crawling and scraping framework built on Python's asyncio ecosystem.\n\nEnglish | [中文](./doc/README_ZH.md)\n\n## Overview\n\nAioscpy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. 
It draws inspiration from Scrapy and scrapy_redis but is designed from the ground up to leverage the full power of asynchronous programming.\n\n### Key Features\n\n- **Fully Asynchronous**: Built on Python's asyncio for high-performance concurrent operations\n- **Scrapy-like API**: Familiar API for those coming from Scrapy\n- **Distributed Crawling**: Support for distributed crawling using Redis\n- **Multiple HTTP Backends**: Support for aiohttp, httpx, and requests\n- **Dynamic Variable Injection**: Powerful dependency injection system\n- **Flexible Middleware System**: Customizable request/response processing pipeline\n- **Robust Item Processing**: Pipeline for processing scraped items\n\n## Requirements\n\n- Python 3.8+\n- Works on Linux, Windows, macOS, BSD\n\n## Installation\n\n### Basic Installation\n\n```shell\npip install aioscpy\n```\n\n### With All Dependencies\n\n```shell\npip install aioscpy[all]\n```\n\n### With Specific HTTP Backends\n\n```shell\npip install aioscpy[aiohttp,httpx]\n```\n\n### Latest Version from GitHub\n\n```shell\npip install git+https://github.com/ihandmine/aioscpy\n```\n\n## Quick Start\n\n### Creating a New Project\n\n```shell\naioscpy startproject myproject\ncd myproject\n```\n\n### Creating a Spider\n\n```shell\naioscpy genspider myspider\n```\n\nThis will create a basic spider in the `spiders` directory.\n\n![tree](./doc/images/tree.png)\n\n### Example Spider\n\n```python\nfrom aioscpy.spider import Spider\n\n\nclass QuotesSpider(Spider):\n    name = 'quotes'\n    custom_settings = {\n        \"SPIDER_IDLE\": False\n    }\n    start_urls = [\n        'https://quotes.toscrape.com/tag/humor/',\n    ]\n\n    async def parse(self, response):\n        for quote in response.css('div.quote'):\n            yield {\n                'author': quote.xpath('span/small/text()').get(),\n                'text': quote.css('span.text::text').get(),\n            }\n\n        next_page = response.css('li.next a::attr(\"href\")').get()\n        if 
next_page is not None:\n            yield response.follow(next_page, self.parse)\n```\n\n### Creating a Single Spider Script\n\n```shell\naioscpy onespider single_quotes\n```\n\n### Advanced Spider Example\n\n```python\nfrom aioscpy.spider import Spider\nfrom anti_header import Header\nfrom pprint import pprint, pformat\n\n\nclass SingleQuotesSpider(Spider):\n    name = 'single_quotes'\n    custom_settings = {\n        \"SPIDER_IDLE\": False\n    }\n    start_urls = [\n        'https://quotes.toscrape.com/',\n    ]\n\n    async def process_request(self, request):\n        request.headers = Header(url=request.url, platform='windows', connection=True).random\n        return request\n\n    async def process_response(self, request, response):\n        if response.status in [404, 503]:\n            return request\n        return response\n\n    async def process_exception(self, request, exc):\n        raise exc\n\n    async def parse(self, response):\n        for quote in response.css('div.quote'):\n            yield {\n                'author': quote.xpath('span/small/text()').get(),\n                'text': quote.css('span.text::text').get(),\n            }\n\n        next_page = response.css('li.next a::attr(\"href\")').get()\n        if next_page is not None:\n            yield response.follow(next_page, callback=self.parse)\n\n    async def process_item(self, item):\n        self.logger.info(\"{item}\", **{'item': pformat(item)})\n\n\nif __name__ == '__main__':\n    quotes = SingleQuotesSpider()\n    quotes.start()\n```\n\n### Running Spiders\n\n```shell\n# Run a spider from a project\naioscpy crawl quotes\n\n# Run a single spider script\naioscpy runspider quotes.py\n```\n\n![run](./doc/images/run.png)\n\n### Running from Python Code\n\n```python\nfrom aioscpy.crawler import call_grace_instance\nfrom aioscpy.utils.tools import get_project_settings\n\n# Method 1: Load all spiders from a directory\ndef load_spiders_from_directory():\n    process = 
call_grace_instance(\"crawler_process\", get_project_settings())\n    process.load_spider(path='./spiders')\n    process.start()\n\n# Method 2: Run a specific spider by name\ndef run_specific_spider():\n    process = call_grace_instance(\"crawler_process\", get_project_settings())\n    process.crawl('myspider')\n    process.start()\n\nif __name__ == '__main__':\n    run_specific_spider()\n```\n\n## Configuration\n\nAioscpy can be configured through the `settings.py` file in your project. Here are the most important settings:\n\n### Concurrency Settings\n\n```python\n# Maximum number of concurrent items being processed\nCONCURRENT_ITEMS = 100\n\n# Maximum number of concurrent requests\nCONCURRENT_REQUESTS = 16\n\n# Maximum number of concurrent requests per domain\nCONCURRENT_REQUESTS_PER_DOMAIN = 8\n\n# Maximum number of concurrent requests per IP\nCONCURRENT_REQUESTS_PER_IP = 0\n```\n\n### Download Settings\n\n```python\n# Delay between requests (in seconds)\nDOWNLOAD_DELAY = 0\n\n# Timeout for requests (in seconds)\nDOWNLOAD_TIMEOUT = 20\n\n# Whether to randomize the download delay\nRANDOMIZE_DOWNLOAD_DELAY = True\n\n# HTTP backend to use\nDOWNLOAD_HANDLER = \"aioscpy.core.downloader.handlers.httpx.HttpxDownloadHandler\"\n# Other options:\n# DOWNLOAD_HANDLER = \"aioscpy.core.downloader.handlers.aiohttp.AioHttpDownloadHandler\"\n# DOWNLOAD_HANDLER = \"aioscpy.core.downloader.handlers.requests.RequestsDownloadHandler\"\n```\n\n### Scheduler Settings\n\n```python\n# Scheduler to use (memory-based or Redis-based)\nSCHEDULER = \"aioscpy.core.scheduler.memory.MemoryScheduler\"\n# For distributed crawling:\n# SCHEDULER = \"aioscpy.core.scheduler.redis.RedisScheduler\"\n\n# Redis connection settings (for Redis scheduler)\nREDIS_URI = \"redis://localhost:6379\"\nQUEUE_KEY = \"%(spider)s:queue\"\n```\n\n## Response API\n\nAioscpy provides a rich API for working with responses:\n\n### Extracting Data\n\n```python\n# Using CSS selectors\ntitle = 
response.css('title::text').get()\nall_links = response.css('a::attr(href)').getall()\n\n# Using XPath\ntitle = response.xpath('//title/text()').get()\nall_links = response.xpath('//a/@href').getall()\n```\n\n### Following Links\n\n```python\n# Follow a relative link and parse the result with the same callback\nyield response.follow('next-page.html', self.parse)\n\n# Follow a link and hand the response to a different callback\nyield response.follow('details.html', self.parse_details)\n\n# Follow all links matching a CSS selector\n# (`yield from` is a SyntaxError inside an `async def` callback, so loop instead;\n# this assumes follow_all returns a regular iterable of requests, as in Scrapy)\nfor request in response.follow_all(css='a.product::attr(href)', callback=self.parse_product):\n    yield request\n```\n\n## More Commands\n\n```shell\naioscpy -h\n```\n\n## Distributed Crawling\n\nTo enable distributed crawling with Redis:\n\n1. Configure Redis in settings:\n\n```python\nSCHEDULER = \"aioscpy.core.scheduler.redis.RedisScheduler\"\nREDIS_URI = \"redis://localhost:6379\"\nQUEUE_KEY = \"%(spider)s:queue\"\n```\n\n2. Run multiple instances of your spider on different machines, all connecting to the same Redis server.\n\n## Contributing\n\nPlease submit your suggestions to the owner by creating an issue.\n\n## Thanks\n\n[aiohttp](https://github.com/aio-libs/aiohttp/)\n\n[scrapy](https://github.com/scrapy/scrapy)\n\n[loguru](https://github.com/Delgan/loguru)\n\n[httpx](https://github.com/encode/httpx)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fihandmine%2Faioscpy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fihandmine%2Faioscpy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fihandmine%2Faioscpy/lists"}