{"id":45986990,"url":"https://github.com/xcrap-cloud/xcrap-python","last_synced_at":"2026-03-01T20:00:53.873Z","repository":{"id":340219095,"uuid":"1164524754","full_name":"Xcrap-Cloud/xcrap-python","owner":"Xcrap-Cloud","description":"A modern, declarative, and modular web scraping framework for Python.","archived":false,"fork":false,"pushed_at":"2026-02-24T06:05:13.000Z","size":97,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-02-28T19:55:10.995Z","etag":null,"topics":["alternative","declarative","framework","modular","python","scrapint","scrapy","web"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/xcrap/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Xcrap-Cloud.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-23T07:20:47.000Z","updated_at":"2026-02-24T06:05:16.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/Xcrap-Cloud/xcrap-python","commit_stats":null,"previous_names":["xcrap-cloud/xcrap-python"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Xcrap-Cloud/xcrap-python","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Xcrap-Cloud%2Fxcrap-python","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Xcrap-Cloud%2Fxcrap-python/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/Xcrap-Cloud%2Fxcrap-python/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Xcrap-Cloud%2Fxcrap-python/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Xcrap-Cloud","download_url":"https://codeload.github.com/Xcrap-Cloud/xcrap-python/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Xcrap-Cloud%2Fxcrap-python/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29983122,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-01T16:35:47.903Z","status":"ssl_error","status_checked_at":"2026-03-01T16:35:44.899Z","response_time":124,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alternative","declarative","framework","modular","python","scrapint","scrapy","web"],"created_at":"2026-02-28T19:06:13.840Z","updated_at":"2026-03-01T20:00:53.861Z","avatar_url":"https://github.com/Xcrap-Cloud.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Xcrap for Python\n\nXcrap is a framework originally built for Node.js, but I'm also a Python developer and I was a little bored, so I decided to make a Python version.\n\nIt is still experimental and very incomplete. For now I'm only securing the name on PyPI; the 
rest will come little by little. Maybe we'll become an alternative to Scrapy. Is that too ambitious of me? I don't know, but let's try, I'm a bit crazy... (hire me, Zyte :v)\n\n## How it works (what we already have)\n\nThe idea is to be declarative and easy to use. Here is an example of how you can already use `xcrap` to extract structured data:\n\n```python\nfrom xcrap.extractor import HtmlExtractionModel, HtmlBaseField, HtmlNestedField, css\nfrom xcrap.clients import HttpxClient\nimport asyncio\n\n# 1. Define your models declaratively\nclass AuthorModel(HtmlExtractionModel):\n    name = HtmlBaseField(\n        query = css(\"small.author::text\")\n    )\n    link = HtmlBaseField(\n        query = css(\"a::attr(href)\")\n    )\n\nclass QuoteModel(HtmlExtractionModel):\n    text = HtmlBaseField(\n        query = css(\"span.text::text\")\n    )\n    # Effortless nested models!\n    author = HtmlNestedField(\n        model = AuthorModel\n    )\n    tags = HtmlBaseField(\n        query = css(\"div.tags a.tag::text\"),\n        multiple = True\n    )\n\nclass QuotesPageModel(HtmlExtractionModel):\n    quotes = HtmlNestedField(\n        query = css(\"div.quote\"),\n        model = QuoteModel,\n        multiple = True\n    )\n\nasync def main():\n    client = HttpxClient()\n\n    # 2. Fetch the page (with support for retries, proxies, etc.)\n    response = await client.fetch(url=\"http://quotes.toscrape.com\")\n\n    # 3. Turn the response into a parser and extract the data\n    parser = response.as_html_parser()\n    data = parser.extract_model(QuotesPageModel)\n\n    for quote in data[\"quotes\"][:3]:\n        print(f\"Quote: {quote['text']}\")\n        print(f\"Author: {quote['author']['name']}\")\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\n**Output:**\n\n```\nFetching http://quotes.toscrape.com ...\n\nExtracted Data:\n\nQuote 1:\n  Text: “The world as we have created it is a process of our thinking. 
It cannot be changed without changing our thinking.”\n  Author: {'name': 'Albert Einstein', 'link': '/author/Albert-Einstein'}\n  Tags: change, deep-thoughts, thinking, world\n\nQuote 2:\n  Text: “It is our choices, Harry, that show what we truly are, far more than our abilities.”\n  Author: {'name': 'J.K. Rowling', 'link': '/author/J-K-Rowling'}\n  Tags: abilities, choices\n\nQuote 3:\n  Text: “There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”\n  Author: {'name': 'Albert Einstein', 'link': '/author/Albert-Einstein'}\n  Tags: inspirational, life, live, miracle, miracles\n\nFull test passed successfully!\n```\n\nI'm not a beginner at web scraping, but I can't call myself an expert either, since I haven't faced that many cases. So if you know something I don't and can help, please do!\n\nThe goal of Xcrap is to be modular and easy to plug into other HTTP clients (even browsers driven through Selenium or whatever other library exists for that), and to handle JSON, HTML, and Markdown declaratively (including a single document that mixes all three formats).\n\nI want to build a data transformer, but so far I haven't managed that feat even in Node.js, where Xcrap already has a larger ecosystem.\n\nI'm also fairly inexperienced with testing, so if you can help with that, I'd appreciate it!\n\nAnyway, we're accepting contributions: we need to document all of this, and much more! :D\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxcrap-cloud%2Fxcrap-python","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxcrap-cloud%2Fxcrap-python","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxcrap-cloud%2Fxcrap-python/lists"}