{"id":17489240,"url":"https://github.com/d4vinci/scrapling","last_synced_at":"2026-02-15T07:11:23.177Z","repository":{"id":257825114,"uuid":"872119017","full_name":"D4Vinci/Scrapling","owner":"D4Vinci","description":"🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!","archived":false,"fork":false,"pushed_at":"2025-04-22T03:17:11.000Z","size":1882,"stargazers_count":2930,"open_issues_count":2,"forks_count":190,"subscribers_count":29,"default_branch":"main","last_synced_at":"2025-04-24T06:55:31.260Z","etag":null,"topics":["ai","ai-scraping","automation","crawler","crawling","crawling-python","data","data-extraction","hacktoberfest","playwright","python","python3","scraping","selectors","stealth","web-scraper","web-scraping","web-scraping-python","webscraping","xpath"],"latest_commit_sha":null,"homepage":"https://scrapling.readthedocs.io/en/latest/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/D4Vinci.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":"D4Vinci","buy_me_a_coffee":"d4vinci"}},"created_at":"2024-10-13T20:29:53.000Z","updated_at":"2025-04-24T06:36:12.000Z","dependencies_parsed_at":"2024-12-21T13:28:49.110Z","dependency_job_id":"fc48702b-4b27-485b-aa32-5a9f45bffd96","html_url":"https://github.com/D4Vinci/Scrapling","commit_stats":{"total_commits":25,"total_committers":1,"mean_commits":25.0,"dds":0.0,"last_synced_commit":"b45cb4146b4d541c90b2c47c5a19aca7323036b9"},"previous_names":["d4vinci/scrapling"],"tags_count":22,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/D4Vinci%2FScrapling","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/D4Vinci%2FScrapling/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/D4Vinci%2FScrapling/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/D4Vinci%2FScrapling/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/D4Vinci","download_url":"https://codeload.github.com/D4Vinci/Scrapling/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250831843,"owners_count":21494447,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","ai-scraping","automation","crawler","crawling","crawling-python","data","data-extraction","hacktoberfest","playwright","python","python3","scraping","selectors","stealth","web-scraper","web-scraping","web-scraping-python","webscraping","xpath"],"created_at":"2024-10-19T05:06:19.430Z","updated_
at":"2026-02-15T07:11:23.136Z","avatar_url":"https://github.com/D4Vinci.png","language":"Python","readme":"\u003cp align=center\u003e\n  \u003cbr\u003e\n  \u003ca href=\"https://scrapling.readthedocs.io/en/latest/\" target=\"_blank\"\u003e\u003cimg src=\"https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/poster.png\" style=\"width: 50%; height: 100%;\"/\u003e\u003c/a\u003e\n  \u003cbr\u003e\n  \u003ci\u003eEasy, effortless Web Scraping as it should be!\u003c/i\u003e\n  \u003cbr\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://github.com/D4Vinci/Scrapling/actions/workflows/tests.yml\" alt=\"Tests\"\u003e\n        \u003cimg alt=\"Tests\" src=\"https://github.com/D4Vinci/Scrapling/actions/workflows/tests.yml/badge.svg\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://badge.fury.io/py/Scrapling\" alt=\"PyPI version\"\u003e\n        \u003cimg alt=\"PyPI version\" src=\"https://badge.fury.io/py/Scrapling.svg\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://pepy.tech/project/scrapling\" alt=\"PyPI Downloads\"\u003e\n        \u003cimg alt=\"PyPI Downloads\" src=\"https://static.pepy.tech/badge/scrapling\"\u003e\u003c/a\u003e\n    \u003cbr/\u003e\n    \u003ca href=\"https://discord.gg/EMgGbDceNQ\" alt=\"Discord\" target=\"_blank\"\u003e\n      \u003cimg alt=\"Discord\" src=\"https://img.shields.io/discord/1360786381042880532?style=social\u0026logo=discord\u0026link=https%3A%2F%2Fdiscord.gg%2FEMgGbDceNQ\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://x.com/Scrapling_dev\" alt=\"X (formerly Twitter)\"\u003e\n      \u003cimg alt=\"X (formerly Twitter) Follow\" src=\"https://img.shields.io/twitter/follow/Scrapling_dev?style=social\u0026logo=x\u0026link=https%3A%2F%2Fx.com%2FScrapling_dev\"\u003e\n    \u003c/a\u003e\n    \u003cbr/\u003e\n    \u003ca href=\"https://pypi.org/project/scrapling/\" alt=\"Supported Python versions\"\u003e\n        \u003cimg alt=\"Supported Python versions\" src=\"https://img.shields.io/pypi/pyversions/scrapling.svg\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://scrapling.readthedocs.io/en/latest/#installation\"\u003e\n        Installation\n    \u003c/a\u003e\n    ·\n    \u003ca href=\"https://scrapling.readthedocs.io/en/latest/overview/\"\u003e\n        Overview\n    \u003c/a\u003e\n    ·\n    \u003ca href=\"https://scrapling.readthedocs.io/en/latest/parsing/selection/\"\u003e\n        Selection methods\n    \u003c/a\u003e\n    ·\n    \u003ca href=\"https://scrapling.readthedocs.io/en/latest/fetching/choosing/\"\u003e\n        Choosing a fetcher\n    \u003c/a\u003e\n    ·\n    \u003ca href=\"https://scrapling.readthedocs.io/en/latest/tutorials/migrating_from_beautifulsoup/\"\u003e\n        Migrating from Beautifulsoup\n    \u003c/a\u003e\n\u003c/p\u003e\n\nDealing with failing web scrapers due to anti-bot protections or website changes? Meet Scrapling.\n\nScrapling is a high-performance, intelligent web scraping library for Python that automatically adapts to website changes while significantly outperforming popular alternatives. 
# Sponsors

[Evomi](https://evomi.com?utm_source=github&utm_medium=banner&utm_campaign=d4vinci-scrapling) is your Swiss Quality Proxy Provider, starting at **$0.49/GB**.

- 👩‍💻 **$0.49 per GB Residential Proxies**: Our price is unbeatable
- 👩‍💻 **24/7 Expert Support**: We will join your Slack channel
- 🌍 **Global Presence**: Available in 150+ countries
- ⚡ **Low Latency**
- 🔒 **Swiss Quality and Privacy**
- 🎁 **Free Trial**
- 🛡️ **99.9% Uptime**
- 🤝 **Special IP Pool selection**: Optimize for speed, quality, or quantity of IPs
- 🔧 **Easy Integration**: Compatible with most software and programming languages

[![Evomi Banner](https://my.evomi.com/images/brand/cta.png)](https://evomi.com?utm_source=github&utm_medium=banner&utm_campaign=d4vinci-scrapling)
---

[Scrapeless](http://scrapeless.com/?utm_source=D4Vinci) – an easy web scraping toolkit for businesses and developers

⚡ [Scraping Browser](https://www.scrapeless.com/en/product/scraping-browser?utm_source=D4Vinci):
1. Web browsing capabilities for AI agents and applications
    - Collect data at scale for agents without being blocked
    - Simulate user behavior using advanced browser tools
    - Build agent applications with real-time and historical web data
2. Unlock any scale with unlimited parallel jobs
3. High-performance web unlocking built directly into the browser
4. Compatible with Puppeteer and Playwright

⚡ [Deep SerpApi](https://www.scrapeless.com/en/product/deep-serp-api?utm_source=D4Vinci): One-click Google search data monitoring, supporting 15+ SERP scenarios (academic, Google Store, Maps, and more) at $0.10 per thousand queries with ~0.2 s responses.

Scrapeless has launched an MCP Server!

[How to set up Scrapeless MCP Server on Cline?](https://www.scrapeless.com/en/faq/how-to-set-up-scrapeless-mcp-server-on-cline?utm_source=D4Vinci)<br/>
[How to set up Scrapeless MCP Server on Cursor?](https://www.scrapeless.com/en/faq/how-to-set-up-scrapeless-mcp-server-on-cursor?utm_source=D4Vinci)<br/>
[How to set up Scrapeless MCP Server on Claude?](https://www.scrapeless.com/en/faq/how-to-set-up-scrapeless-mcp-server-on-claude?utm_source=D4Vinci)

⚡ [Scraping API](https://www.scrapeless.com/en/product/scraping-api?utm_source=D4Vinci): Easily obtain public content from sites such as TikTok, Shopee, Amazon, and Walmart, covering structured data for 8+ vertical industries such as e-commerce and social media, ready to use and billed only per successful call.

⚡ [Universal Scraping API](https://www.scrapeless.com/en/product/universal-scraping-api?utm_source=D4Vinci): Intelligent IP rotation plus real user fingerprints, with success rates of up to 99%. No more worrying about network blocks and crawling obstacles.

⚠️ Exclusive for open-source projects: submit your repo link to apply for 100,000 free Deep SerpApi queries!

📌 [Try it now](https://app.scrapeless.com/passport/login?utm_source=D4Vinci) | [Documentation](https://docs.scrapeless.com/en/scraping-browser/quickstart/introduction/?utm_source=D4Vinci)


[![Scrapeless Banner](https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/scrapeless.jpg)](http://scrapeless.com/?utm_source=D4Vinci)
---

## Key Features

### Fetch websites as you prefer with async support
- **HTTP Requests**: Fast and stealthy HTTP requests with the `Fetcher` class.
- **Dynamic Loading & Automation**: Fetch dynamic websites with the `PlayWrightFetcher` class through your real browser, Scrapling's stealth mode, Playwright's Chrome browser, or [NSTbrowser](https://app.nstbrowser.io/r/1vO5e5)'s browserless!
- **Anti-bot Protections Bypass**: Easily bypass protections with the `StealthyFetcher` and `PlayWrightFetcher` classes.

### Adaptive Scraping
- 🔄 **Smart Element Tracking**: Relocate elements after website changes using an intelligent similarity system and integrated storage.
- 🎯 **Flexible Selection**: CSS selectors, XPath selectors, filter-based search, text search, regex search, and more.
- 🔍 **Find Similar Elements**: Automatically locate elements similar to the element you found!
- 🧠 **Smart Content Scraping**: Extract data from multiple websites using Scrapling's powerful features without writing specific selectors.

### High Performance
- 🚀 **Lightning Fast**: Built from the ground up with performance in mind, outperforming most popular Python scraping libraries.
- 🔋 **Memory Efficient**: Optimized data structures for a minimal memory footprint.
- ⚡ **Fast JSON serialization**: 10x faster than the standard library.

### Developer Friendly
- 🛠️ **Powerful Navigation API**: Easy DOM traversal in all directions (see the sketch after this section).
- 🧬 **Rich Text Processing**: All strings have built-in regex, cleaning methods, and more. All elements' attributes are optimized dictionaries with added methods that consume less memory than standard dictionaries.
- 📝 **Automatic Selector Generation**: Generate robust short and full CSS/XPath selectors for any element.
- 🔌 **Familiar API**: Similar to Scrapy/BeautifulSoup, with the same pseudo-elements used in Scrapy.
- 📘 **Type hints**: Complete type hint and docstring coverage for future-proofing and the best autocompletion support.
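To make the navigation and text-processing bullets above concrete, here is an illustrative sketch. The traversal properties (`parent`, `children`) and the `clean`/`re_first` helpers are assumptions inferred from the feature list, not a confirmed API reference; check the documentation for the real names.

```python
# Illustrative sketch only: parent, children, clean, and re_first are assumed
# names based on the "Powerful Navigation API" and "Rich Text Processing"
# bullets above, not confirmed API.
from scrapling.fetchers import Fetcher

page = Fetcher.get('https://quotes.toscrape.com/')
quote = page.css_first('.quote')             # as shown in Getting Started below

text = quote.css_first('.text::text')        # selections return string-like objects (TextHandlers)
print(text.clean())                          # assumed built-in cleaning helper
print(text.re_first(r'world'))               # assumed built-in regex helper

print(quote.parent)                          # assumed upward traversal
print(len(quote.parent.children))            # assumed downward traversal
print(quote.attrib['class'])                 # attributes behave like a dictionary (see bullets above)
```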
## Getting Started

```python
from scrapling.fetchers import Fetcher

# Send an HTTP GET request to a web page and create an Adaptor instance
page = Fetcher.get('https://quotes.toscrape.com/', stealthy_headers=True)
# Get all text content from all HTML tags in the page, except the `script` and `style` tags
page.get_all_text(ignore_tags=('script', 'style'))

# Get all quote elements; any of these methods returns a list of strings directly (TextHandlers)
quotes = page.css('.quote .text::text')  # CSS selector
quotes = page.xpath('//span[@class="text"]/text()')  # XPath
quotes = page.css('.quote').css('.text::text')  # Chained selectors
quotes = [element.text for element in page.css('.quote .text')]  # Slower than the bulk query above

# Get the first quote element
quote = page.css_first('.quote')  # same as page.css('.quote').first or page.css('.quote')[0]

# Tired of selectors? Use find_all/find
# Get all 'div' HTML tags where one of the 'class' values is 'quote'
quotes = page.find_all('div', {'class': 'quote'})
# Same as
quotes = page.find_all('div', class_='quote')
quotes = page.find_all(['div'], class_='quote')
quotes = page.find_all(class_='quote')  # and so on...

# Working with elements
quote.html_content  # Get the inner HTML of this element
quote.prettify()  # Prettified version of the inner HTML above
quote.attrib  # Get this element's attributes
quote.path  # DOM path to the element (list of all ancestors from the <html> tag to the element itself)
```
To keep it simple, all methods can be chained on top of each other!

> [!NOTE]
> Check out the full documentation [here](https://scrapling.readthedocs.io/en/latest/).
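The Adaptive Scraping bullets above also mention searching by text and finding similar elements, which the benchmark section below compares against AutoScraper. A sketch of what that workflow might look like follows; the method names `find_by_text` and `find_similar` are assumptions rather than confirmed API, so consult the documentation for the actual names.

```python
# Illustrative sketch only: find_by_text and find_similar are assumed method
# names for the "text search" and "Find Similar Elements" features described
# in the Key Features section; not confirmed API.
from scrapling.fetchers import Fetcher

page = Fetcher.get('https://quotes.toscrape.com/')

# Locate one element by part of its visible text...
first_quote = page.find_by_text('The world as we have created it')
# ...then ask for elements that are structurally similar to it
similar = first_quote.find_similar()
print([element.text for element in similar])
```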
## Parsing Performance

Scrapling isn't just powerful - it's also blazing fast. Scrapling implements many best practices, design patterns, and numerous optimizations to save fractions of seconds, all while focusing exclusively on parsing HTML documents.
Here are benchmarks comparing Scrapling to popular Python libraries in two tests.

### Text Extraction Speed Test (5000 nested elements)

This test consists of extracting the text content of 5000 nested div elements.

| # |      Library      | Time (ms) | vs Scrapling |
|---|:-----------------:|:---------:|:------------:|
| 1 |     Scrapling     |   5.44    |     1.0x     |
| 2 |   Parsel/Scrapy   |   5.53    |    1.017x    |
| 3 |     Raw Lxml      |   6.76    |    1.243x    |
| 4 |      PyQuery      |   21.96   |    4.037x    |
| 5 |    Selectolax     |   67.12   |   12.338x    |
| 6 |   BS4 with Lxml   |  1307.03  |   240.263x   |
| 7 |  MechanicalSoup   |  1322.64  |   243.132x   |
| 8 | BS4 with html5lib |  3373.75  |   620.175x   |

As you can see, Scrapling is on par with Scrapy and slightly faster than Lxml, the library both of them are built on top of; these are the closest results to Scrapling. PyQuery is also built on top of Lxml, yet Scrapling is four times faster.

### Extraction By Text Speed Test

Scrapling can find elements based on their text content and then find elements similar to those elements. The only other known library with both of these features is AutoScraper.

So, we compared the two to see how fast Scrapling is at these tasks relative to AutoScraper.

Here are the results:

|   Library   | Time (ms) | vs Scrapling |
|-------------|:---------:|:------------:|
|  Scrapling  |   2.51    |     1.0x     |
| AutoScraper |   11.41   |    4.546x    |

Scrapling can find elements with more methods and returns the entire element's `Adaptor` object, not only the text as AutoScraper does. So, to keep the test fair, both libraries extract an element by its text, find similar elements, and then extract the text content of all of them.

As you can see, Scrapling is still 4.5 times faster at the same task.

If Scrapling only located the elements without also extracting each element's text, it would be roughly twice as fast again, but that extra step is what keeps the comparison fair :smile:

> All benchmark results are an average of 100 runs. See our [benchmarks.py](https://github.com/D4Vinci/Scrapling/blob/main/benchmarks.py) for the methodology and to run your own comparisons.
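For a quick, informal check on your own machine (separate from the methodology in `benchmarks.py`), a sketch like the following could compare the bulk `::text` query against the per-element loop shown in Getting Started:

```python
# Rough, informal sketch; not the methodology used by the project's benchmarks.py.
import timeit
from scrapling.fetchers import Fetcher

page = Fetcher.get('https://quotes.toscrape.com/')

# Bulk ::text query vs. the per-element loop from Getting Started
bulk = timeit.timeit(lambda: page.css('.quote .text::text'), number=1000)
loop = timeit.timeit(lambda: [el.text for el in page.css('.quote .text')], number=1000)
print(f'bulk query: {bulk:.3f}s   per-element loop: {loop:.3f}s')
```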
## Installation
Scrapling is a breeze to get started with. Starting from version 0.2.9, it requires at least Python 3.9.
```bash
pip3 install scrapling
```
Then run this command to install the browser dependencies needed by the Fetcher classes:
```bash
scrapling install
```
If you have any installation issues, please open an issue.


## More Sponsors!
<a href="https://serpapi.com/?utm_source=scrapling"><img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/SerpApi.png" height="500" alt="SerpApi Banner" ></a>


## Contributing
Everybody is invited and welcome to contribute to Scrapling. There is a lot to do!

Please read the [contributing file](https://github.com/D4Vinci/Scrapling/blob/main/CONTRIBUTING.md) before doing anything.

## Disclaimer for Scrapling Project
> [!CAUTION]
> This library is provided for educational and research purposes only. By using this library, you agree to comply with local and international data scraping and privacy laws. The authors and contributors are not responsible for any misuse of this software. This library should not be used to violate the rights of others, for unethical purposes, or to use data in an unauthorized or illegal manner. Do not use it on any website unless you have permission from the website owner or are acting within their allowed rules, such as the `robots.txt` file.

## License
This work is licensed under the BSD-3-Clause License.

## Acknowledgments
This project includes code adapted from:
- Parsel (BSD License) - used for the [translator](https://github.com/D4Vinci/Scrapling/blob/main/scrapling/translator.py) submodule

## Thanks and References
- [Daijro](https://github.com/daijro)'s brilliant work on both [BrowserForge](https://github.com/daijro/browserforge) and [Camoufox](https://github.com/daijro/camoufox)
- [Vinyzu](https://github.com/Vinyzu)'s work on Playwright's mock in [Botright](https://github.com/Vinyzu/Botright)
- [brotector](https://github.com/kaliiiiiiiiii/brotector)
- [fakebrowser](https://github.com/kkoooqq/fakebrowser)
- [rebrowser-patches](https://github.com/rebrowser/rebrowser-patches)

## Known Issues
- In the auto-matching save process, only the unique properties of the first element from the selection results are saved. If the selector you are using matches different elements in different locations on the page, auto-matching will later relocate only that first element for you. This doesn't apply to combined CSS selectors (using commas to combine more than one selector, for example), as such selectors are split apart and each selector is executed on its own.

---
<div align="center"><small>Designed & crafted with ❤️ by Karim Shoair.</small></div><br>