{"id":13634234,"url":"https://github.com/maxhumber/gazpacho","last_synced_at":"2025-05-15T14:08:36.907Z","repository":{"id":44430852,"uuid":"210445243","full_name":"maxhumber/gazpacho","owner":"maxhumber","description":"🥫 The simple, fast, and modern web scraping library","archived":false,"fork":false,"pushed_at":"2023-12-07T03:03:36.000Z","size":12870,"stargazers_count":769,"open_issues_count":16,"forks_count":55,"subscribers_count":17,"default_branch":"master","last_synced_at":"2025-05-14T23:17:14.418Z","etag":null,"topics":["gazpacho","scraping","webscraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/maxhumber.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"maxhumber","patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2019-09-23T20:21:55.000Z","updated_at":"2025-05-13T00:00:42.000Z","dependencies_parsed_at":"2024-05-05T03:42:10.616Z","dependency_job_id":null,"html_url":"https://github.com/maxhumber/gazpacho","commit_stats":{"total_commits":127,"total_committers":16,"mean_commits":7.9375,"dds":0.2913385826771654,"last_synced_commit":"49d8258908729b67e4189a339e1b4c99dd003778"},"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxhumber%2Fgazpacho","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/maxhumber%2Fgazpacho/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxhumber%2Fgazpacho/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxhumber%2Fgazpacho/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/maxhumber","download_url":"https://codeload.github.com/maxhumber/gazpacho/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254355335,"owners_count":22057354,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gazpacho","scraping","webscraping"],"created_at":"2024-08-01T23:00:59.697Z","updated_at":"2025-05-15T14:08:31.892Z","avatar_url":"https://github.com/maxhumber.png","language":"Python","funding_links":["https://github.com/sponsors/maxhumber"],"categories":["Python","Web Scraping \u0026 Crawling"],"sub_categories":[],"readme":"\u003ch3 align=\"center\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/maxhumber/gazpacho/master/images/gazpacho.png\" height=\"300px\" alt=\"gazpacho\"\u003e\n\u003c/h3\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://pypi.python.org/pypi/gazpacho\"\u003e\u003cimg alt=\"PyPI\" src=\"https://img.shields.io/pypi/v/gazpacho.svg\"\u003e\u003c/a\u003e\n\t\u003ca href=\"https://pypi.python.org/pypi/gazpacho\"\u003e\u003cimg alt=\"PyPI - Python Version\" src=\"https://img.shields.io/pypi/pyversions/gazpacho.svg\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://pepy.tech/project/gazpacho\"\u003e\u003cimg alt=\"Downloads\" 
src=\"https://pepy.tech/badge/gazpacho\"\u003e\u003c/a\u003e  \n\u003c/p\u003e\n\n\n\n\n## About\n\ngazpacho is a simple, fast, and modern web scraping library. The library is stable, and installed with **zero** dependencies.\n\n\n\n## Install\n\nInstall with `pip` at the command line:\n\n```\npip install -U gazpacho\n```\n\n\n\n## Quickstart\n\nGive this a try:\n\n```python\nfrom gazpacho import get, Soup\n\nurl = 'https://scrape.world/books'\nhtml = get(url)\nsoup = Soup(html)\nbooks = soup.find('div', {'class': 'book-'}, partial=True)\n\ndef parse(book):\n    name = book.find('h4').text\n    price = float(book.find('p').text[1:].split(' ')[0])\n    return name, price\n\n[parse(book) for book in books]\n```\n\n\n\n## Tutorial\n\n#### Import\n\nImport gazpacho following the convention:\n\n```python\nfrom gazpacho import get, Soup\n```\n\n\n\n#### get\n\nUse the `get` function to download raw HTML:\n\n```python\nurl = 'https://scrape.world/soup'\nhtml = get(url)\nprint(html[:50])\n# '\u003c!DOCTYPE html\u003e\\n\u003chtml lang=\"en\"\u003e\\n  \u003chead\u003e\\n    \u003cmet'\n```\n\nAdjust `get` requests with optional params and headers:\n\n```python\nget(\n    url='https://httpbin.org/anything',\n    params={'foo': 'bar', 'bar': 'baz'},\n    headers={'User-Agent': 'gazpacho'}\n)\n```\n\n\n\n#### Soup\n\nUse the `Soup` wrapper on raw html to enable parsing:\n\n```python\nsoup = Soup(html)\n```\n\nSoup objects can alternatively be initialized with the  `.get` classmethod:\n\n```python\nsoup = Soup.get(url)\n```\n\n\n\n#### .find\n\nUse the `.find` method to target and extract HTML tags:\n\n```python\nh1 = soup.find('h1')\nprint(h1)\n# \u003ch1 id=\"firstHeading\" class=\"firstHeading\" lang=\"en\"\u003eSoup\u003c/h1\u003e\n```\n\n\n\n#### attrs=\n\nUse the `attrs` argument to isolate tags that contain specific HTML element attributes:\n\n```python\nsoup.find('div', attrs={'class': 'section-'})\n```\n\n\n\n#### partial=\n\nElement attributes are partially matched 
by default. Turn this off by setting `partial` to `False`:  \n\n```python\nsoup.find('div', {'class': 'soup'}, partial=False)\n```\n\n\n\n#### mode=\n\nOverride the mode argument {`'auto', 'first', 'all'`} to guarantee return behaviour:\n\n```python\nprint(soup.find('span', mode='first'))\n# \u003cspan class=\"navbar-toggler-icon\"\u003e\u003c/span\u003e\nlen(soup.find('span', mode='all'))\n# 8\n```\n\n\n\n#### dir()\n\n`Soup` objects have `html`, `tag`, `attrs`, and `text` attributes:\n\n```python\ndir(h1)\n# ['attrs', 'find', 'get', 'html', 'strip', 'tag', 'text']\n```\n\nUse them accordingly:\n\n```python\nprint(h1.html)\n# '\u003ch1 id=\"firstHeading\" class=\"firstHeading\" lang=\"en\"\u003eSoup\u003c/h1\u003e'\nprint(h1.tag)\n# h1\nprint(h1.attrs)\n# {'id': 'firstHeading', 'class': 'firstHeading', 'lang': 'en'}\nprint(h1.text)\n# Soup\n```\n\n\n\n## Support\n\nIf you use gazpacho, consider adding the [![scraper: gazpacho](https://img.shields.io/badge/scraper-gazpacho-C6422C)](https://github.com/maxhumber/gazpacho) badge to your project README.md:\n\n```markdown\n[![scraper: gazpacho](https://img.shields.io/badge/scraper-gazpacho-C6422C)](https://github.com/maxhumber/gazpacho)\n```\n\n\n\n## Contribute\n\nFor feature requests or bug reports, please use [GitHub Issues](https://github.com/maxhumber/gazpacho/issues).\n\nFor PRs, please read the [CONTRIBUTING.md](https://github.com/maxhumber/gazpacho/blob/master/CONTRIBUTING.md) document.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxhumber%2Fgazpacho","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaxhumber%2Fgazpacho","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxhumber%2Fgazpacho/lists"}