{"id":37240182,"url":"https://github.com/wention/BeautifulSoup4","last_synced_at":"2026-01-22T11:01:31.671Z","repository":{"id":29495746,"uuid":"33033462","full_name":"wention/BeautifulSoup4","owner":"wention","description":"git mirror for Beautiful Soup 4.3.2","archived":true,"fork":false,"pushed_at":"2022-11-08T14:44:40.000Z","size":244,"stargazers_count":204,"open_issues_count":3,"forks_count":59,"subscribers_count":4,"default_branch":"master","last_synced_at":"2026-01-19T17:35:21.307Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wention.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"COPYING.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-03-28T12:58:24.000Z","updated_at":"2026-01-16T16:14:25.000Z","dependencies_parsed_at":"2023-01-14T15:03:38.217Z","dependency_job_id":null,"html_url":"https://github.com/wention/BeautifulSoup4","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/wention/BeautifulSoup4","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wention%2FBeautifulSoup4","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wention%2FBeautifulSoup4/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wention%2FBeautifulSoup4/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wention%2FBeautifulSoup4/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wention","download_url":"https://codeload.github.com/wention/BeautifulSoup4/tar.gz/refs/heads/master","sbom_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/wention%2FBeautifulSoup4/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28633747,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-21T04:47:28.174Z","status":"ssl_error","status_checked_at":"2026-01-21T04:47:22.943Z","response_time":86,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-15T07:00:29.134Z","updated_at":"2026-01-22T11:01:31.666Z","avatar_url":"https://github.com/wention.png","language":"Python","funding_links":[],"categories":["📚 فهرست"],"sub_categories":["وب اسکرپینگ"],"readme":"Beautiful Soup Documentation\n============================\n\n[Beautiful Soup](http://www.crummy.com/software/BeautifulSoup/) is a\nPython library for pulling data out of HTML and XML files. It works\nwith your favorite parser to provide idiomatic ways of navigating,\nsearching, and modifying the parse tree. It commonly saves programmers\nhours or days of work.\n\nQuick Start\n===========\n\nHere's an HTML document I'll be using as an example throughout this\ndocument. 
It's part of a story from `Alice in Wonderland`::\n\n    html_doc = \"\"\"\n    \u003chtml\u003e\u003chead\u003e\u003ctitle\u003eThe Dormouse's story\u003c/title\u003e\u003c/head\u003e\n    \u003cbody\u003e\n    \u003cp class=\"title\"\u003e\u003cb\u003eThe Dormouse's story\u003c/b\u003e\u003c/p\u003e\n\n    \u003cp class=\"story\"\u003eOnce upon a time there were three little sisters; and their names were\n    \u003ca href=\"http://example.com/elsie\" class=\"sister\" id=\"link1\"\u003eElsie\u003c/a\u003e,\n    \u003ca href=\"http://example.com/lacie\" class=\"sister\" id=\"link2\"\u003eLacie\u003c/a\u003e and\n    \u003ca href=\"http://example.com/tillie\" class=\"sister\" id=\"link3\"\u003eTillie\u003c/a\u003e;\n    and they lived at the bottom of a well.\u003c/p\u003e\n\n    \u003cp class=\"story\"\u003e...\u003c/p\u003e\n    \"\"\"\n\nRunning the \"three sisters\" document through Beautiful Soup gives us a\n``BeautifulSoup`` object, which represents the document as a nested\ndata structure::\n\n    from bs4 import BeautifulSoup\n    soup = BeautifulSoup(html_doc, 'html.parser')\n\n    print(soup.prettify())\n    # \u003chtml\u003e\n    #  \u003chead\u003e\n    #   \u003ctitle\u003e\n    #    The Dormouse's story\n    #   \u003c/title\u003e\n    #  \u003c/head\u003e\n    #  \u003cbody\u003e\n    #   \u003cp class=\"title\"\u003e\n    #    \u003cb\u003e\n    #     The Dormouse's story\n    #    \u003c/b\u003e\n    #   \u003c/p\u003e\n    #   \u003cp class=\"story\"\u003e\n    #    Once upon a time there were three little sisters; and their names were\n    #    \u003ca class=\"sister\" href=\"http://example.com/elsie\" id=\"link1\"\u003e\n    #     Elsie\n    #    \u003c/a\u003e\n    #    ,\n    #    \u003ca class=\"sister\" href=\"http://example.com/lacie\" id=\"link2\"\u003e\n    #     Lacie\n    #    \u003c/a\u003e\n    #    and\n    #    \u003ca class=\"sister\" href=\"http://example.com/tillie\" id=\"link3\"\u003e\n    #     Tillie\n    #    \u003c/a\u003e\n    #    ; and 
they lived at the bottom of a well.\n    #   \u003c/p\u003e\n    #   \u003cp class=\"story\"\u003e\n    #    ...\n    #   \u003c/p\u003e\n    #  \u003c/body\u003e\n    # \u003c/html\u003e\n\nHere are some simple ways to navigate that data structure::\n\n    soup.title\n    # \u003ctitle\u003eThe Dormouse's story\u003c/title\u003e\n\n    soup.title.name\n    # u'title'\n\n    soup.title.string\n    # u\"The Dormouse's story\"\n\n    soup.title.parent.name\n    # u'head'\n\n    soup.p\n    # \u003cp class=\"title\"\u003e\u003cb\u003eThe Dormouse's story\u003c/b\u003e\u003c/p\u003e\n\n    soup.p['class']\n    # [u'title']\n\n    soup.a\n    # \u003ca class=\"sister\" href=\"http://example.com/elsie\" id=\"link1\"\u003eElsie\u003c/a\u003e\n\n    soup.find_all('a')\n    # [\u003ca class=\"sister\" href=\"http://example.com/elsie\" id=\"link1\"\u003eElsie\u003c/a\u003e,\n    #  \u003ca class=\"sister\" href=\"http://example.com/lacie\" id=\"link2\"\u003eLacie\u003c/a\u003e,\n    #  \u003ca class=\"sister\" href=\"http://example.com/tillie\" id=\"link3\"\u003eTillie\u003c/a\u003e]\n\n    soup.find(id=\"link3\")\n    # \u003ca class=\"sister\" href=\"http://example.com/tillie\" id=\"link3\"\u003eTillie\u003c/a\u003e\n\nOne common task is extracting all the URLs found within a page's \u003ca\u003e tags::\n\n    for link in soup.find_all('a'):\n        print(link.get('href'))\n    # http://example.com/elsie\n    # http://example.com/lacie\n    # http://example.com/tillie\n\nAnother common task is extracting all the text from a page::\n\n    print(soup.get_text())\n    # The Dormouse's story\n    #\n    # The Dormouse's story\n    #\n    # Once upon a time there were three little sisters; and their names were\n    # Elsie,\n    # Lacie and\n    # Tillie;\n    # and they lived at the bottom of a well.\n    #\n    # ...\n\nDoes this look like what you need? 
If so, read on.\n\nInstalling Beautiful Soup\n=========================\n\nIf you're using a recent version of Debian or Ubuntu Linux, you can\ninstall Beautiful Soup with the system package manager::\n\n    $ apt-get install python-bs4\n\nBeautiful Soup 4 is published through PyPI, so if you can't install it\nwith the system package manager, you can install it with ``easy_install`` or\n``pip``. The package name is ``beautifulsoup4``, and the same package\nworks on Python 2 and Python 3::\n\n    $ easy_install beautifulsoup4\n\n    $ pip install beautifulsoup4\n\n(The ``BeautifulSoup`` package is probably `not` what you want. That's\nthe previous major release, `Beautiful Soup 3`_. Lots of software uses\nBS3, so it's still available, but if you're writing new code you\nshould install ``beautifulsoup4``.)\n\nIf you don't have ``easy_install`` or ``pip`` installed, you can\n`download the Beautiful Soup 4 source tarball\n\u003chttp://www.crummy.com/software/BeautifulSoup/download/4.x/\u003e`_ and\ninstall it with ``setup.py``::\n\n    $ python setup.py install\n\nIf all else fails, the license for Beautiful Soup allows you to\npackage the entire library with your application. You can download the\ntarball, copy its ``bs4`` directory into your application's codebase,\nand use Beautiful Soup without installing it at all.\n\nI use Python 2.7 and Python 3.2 to develop Beautiful Soup, but it\nshould work with other recent versions.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwention%2FBeautifulSoup4","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwention%2FBeautifulSoup4","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwention%2FBeautifulSoup4/lists"}