https://github.com/wention/BeautifulSoup4

git mirror for Beautiful Soup 4.3.2

Host: GitHub
URL: https://github.com/wention/BeautifulSoup4
Owner: wention
License: other
Archived: true
Created: 2015-03-28T12:58:24.000Z (about 11 years ago)
Default Branch: master
Last Pushed: 2022-11-08T14:44:40.000Z (over 3 years ago)
Last Synced: 2026-01-19T17:35:21.307Z (5 months ago)
Language: Python
Size: 238 KB
Stars: 204
Watchers: 4
Forks: 59
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: COPYING.txt

Awesome Lists containing this project

awesome-python-fa - **BeautifulSoup** - A simple library for processing HTML and XML. BeautifulSoup helps you extract the data you want from web pages. It is one of the most popular tools for web scraping. (📚 List / Web Scraping)

README

Beautiful Soup Documentation
============================

[Beautiful Soup](http://www.crummy.com/software/BeautifulSoup/) is a
Python library for pulling data out of HTML and XML files. It works
with your favorite parser to provide idiomatic ways of navigating,
searching, and modifying the parse tree. It commonly saves programmers
hours or days of work.

Quick Start
===========

Here's an HTML document I'll be using as an example throughout this
document. It's part of a story from `Alice in Wonderland`::

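    html_doc = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>

    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>

    <p class="story">...</p>
    """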

Running the "three sisters" document through Beautiful Soup gives us a
``BeautifulSoup`` object, which represents the document as a nested
data structure::

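    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html_doc)

    print(soup.prettify())
    # <html>
    #  <head>
    #   <title>
    #    The Dormouse's story
    #   </title>
    #  </head>
    #  <body>
    #   <p class="title">
    #    <b>
    #     The Dormouse's story
    #    </b>
    #   </p>
    #   <p class="story">
    #    Once upon a time there were three little sisters; and their names were
    #    <a class="sister" href="http://example.com/elsie" id="link1">
    #     Elsie
    #    </a>
    #    ,
    #    <a class="sister" href="http://example.com/lacie" id="link2">
    #     Lacie
    #    </a>
    #    and
    #    <a class="sister" href="http://example.com/tillie" id="link3">
    #     Tillie
    #    </a>
    #    ; and they lived at the bottom of a well.
    #   </p>
    #   <p class="story">
    #    ...
    #   </p>
    #  </body>
    # </html>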
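
The constructor call above lets Beautiful Soup pick a parser on its own. If you would rather name one, here is a minimal sketch, assuming Python 3 and that the parser name is passed as the second argument. ``make_soup`` is a hypothetical helper, not part of Beautiful Soup. ``lxml`` is a separate install, so the sketch falls back to the standard library's ``html.parser`` when it is missing:

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred=("lxml", "html.parser")):
    """Parse markup with the first parser in `preferred` that is installed.

    Hypothetical helper for illustration only.
    """
    for name in preferred:
        try:
            return BeautifulSoup(markup, name)
        except FeatureNotFound:
            # This parser is not available here; try the next one.
            continue
    raise RuntimeError("none of the requested parsers is installed")


soup = make_soup('<p class="title"><b>The Dormouse\'s story</b></p>')
print(soup.b.string)
```

Naming the parser keeps results consistent across machines, since different parsers can build different trees from the same invalid markup.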

Here are some simple ways to navigate that data structure::

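    soup.title
    # <title>The Dormouse's story</title>

    soup.title.name
    # u'title'

    soup.title.string
    # u'The Dormouse's story'

    soup.title.parent.name
    # u'head'

    soup.p
    # <p class="title"><b>The Dormouse's story</b></p>

    soup.p['class']
    # [u'title']

    soup.a
    # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

    soup.find_all('a')
    # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
    #  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
    #  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

    soup.find(id="link3")
    # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>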
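
The lookups above all succeed because the example document contains every tag they ask for. As a small sketch of what happens when something is missing (assuming Python 3 and the built-in ``html.parser``; the fragment is made up for illustration), ``find`` returns ``None``, ``find_all`` returns an empty list, and ``get`` returns ``None`` for an absent attribute:

```python
from bs4 import BeautifulSoup

# A made-up fragment with a <p> tag but no <a> tags.
fragment = '<p class="title"><b>The Dormouse\'s story</b></p>'
soup = BeautifulSoup(fragment, "html.parser")

first_link = soup.find("a")        # no match, so this is None
all_links = soup.find_all("a")     # no match, so this is an empty list
missing_id = soup.p.get("id")      # attribute absent, so this is None
classes = soup.p["class"]          # class is multi-valued, so this is a list

print(first_link, all_links, missing_id, classes)
```

Checking for ``None`` before chaining further attribute access avoids an ``AttributeError`` on documents that lack the tag you expected.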

One common task is extracting all the URLs found within a page's <a> tags::

    for link in soup.find_all('a'):
        print(link.get('href'))
    # http://example.com/elsie
    # http://example.com/lacie
    # http://example.com/tillie
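
Real pages often mix absolute and relative links, and some ``<a>`` tags have no ``href`` at all. The loop above can be extended along these lines; this is a sketch assuming Python 3, with ``extract_links``, the sample markup, and the base URL all invented for illustration:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(markup, base_url):
    """Return absolute URLs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(markup, "html.parser")
    links = []
    # href=True matches only tags that actually carry an href attribute.
    for anchor in soup.find_all("a", href=True):
        links.append(urljoin(base_url, anchor["href"]))
    return links


sample = (
    '<a href="/elsie">Elsie</a> '
    '<a name="top">no href here</a> '
    '<a href="http://example.com/lacie">Lacie</a>'
)
links = extract_links(sample, "http://example.com/")
print(links)
```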

Another common task is extracting all the text from a page::

    print(soup.get_text())
    # The Dormouse's story
    #
    # The Dormouse's story
    #
    # Once upon a time there were three little sisters; and their names were
    # Elsie,
    # Lacie and
    # Tillie;
    # and they lived at the bottom of a well.
    #
    # ...
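
The output above keeps the line breaks and blank lines of the source markup. If you want a single tidy string instead, ``get_text`` also accepts a separator and a ``strip`` flag. A short sketch, assuming Python 3 and a made-up fragment:

```python
from bs4 import BeautifulSoup

# A made-up fragment whose text is split across tags and lines.
fragment = "<p>Elsie,\n <b>Lacie</b> and\n <i>Tillie</i></p>"
soup = BeautifulSoup(fragment, "html.parser")

# Strip whitespace from each piece of text, then join the pieces with a space.
text = soup.get_text(" ", strip=True)
print(text)
```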

Does this look like what you need? If so, read on.

Installing Beautiful Soup
=========================

If you're using a recent version of Debian or Ubuntu Linux, you can
install Beautiful Soup with the system package manager:

    $ apt-get install python-bs4

Beautiful Soup 4 is published through PyPI, so if you can't install it
with the system packager, you can install it with ``easy_install`` or
``pip``. The package name is ``beautifulsoup4``, and the same package
works on Python 2 and Python 3.

    $ easy_install beautifulsoup4

    $ pip install beautifulsoup4

(The ``BeautifulSoup`` package is probably `not` what you want. That's
the previous major release, `Beautiful Soup 3`_. Lots of software uses
BS3, so it's still available, but if you're writing new code you
should install ``beautifulsoup4``.)
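
To confirm that the right package ended up on your path, you can try importing ``bs4``, the module name Beautiful Soup 4 installs under. A minimal sketch, where ``installed_bs4_version`` is a hypothetical helper and reading ``__version__`` assumes the installed release exposes that attribute:

```python
import importlib


def installed_bs4_version():
    """Return the bs4 version string, or None if bs4 cannot be imported."""
    try:
        module = importlib.import_module("bs4")
    except ImportError:
        return None
    # Assumption: the installed release defines bs4.__version__.
    return getattr(module, "__version__", "unknown")


version = installed_bs4_version()
print(version if version is not None else "Beautiful Soup 4 is not installed")
```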

If you don't have ``easy_install`` or ``pip`` installed, you can
download the Beautiful Soup 4 source tarball and install it with
``setup.py``.

    $ python setup.py install

If all else fails, the license for Beautiful Soup allows you to
package the entire library with your application. You can download the
tarball, copy its ``bs4`` directory into your application's codebase,
and use Beautiful Soup without installing it at all.

I use Python 2.7 and Python 3.2 to develop Beautiful Soup, but it
should work with other recent versions.
