https://github.com/thomasborgen/soup2dict
Transforms BeautifulSoup soup to python dict or json
- Host: GitHub
- URL: https://github.com/thomasborgen/soup2dict
- Owner: thomasborgen
- License: mit
- Created: 2020-11-09T21:40:54.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2021-12-26T07:50:11.000Z (almost 3 years ago)
- Last Synced: 2024-09-14T14:21:20.082Z (2 months ago)
- Topics: beautifulsoup, beautifulsoup4, dict, json, parser, transformer
- Language: Python
- Homepage:
- Size: 64.5 KB
- Stars: 6
- Watchers: 1
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# soup2dict
BeautifulSoup4 to Python dictionary converter
___
![test](https://github.com/thomasborgen/soup2dict/workflows/test/badge.svg)
[![codecov](https://codecov.io/gh/thomasborgen/soup2dict/branch/master/graph/badge.svg)](https://codecov.io/gh/thomasborgen/soup2dict)
[![Python Version](https://img.shields.io/pypi/pyversions/soup2dict.svg)](https://pypi.org/project/soup2dict/)
[![wemake-python-styleguide](https://img.shields.io/badge/style-wemake-000000.svg)](https://github.com/wemake-services/wemake-python-styleguide)
___
## Why
It's nice to have a convenient way to convert your soup into a dict.
## Installation
Install the package with pip or poetry:
```sh
pip install soup2dict
```
```sh
poetry add soup2dict
```
## Example
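```python
import simplejson
from bs4 import BeautifulSoup

from soup2dict import convert

# Sample document; its structure mirrors the Output section below.
html_doc = """
<html>
hei
<head>
 <title>The Dormouse's story</title>
 <title>bob</title>
</head>
<body>
<p class="title">The <b>Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters;
 and their names were
 <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
 <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
 <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
 and they lived at the bottom of a well.</p>

<p class="story">...</p>
</body>
</html>
"""

# Create soup from html_doc data
soup = BeautifulSoup(html_doc, 'html.parser')

# Convert it to a dictionary with convert()
dict_result = convert(soup)

# Write the dictionary to disk as indented JSON
with open('output.json', 'w') as output_file:
    output_file.write(
        simplejson.dumps(dict_result, indent=2),
    )
```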
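The example serializes with `simplejson`, which is a separate third-party package. As a minimal sketch, the standard library `json` module should work as a drop-in here, assuming the converted result contains only dicts, lists and strings, as it does in the output shown in this README. The small `dict_result` literal and the temporary output path below are stand-ins for illustration, not values produced by soup2dict.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for the value returned by convert(soup);
# it copies the shape of one "p" entry from this README's output.
dict_result = {
    "p": [
        {
            "@class": ["title"],
            "#text": "The Dormouse's story",
            "navigablestring": ["The"],
        },
    ],
}

# json.dumps accepts the same indent argument used in the example;
# ensure_ascii=False keeps non-ASCII page text readable in the file.
json_text = json.dumps(dict_result, indent=2, ensure_ascii=False)

# Written to the system temp directory here so the sketch has no side
# effects in the working directory; any path works.
output_path = Path(tempfile.gettempdir()) / "output.json"
output_path.write_text(json_text, encoding="utf-8")
```

Reading the file back with `json.loads` gives a dict equal to the original, so nothing is lost in the round trip for data of this shape.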
## Output
```json
{
  "html": [
    {
      "#text": "hei The Dormouse's story bob The Dormouse's story Once upon a time there were three little sisters; and their names were Elsie , Lacie and Tillie ; and they lived at the bottom of a well. ...",
      "navigablestring": [
        "hei"
      ],
      "head": [
        {
          "#text": "The Dormouse's story bob",
          "title": [
            {
              "#text": "The Dormouse's story",
              "navigablestring": [
                "The Dormouse's story"
              ]
            },
            {
              "#text": "bob",
              "navigablestring": [
                "bob"
              ]
            }
          ]
        }
      ],
      "body": [
        {
          "#text": "The Dormouse's story Once upon a time there were three little sisters; and their names were Elsie , Lacie and Tillie ; and they lived at the bottom of a well. ...",
          "p": [
            {
              "@class": [
                "title"
              ],
              "#text": "The Dormouse's story",
              "navigablestring": [
                "The"
              ],
              "b": [
                {
                  "#text": "Dormouse's story",
                  "navigablestring": [
                    "Dormouse's story"
                  ]
                }
              ]
            },
            {
              "@class": [
                "story"
              ],
              "#text": "Once upon a time there were three little sisters; and their names were Elsie , Lacie and Tillie ; and they lived at the bottom of a well.",
              "navigablestring": [
                "Once upon a time there were three little sisters;\n and their names were",
                ",",
                "and",
                ";\n and they lived at the bottom of a well."
              ],
              "a": [
                {
                  "@href": "http://example.com/elsie",
                  "@class": [
                    "sister"
                  ],
                  "@id": "link1",
                  "#text": "Elsie",
                  "navigablestring": [
                    "Elsie"
                  ]
                },
                {
                  "@href": "http://example.com/lacie",
                  "@class": [
                    "sister"
                  ],
                  "@id": "link2",
                  "#text": "Lacie",
                  "navigablestring": [
                    "Lacie"
                  ]
                },
                {
                  "@href": "http://example.com/tillie",
                  "@class": [
                    "sister"
                  ],
                  "@id": "link3",
                  "#text": "Tillie",
                  "navigablestring": [
                    "Tillie"
                  ]
                }
              ]
            },
            {
              "@class": [
                "story"
              ],
              "#text": "...",
              "navigablestring": [
                "..."
              ]
            }
          ]
        }
      ]
    }
  ]
}
```
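In the output above, each tag name maps to a list of child dicts, attributes are stored under `@`-prefixed keys, and the combined text sits under `#text`. A rough sketch of how such a result could be walked to collect every link follows. The `find_tags` helper is hypothetical and not part of soup2dict, and `dict_result` is a trimmed hand-written copy of the output rather than a live `convert()` call.

```python
# Trimmed, hand-written copy of the structure shown in the output above.
dict_result = {
    "html": [
        {
            "body": [
                {
                    "p": [
                        {
                            "@class": ["story"],
                            "a": [
                                {
                                    "@href": "http://example.com/elsie",
                                    "@class": ["sister"],
                                    "@id": "link1",
                                    "#text": "Elsie",
                                },
                                {
                                    "@href": "http://example.com/lacie",
                                    "@class": ["sister"],
                                    "@id": "link2",
                                    "#text": "Lacie",
                                },
                                {
                                    "@href": "http://example.com/tillie",
                                    "@class": ["sister"],
                                    "@id": "link3",
                                    "#text": "Tillie",
                                },
                            ],
                        },
                    ],
                },
            ],
        },
    ],
}


def find_tags(node, tag_name):
    """Yield every dict stored in a list under `tag_name`, at any depth.

    Hypothetical helper for illustration; not provided by soup2dict.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key == tag_name and isinstance(value, list):
                for child in value:
                    if isinstance(child, dict):
                        yield child
            # Keep descending so nested tags are found as well.
            yield from find_tags(value, tag_name)
    elif isinstance(node, list):
        for item in node:
            yield from find_tags(item, tag_name)


# Map each link's text to its href.
links = {tag["#text"]: tag["@href"] for tag in find_tags(dict_result, "a")}
print(links)
```

Because every tag name holds a list, repeated siblings such as the three `a` elements stay in document order, and the same helper works for `p`, `title` or any other tag key.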