https://github.com/fhightower/html-to-json
Convert HTML to JSON. Can also (intelligently) convert HTML tables to JSON (using table headers (if available) as keys in the resulting JSON).
hacktoberfest html html-converter html-tables html-tables-to-json html-to-json html2json json
- Host: GitHub
- URL: https://github.com/fhightower/html-to-json
- Owner: fhightower
- License: mit
- Created: 2021-01-22T21:37:53.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2023-06-06T12:33:46.000Z (almost 2 years ago)
- Last Synced: 2025-04-13T01:08:48.432Z (22 days ago)
- Topics: hacktoberfest, html, html-converter, html-tables, html-tables-to-json, html-to-json, html2json, json
- Language: HTML
- Homepage:
- Size: 573 KB
- Stars: 50
- Watchers: 2
- Forks: 8
- Open Issues: 14
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
# HTML to JSON
[PyPI](https://pypi.python.org/pypi/html-to-json)
[Codecov](https://codecov.io/gh/fhightower/html-to-json)

Convert HTML and/or HTML tables to JSON.
## Current Status
📢 I have a lot of demands on my time at the moment and won't be able to work on this library without [sponsorship](https://github.com/sponsors/fhightower). If this library is useful to you, or if you're using it for a business, please consider [sponsoring](https://github.com/sponsors/fhightower) me. Even a small sponsorship allows me to prioritize work on this library and its ongoing maintenance. Thanks!
## Installation
```
pip install html-to-json
```

## Usage
### HTML to JSON
```python
import html_to_json

html_string = """<head>
    <title>Test site</title>
</head>"""
output_json = html_to_json.convert(html_string)
print(output_json)
```

When calling the `html_to_json.convert` function, you can choose not to capture the text values from the HTML by passing the keyword argument `capture_element_values=False`. You can also choose not to capture element attributes by passing `capture_element_attributes=False`.
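To picture what these two flags leave out, the sketch below post-processes a dictionary written by hand in the shape this README's example output uses (`_value` for text, `_attributes` for attributes). It does not call the library: `drop_keys` is a hypothetical helper, and the result assumes the flags simply omit the corresponding keys, so the library's actual output may differ in detail.

```python
# Hand-written dictionary in the shape of the README's example output:
# tag names map to lists of elements, text sits under "_value",
# attributes under "_attributes".
converted = {
    "head": [
        {
            "title": [{"_value": "Floyd Hightower's Projects"}],
            "meta": [{"_attributes": {"charset": "UTF-8"}}],
        }
    ]
}


def drop_keys(node, unwanted):
    """Hypothetical helper (not part of html-to-json): return a copy of
    `node` with every key in `unwanted` removed at any depth."""
    if isinstance(node, dict):
        return {
            key: drop_keys(value, unwanted)
            for key, value in node.items()
            if key not in unwanted
        }
    if isinstance(node, list):
        return [drop_keys(item, unwanted) for item in node]
    return node


# Assumed effect of capture_element_values=False: no "_value" keys.
without_values = drop_keys(converted, {"_value"})

# Assumed effect of capture_element_attributes=False: no "_attributes" keys.
without_attributes = drop_keys(converted, {"_attributes"})

print(without_values)
print(without_attributes)
```

In practice you would pass the keyword arguments to `html_to_json.convert` directly rather than filtering afterwards; the helper only makes the difference between the two outputs visible.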
#### Example
Example input:
```html
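<head>
    <title>Floyd Hightower's Projects</title>
    <meta charset="UTF-8">
    <meta name="description" content="Floyd Hightower's Projects">
    <meta name="keywords" content="projects,fhightower,Floyd,Hightower">
</head>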
```
Example output:
```json
{
    "head": [
        {
            "title": [
                {
                    "_value": "Floyd Hightower's Projects"
                }
            ],
            "meta": [
                {
                    "_attributes": {
                        "charset": "UTF-8"
                    }
                },
                {
                    "_attributes": {
                        "name": "description",
                        "content": "Floyd Hightower's Projects"
                    }
                },
                {
                    "_attributes": {
                        "name": "keywords",
                        "content": "projects,fhightower,Floyd,Hightower"
                    }
                }
            ]
        }
    ]
}
```

### HTML Tables to JSON
In addition to converting HTML to JSON, this library can also intelligently convert HTML tables to JSON.
Currently, this library can handle three types of tables:
A. Those with [table headers](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/th) in the first row
B. Those with table headers in the first column
C. Those without table headers

Tables of type A and B are diagrammed below:

[Diagram: table types A and B]
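For type A, the idea of using first-row headers as keys can be sketched in plain Python. This is an illustration of the mapping only, not the library's implementation: `rows_to_records` is a hypothetical helper, and the rows are assumed to have already been reduced to lists of cell text. Types B and C are not covered here.

```python
# Cell text of a type A table: the first row holds the headers.
rows = [
    ["#", "Malware", "MD5", "Date Added"],
    ["25548", "DarkComet", "034a37b2a2307f876adc9538986d7b86", "July 9, 2018, 6:25 a.m."],
    ["25547", "DarkComet", "706eeefbac3de4d58b27d964173999c3", "July 7, 2018, 6:25 a.m."],
]


def rows_to_records(rows):
    """Hypothetical helper: key every data row by the first (header) row."""
    if not rows:
        return []
    headers, *body = rows
    # zip pairs each header with the cell in the same column.
    return [dict(zip(headers, row)) for row in body]


records = rows_to_records(rows)
print(records[0]["Malware"])
```

Each data row becomes one dictionary, so a table with two data rows yields a list of two records.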
#### Example
This code:
```python
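import html_to_json

html_string = """
<table>
    <tr>
        <th>#</th>
        <th>Malware</th>
        <th>MD5</th>
        <th>Date Added</th>
    </tr>
    <tr>
        <td>25548</td>
        <td>DarkComet</td>
        <td>034a37b2a2307f876adc9538986d7b86</td>
        <td>July 9, 2018, 6:25 a.m.</td>
    </tr>
    <tr>
        <td>25547</td>
        <td>DarkComet</td>
        <td>706eeefbac3de4d58b27d964173999c3</td>
        <td>July 7, 2018, 6:25 a.m.</td>
    </tr>
</table>
"""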
tables = html_to_json.convert_tables(html_string)
print(tables)
```

will produce this output:
```json
[
    [
        {
            "#": "25548",
            "Malware": "DarkComet",
            "MD5": "034a37b2a2307f876adc9538986d7b86",
            "Date Added": "July 9, 2018, 6:25 a.m."
        },
        {
            "#": "25547",
            "Malware": "DarkComet",
            "MD5": "706eeefbac3de4d58b27d964173999c3",
            "Date Added": "July 7, 2018, 6:25 a.m."
        }
    ]
]
```

## Credits
This package was created with [Cookiecutter](https://github.com/audreyr/cookiecutter) and fhightower's [Python project template](https://github.com/fhightower-templates/python-project-template).