https://github.com/akashrajpurohit/html2json

Scrapes a live website and converts it to JSON format.

# html2json converts a website into its equivalent JSON format

Sample input: an HTML file served at ```http://localhost:4000```

Website:
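```html
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Hello World</title>
</head>
<body class="body-color">
  <nav>
    <ul>
      <li><a href="/index.html">Home</a></li>
      <li><a href="/about.html">About</a></li>
      <li><a href="/contact.html">Contact</a></li>
      <li><a href="/blog.html">Blogs</a></li>
    </ul>
  </nav>
  <section class="main">
    <h1 class="red full-width">Hello</h1>
    <h3 class="blue full-width">World</h3>
    <p>
      Lorem ipsum dolor sit amet, consectetur adipisicing elit. Nulla, laudantium, omnis. Ea quaerat minima, nostrum doloremque repellendus! Ratione quasi, non eligendi quidem at culpa animi vitae id eius corrupti deleniti.
      <img src="https://fakeimg.pl/300/" alt="Some image">
    </p>
  </section>
  <main class="container">
    <p>This is some more dummy text</p>
    <h4>Lorem ipsum dolor sit amet, consectetur adipisicing elit. Quaerat vitae dolor, atque, excepturi numquam cumque ut iusto, odio perferendis cum rem saepe eveniet voluptatum fuga debitis et illo distinctio eligendi!</h4>
  </main>
  <div class="div_content">
    <!-- Hi there, this is empty div with no children :( -->
  </div>
  <section class="different">
    <p id="p-id" data-attr="custom-attribute">Different section</p>
  </section>
</body>
</html>
```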

Output:
```
{
  "tag": "html",
  "attributes": {
    "lang": "en"
  },
  "child": [
    {
      "tag": "head",
      "attributes": {},
      "child": [
        {
          "tag": "meta",
          "attributes": {
            "charset": "UTF-8"
          },
          "content": null,
          "child": []
        },
        {
          "tag": "title",
          "attributes": {},
          "content": "Hello World",
          "child": []
        }
      ]
    },
    {
      "tag": "body",
      "attributes": {
        "class": "body-color"
      },
      "child": [
        {
          "tag": "nav",
          "attributes": {},
          "child": [
            {
              "tag": "ul",
              "attributes": {},
              "child": [
                {
                  "tag": "li",
                  "attributes": {},
                  "child": [
                    {
                      "tag": "a",
                      "attributes": {
                        "href": "/index.html"
                      },
                      "content": "Home",
                      "child": []
                    }
                  ]
                },
                {
                  "tag": "li",
                  "attributes": {},
                  "child": [
                    {
                      "tag": "a",
                      "attributes": {
                        "href": "/about.html"
                      },
                      "content": "About",
                      "child": []
                    }
                  ]
                },
                {
                  "tag": "li",
                  "attributes": {},
                  "child": [
                    {
                      "tag": "a",
                      "attributes": {
                        "href": "/contact.html"
                      },
                      "content": "Contact",
                      "child": []
                    }
                  ]
                },
                {
                  "tag": "li",
                  "attributes": {},
                  "child": [
                    {
                      "tag": "a",
                      "attributes": {
                        "href": "/blog.html"
                      },
                      "content": "Blogs",
                      "child": []
                    }
                  ]
                }
              ]
            }
          ]
        },
        {
          "tag": "section",
          "attributes": {
            "class": "main"
          },
          "child": [
            {
              "tag": "h1",
              "attributes": {
                "class": "red full-width"
              },
              "content": "Hello",
              "child": []
            },
            {
              "tag": "h3",
              "attributes": {
                "class": "blue full-width"
              },
              "content": "World",
              "child": []
            },
            {
              "tag": "p",
              "attributes": {},
              "child": [
                {
                  "tag": "img",
                  "attributes": {
                    "src": "https://fakeimg.pl/300/",
                    "alt": "Some image"
                  },
                  "content": null,
                  "child": []
                }
              ]
            }
          ]
        },
        {
          "tag": "main",
          "attributes": {
            "class": "container"
          },
          "child": [
            {
              "tag": "p",
              "attributes": {},
              "content": "This is some more dummy text",
              "child": []
            },
            {
              "tag": "h4",
              "attributes": {},
              "content": "Lorem ipsum dolor sit amet, consectetur adipisicing elit. Quaerat vitae dolor, atque, excepturi numquam cumque ut iusto, odio perferendis cum rem saepe eveniet voluptatum fuga debitis et illo distinctio eligendi!",
              "child": []
            }
          ]
        },
        {
          "tag": "div",
          "attributes": {
            "class": "div_content"
          },
          "content": null,
          "child": []
        },
        {
          "tag": "section",
          "attributes": {
            "class": "different"
          },
          "child": [
            {
              "tag": "p",
              "attributes": {
                "id": "p-id",
                "data-attr": "custom-attribute"
              },
              "content": "Different section",
              "child": []
            }
          ]
        }
      ]
    }
  ]
}
```
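
The shape of the output above suggests a simple rule: every element becomes an object with ```tag```, ```attributes``` and ```child```, and only elements without child elements carry a ```content``` key (their text, or ```null``` when empty). The following is a minimal sketch of that rule using Python's standard ```html.parser``` module. It is not the project's actual implementation; the names ```TreeBuilder```, ```finalize``` and ```html_to_json``` are hypothetical, and the handling of mixed text and child elements is an assumption inferred from the sample.

```python
import json
from html.parser import HTMLParser

# Void elements have no closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TreeBuilder(HTMLParser):
    """Collects elements into a nested dict tree while parsing (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attributes": dict(attrs), "text": [], "child": []}
        if self.stack:
            self.stack[-1]["child"].append(node)
        elif self.root is None:
            self.root = node
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; stray end tags are ignored.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Keep non-blank text only; HTML comments never reach this handler.
        if self.stack and data.strip():
            self.stack[-1]["text"].append(data.strip())


def finalize(node):
    """Convert an internal node into the tag/attributes/content/child shape."""
    out = {"tag": node["tag"], "attributes": node["attributes"]}
    if not node["child"]:
        # Assumption: only leaf elements get a "content" key, as in the sample.
        out["content"] = " ".join(node["text"]) or None
    out["child"] = [finalize(c) for c in node["child"]]
    return out


def html_to_json(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return finalize(builder.root) if builder.root else None


if __name__ == "__main__":
    sample = '<html lang="en"><head><title>Hello World</title></head></html>'
    print(json.dumps(html_to_json(sample), indent=2))
```

Because comments are skipped by the parser, a ```div``` that holds only a comment ends up with ```"content": null``` and an empty ```child``` list in this sketch.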

## Output is stored in a file in the ```outputs``` directory
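
As a rough illustration of this step, the sketch below writes a converted tree to a JSON file inside an ```outputs``` directory, creating the directory if needed. The file naming scheme (derived from the URL's host and port) and the helper names ```output_path``` and ```save_json``` are assumptions made for this example; the project may name its files differently.

```python
import json
from pathlib import Path
from urllib.parse import urlparse


def output_path(url, out_dir="outputs"):
    """Build a file path from the URL's host and port (hypothetical naming scheme)."""
    netloc = urlparse(url).netloc
    name = netloc.replace(":", "_") or "output"
    return Path(out_dir) / f"{name}.json"


def save_json(tree, url, out_dir="outputs"):
    """Write the tree as indented JSON and return the path that was written."""
    path = output_path(url, out_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(tree, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    tree = {"tag": "html", "attributes": {"lang": "en"}, "child": []}
    print(save_json(tree, "http://localhost:4000"))
```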