https://github.com/akashrajpurohit/html2json
Scrapes a live website and converts it to JSON format
html2json
- Host: GitHub
- URL: https://github.com/akashrajpurohit/html2json
- Owner: AkashRajpurohit
- Created: 2019-05-24T04:54:50.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2023-12-05T05:15:22.000Z (about 2 years ago)
- Last Synced: 2025-06-14T15:49:12.019Z (9 months ago)
- Topics: html2json
- Language: JavaScript
- Size: 51.8 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.md
README
# html2json converts a website into its equivalent JSON format
Sample input HTML file served at ```http://localhost:4000```
Website:
```html
Hello World
Hello
World
Lorem ipsum dolor sit amet, consectetur adipisicing elit. Nulla, laudantium, omnis. Ea quaerat minima, nostrum doloremque repellendus! Ratione quasi, non eligendi quidem at culpa animi vitae id eius corrupti deleniti.
This is some more dummy text
Lorem ipsum dolor sit amet, consectetur adipisicing elit. Quaerat vitae dolor, atque, excepturi numquam cumque ut iusto, odio perferendis cum rem saepe eveniet voluptatum fuga debitis et illo distinctio eligendi!
Hi there, this is empty div with no children :(
Different section
```
Output:
```json
{
  "tag": "html",
  "attributes": {
    "lang": "en"
  },
  "child": [
    {
      "tag": "head",
      "attributes": {},
      "child": [
        {
          "tag": "meta",
          "attributes": {
            "charset": "UTF-8"
          },
          "content": null,
          "child": []
        },
        {
          "tag": "title",
          "attributes": {},
          "content": "Hello World",
          "child": []
        }
      ]
    },
    {
      "tag": "body",
      "attributes": {
        "class": "body-color"
      },
      "child": [
        {
          "tag": "nav",
          "attributes": {},
          "child": [
            {
              "tag": "ul",
              "attributes": {},
              "child": [
                {
                  "tag": "li",
                  "attributes": {},
                  "child": [
                    {
                      "tag": "a",
                      "attributes": {
                        "href": "/index.html"
                      },
                      "content": "Home",
                      "child": []
                    }
                  ]
                },
                {
                  "tag": "li",
                  "attributes": {},
                  "child": [
                    {
                      "tag": "a",
                      "attributes": {
                        "href": "/about.html"
                      },
                      "content": "About",
                      "child": []
                    }
                  ]
                },
                {
                  "tag": "li",
                  "attributes": {},
                  "child": [
                    {
                      "tag": "a",
                      "attributes": {
                        "href": "/contact.html"
                      },
                      "content": "Contact",
                      "child": []
                    }
                  ]
                },
                {
                  "tag": "li",
                  "attributes": {},
                  "child": [
                    {
                      "tag": "a",
                      "attributes": {
                        "href": "/blog.html"
                      },
                      "content": "Blogs",
                      "child": []
                    }
                  ]
                }
              ]
            }
          ]
        },
        {
          "tag": "section",
          "attributes": {
            "class": "main"
          },
          "child": [
            {
              "tag": "h1",
              "attributes": {
                "class": "red full-width"
              },
              "content": "Hello",
              "child": []
            },
            {
              "tag": "h3",
              "attributes": {
                "class": "blue full-width"
              },
              "content": "World",
              "child": []
            },
            {
              "tag": "p",
              "attributes": {},
              "child": [
                {
                  "tag": "img",
                  "attributes": {
                    "src": "https://fakeimg.pl/300/",
                    "alt": "Some image"
                  },
                  "content": null,
                  "child": []
                }
              ]
            }
          ]
        },
        {
          "tag": "main",
          "attributes": {
            "class": "container"
          },
          "child": [
            {
              "tag": "p",
              "attributes": {},
              "content": "This is some more dummy text",
              "child": []
            },
            {
              "tag": "h4",
              "attributes": {},
              "content": "Lorem ipsum dolor sit amet, consectetur adipisicing elit. Quaerat vitae dolor, atque, excepturi numquam cumque ut iusto, odio perferendis cum rem saepe eveniet voluptatum fuga debitis et illo distinctio eligendi!",
              "child": []
            }
          ]
        },
        {
          "tag": "div",
          "attributes": {
            "class": "div_content"
          },
          "content": null,
          "child": []
        },
        {
          "tag": "section",
          "attributes": {
            "class": "different"
          },
          "child": [
            {
              "tag": "p",
              "attributes": {
                "id": "p-id",
                "data-attr": "custom-attribute"
              },
              "content": "Different section",
              "child": []
            }
          ]
        }
      ]
    }
  ]
}
```
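
Each node in the output above carries a `tag`, an `attributes` object, a `child` array and, on some nodes, a `content` string, so the tree can be walked recursively. The snippet below is a minimal sketch of that idea, not part of html2json itself: `findByTag` and the trimmed-down `sample` tree are hypothetical names introduced only for illustration.

```javascript
// Hypothetical helper (not provided by html2json): recursively collect every
// node whose "tag" matches, descending through the "child" arrays.
function findByTag(node, tag, found = []) {
  if (node.tag === tag) found.push(node);
  for (const child of node.child || []) findByTag(child, tag, found);
  return found;
}

// A trimmed-down tree in the same shape as the sample output.
const sample = {
  tag: 'nav',
  attributes: {},
  child: [
    { tag: 'a', attributes: { href: '/index.html' }, content: 'Home', child: [] },
    { tag: 'a', attributes: { href: '/about.html' }, content: 'About', child: [] },
  ],
};

// Pull the href of every anchor out of the tree.
const links = findByTag(sample, 'a').map((a) => a.attributes.href);
console.log(links);
```

The same helper should work on the full output once it has been parsed with `JSON.parse`.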
## Output is stored in a file in the ```outputs``` directory
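
A stored result can be read back with Node's built-in `fs` module. The sketch below makes two assumptions the README does not confirm: that the files in `outputs` end in `.json`, and that each holds a single tree like the one shown above. `loadOutputs` is a hypothetical helper name.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: parse every *.json file in a directory.
// Assumes (unconfirmed) that html2json writes files with a .json extension.
function loadOutputs(dir = 'outputs') {
  if (!fs.existsSync(dir)) return []; // nothing has been generated yet
  return fs
    .readdirSync(dir)
    .filter((name) => name.endsWith('.json'))
    .map((name) => ({
      name,
      tree: JSON.parse(fs.readFileSync(path.join(dir, name), 'utf8')),
    }));
}

// Print the root tag of each stored result.
for (const { name, tree } of loadOutputs()) {
  console.log(name, '->', tree.tag);
}
```

If the directory is missing, the helper returns an empty array rather than throwing, so the loop simply prints nothing.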