https://github.com/scalastic/aws-documentation-scraper
A scraper for the AWS documentation website that crawls two levels deep and exports the documentation to a JSON file. It also generates distinct IDs for array items so that HTML UI components parsing the generated JSON can reference each item.
- Host: GitHub
- URL: https://github.com/scalastic/aws-documentation-scraper
- Owner: scalastic
- License: mit
- Created: 2021-06-10T16:29:10.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-11-08T23:24:15.000Z (about 1 year ago)
- Last Synced: 2024-10-12T19:48:10.247Z (3 months ago)
- Topics: aws, documentation, html2json, json, scraper, upickle
- Language: Scala
- Homepage:
- Size: 328 KB
- Stars: 2
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# AWS Documentation Scraper
![GitHub](https://img.shields.io/github/license/scalastic/aws-documentation-scraper)
![GitHub contributors](https://img.shields.io/github/contributors/scalastic/aws-documentation-scraper)
![code size](https://img.shields.io/github/languages/code-size/scalastic/aws-documentation-scraper)
![maintain status](https://img.shields.io/maintenance/yes/2021)

Are you struggling to navigate the vast and intricate world of AWS documentation? AWS offers a wealth of services with esoteric names that can be daunting for newcomers. Fortunately, the AWS documentation website is encoded in XML, which means we can extract the data from each page and build our own comprehensive documentation in JSON format, ready to publish on your website or in your application.
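The project description says the scraper runs through two levels. Judging from the sample output further down, where a service's `href` holds the content of the linked page rather than a URL, the second level appears to replace each service link with the parsed page it points to. The JavaScript sketch below illustrates that traversal under this assumption; it is not the tool's Scala implementation, and `fetchDoc` is a hypothetical stand-in for downloading and parsing one documentation page.

```javascript
// Sketch of a two-level crawl, assuming:
//  - fetchDoc(url) is a hypothetical async function returning a page already parsed into a plain object;
//  - the root page lists services under panels[].services.service[], each with an `href` string.
async function crawlTwoLevels(rootUrl, fetchDoc) {
  // Level 1: the root documentation page
  const root = await fetchDoc(rootUrl);

  for (const panel of root.panels ?? []) {
    for (const service of panel.services?.service ?? []) {
      if (typeof service.href === "string") {
        // Level 2: replace the link with the content of the page it points to
        service.href = await fetchDoc(service.href);
      }
    }
  }
  return root; // links found inside level-2 pages are not followed
}
```

Pages are fetched one after another, which keeps the load on the documentation site low; collecting the promises with `Promise.all` would be faster at the cost of firing every request at once.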
## Requirements
Before you begin, ensure you have met the following prerequisites:
- Internet Connection
- Scala 2.13
- SBT

## Usage
1. Launch the application by running the following command:
```
sbt run
```

2. It generates three files in the `./data` folder:
* `root-documentation.ser`, the serialized data from the root page of the AWS documentation website,
* `full-documentation.ser`, the serialized AWS documentation from the root page and all its linked pages,
* `full-documentation.json`, the resulting file containing all the AWS documentation in JSON format.

### Note:
The two serialized files act as a cache, so delete them whenever you need fresh, up-to-date data from AWS.
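The caching described in this note amounts to "reuse the file if it exists, otherwise scrape and save it". The Node.js sketch below illustrates that pattern only: the actual tool is written in Scala and stores its cache as `.ser` serialized files, whereas the hypothetical `loadOrCompute` and `clearCache` helpers here use plain JSON files.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: return cached data from `file` if present,
// otherwise run `compute()` (e.g. a scrape), store its result, and return it.
function loadOrCompute(file, compute) {
  if (fs.existsSync(file)) {
    return JSON.parse(fs.readFileSync(file, "utf8")); // cache hit: no network access
  }
  const data = compute(); // cache miss: do the expensive work once
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, JSON.stringify(data));
  return data;
}

// Forcing a refresh is just deleting the cache file, as the note recommends.
function clearCache(file) {
  if (fs.existsSync(file)) {
    fs.unlinkSync(file);
  }
}
```

With this pattern, a second run reads the cached file instead of scraping again; only deleting the file brings back a fresh scrape.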
### Expected result

You will obtain a JSON file with a structure similar to the following:
```json
{
  "title":"AWS Documentation",
  "subtitle":"Guides and API References",
  "abstract":"Find user guides, developer guides, API references, tutorials, and more.",
  "panels":[
    {
      "services":{
        "service":[
          {
            "prefix":"Amazon",
            "name":"EC2",
            "href":{
              "title":"Amazon Elastic Compute Cloud Documentation",
              "short-title":"Amazon EC2",
              "abstract":"Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable computing capacity—literally, servers in Amazon's data centers—that you use to build and host your software systems.",
              "sections":[
                {
                  "tiles":{
                    "tile":[
                      {
                        "href":"https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/",
                        "abstract":"Use Amazon EC2 to configure, launch, and manage virtual servers in the AWS cloud.",
                        "more-links":"",
                        "title":"User Guide for Linux Instances",
                        "id":"main-panels0-services-service0-href-sections0-tiles-tile0",
                        "locale":"en_us",
                        "pdf":"https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-ug.pdf",
                        "kindle":"https://www.amazon.com/dp/B076452RSZ",
                        "github":"https://github.com/awsdocs/amazon-ec2-user-guide/tree/master/doc_source"
                      .../...
```

- Unique IDs (the `id` keys) are generated on array items. They are useful when parsing the data and integrating it with web UI components.
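The `id` values in the sample look like the path from the root to each array item: object keys are kept as-is, array keys get the item's index appended (`panels0`, `service0`, `tile0`), and the whole path is prefixed with `main`. Assuming that scheme, which is inferred from the sample rather than taken from the tool's Scala source, a JavaScript sketch of the generation step could be:

```javascript
// Sketch only: the ID scheme below is inferred from the sample output,
// not taken from the scraper's Scala source.
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Returns a copy of `node` in which every object stored in an array gets an
// `id` built from its path: object keys are kept as-is, array items add
// "<key><index>", and segments are joined with "-" after the "main" prefix.
function addIds(node, path = ["main"]) {
  if (Array.isArray(node)) {
    // Arrays directly inside arrays are walked but their items are not labelled
    return node.map((item) => addIds(item, path));
  }
  if (!isPlainObject(node)) {
    return node; // strings, numbers, booleans and null are left untouched
  }
  const result = {};
  for (const [key, value] of Object.entries(node)) {
    if (Array.isArray(value)) {
      result[key] = value.map((item, index) => {
        const itemPath = [...path, `${key}${index}`];
        const child = addIds(item, itemPath);
        return isPlainObject(child) ? { ...child, id: itemPath.join("-") } : child;
      });
    } else {
      result[key] = addIds(value, [...path, key]);
    }
  }
  return result;
}
```

Because each ID encodes the item's full path, two entries with the same title in different services still get distinct IDs, which keeps the `heading-${id}` and `collapse-${id}` element IDs used by the rendering script unique.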
## Example of rendering

You can render this JSON data on a Jekyll website using the client-side json2html library, together with jQuery to load the JSON and Bootstrap to display it as HTML components.
Here are the necessary imports for rendering:
```html
```
Additionally, here is the dedicated json2html rendering script:
```javascript
$.getJSON( "{{ site.url }}{{ site.baseurl }}/assets/full-documentation.json", function( data ) {
json2html.component.add('main-header',
{'<>':'section','html':[
{'<>':'h1','text':'${title}','class':'text-muted'},
{'<>':'p','text':'${abstract}','class':'lead'},
{'[]':'panels','obj':function(){return(this.panels)}}
]});
json2html.component.add('panels',
{'<>':'section','html':[
{'<>':'header','html':[
{'<>':'div','class':'accordion accordion-flush','id':'accordion-panels'},
{'<>':'div','class':'accordion-item','html':[
{'<>':'h2','class':'accordion-header','id':'heading-${id}','html':[
{'<>':'button','text':'${title}','class':'accordion-button collapsed','type':'button','data-bs-toggle':'collapse','data-bs-target':'#collapse-${id}','aria-expanded':'false','aria-controls':'collapse-${id}'}
]},
{'<>':'div','id':'collapse-${id}','class':'accordion-collapse collapse','aria-labelledby':'heading-${id}','data-bs-parent':'#accordion-panels','html':[
{'<>':'div','class':'accordion-body'},
{'<>':'div','class':'row g-2','html':[
{'[]':'service','obj':function(){return(this.services.service)}}
]}
]}
]}
]}
]});

json2html.component.add('service',
{'<>':'div','class':'col-6','html':[
{'<>':'div','class':'p-3 border bg-light','html':[
{'<>':'section','html':[
{'<>':'header','html':[
{'<>':'div','class':'accordion accordion-flush','id':'accordion-service'},
{'<>':'div','class':'accordion-item','html':[
{'<>':'h3','class':'accordion-header','id':'heading-${id}','html':[
{'<>':'button','class':'accordion-button collapsed','type':'button','data-bs-toggle':'collapse','data-bs-target':'#collapse-${id}','aria-expanded':'false','aria-controls':'collapse-${id}','html':[
{'<>':'span','class':'text-muted','html':'${prefix}'},
{'html':' ${name}'}
]},
]},
{'<>':'div','id':'collapse-${id}','class':'accordion-collapse collapse','aria-labelledby':'heading-${id}','data-bs-parent':'#accordion-service','html':[
{'<>':'div','class':'accordion-body','html':[
{'[]':'service-href','obj':function(){return(this.href)}}
]}
]}
]}
]}
]}
]}
]});

json2html.component.add('service-href',
{'<>':'div','class':'card','html':[
{'<>':'div','class':'card-header','text':'${abstract}'},
{'[]':'sections','obj':function(){return(this.sections)}}
]}
);

json2html.component.add('sections',
{'html':[
{'<>':'div','class':'card-body','html':[
{'<>':'h4','class':'card-title','html':'${title}'}
]},
{'[]':'tiles','obj':function(){return(this.tiles)}}
]}
);

json2html.component.add('tiles',
{'[]':'tile','obj':function(){return(this.tile)}}
);

json2html.component.add('tile',
{'<>':'div','class':'card text-dark bg-light mb-3','html':[
{'<>':'h5','class':'card-header','text':'${title}'},
{'<>':'div','class':'card-body','html':[
{'<>':'p','class':'card-text','html':'${abstract}'},
{'<>':'div','class':'d-flex flex-row mb-3 justify-content-evenly','html':[
{'[]':'amazon','obj':function(){return(this)}},
{'[]':'pdf','obj':function(){return(this)}},
{'[]':'github','obj':function(){return(this)}}
]}
]}
]}
);

json2html.component.add('amazon',
{'<>':'div','class':function(){if(!!this.href) return("p-2"); else return("visually-hidden");},'html':[
{'<>':'a','rel':'noopener noreferrer nofollow','href':'${href}','data-bs-toggle':'tooltip','data-bs-placement':'top','title':'More on AWS website','html':[
{'<>':'span','html':[
{'<>':'svg','width':'22.5','height':'18','xmlns':'http://www.w3.org/2000/svg','viewBox':'0 0 640 512','html':[
{'<>':'path', 'd':'M180.41 203.01c-.72 22.65 10.6 32.68 10.88 39.05a8.164 8.164 0 0 1-4.1 6.27l-12.8 8.96a10.66 10.66 0 0 1-5.63 1.92c-.43-.02-8.19 1.83-20.48-25.61a78.608 78.608 0 0 1-62.61 29.45c-16.28.89-60.4-9.24-58.13-56.21-1.59-38.28 34.06-62.06 70.93-60.05 7.1.02 21.6.37 46.99 6.27v-15.62c2.69-26.46-14.7-46.99-44.81-43.91-2.4.01-19.4-.5-45.84 10.11-7.36 3.38-8.3 2.82-10.75 2.82-7.41 0-4.36-21.48-2.94-24.2 5.21-6.4 35.86-18.35 65.94-18.18a76.857 76.857 0 0 1 55.69 17.28 70.285 70.285 0 0 1 17.67 52.36l-.01 69.29zM93.99 235.4c32.43-.47 46.16-19.97 49.29-30.47 2.46-10.05 2.05-16.41 2.05-27.4-9.67-2.32-23.59-4.85-39.56-4.87-15.15-1.14-42.82 5.63-41.74 32.26-1.24 16.79 11.12 31.4 29.96 30.48zm170.92 23.05c-7.86.72-11.52-4.86-12.68-10.37l-49.8-164.65c-.97-2.78-1.61-5.65-1.92-8.58a4.61 4.61 0 0 1 3.86-5.25c.24-.04-2.13 0 22.25 0 8.78-.88 11.64 6.03 12.55 10.37l35.72 140.83 33.16-140.83c.53-3.22 2.94-11.07 12.8-10.24h17.16c2.17-.18 11.11-.5 12.68 10.37l33.42 142.63L420.98 80.1c.48-2.18 2.72-11.37 12.68-10.37h19.72c.85-.13 6.15-.81 5.25 8.58-.43 1.85 3.41-10.66-52.75 169.9-1.15 5.51-4.82 11.09-12.68 10.37h-18.69c-10.94 1.15-12.51-9.66-12.68-10.75L328.67 110.7l-32.78 136.99c-.16 1.09-1.73 11.9-12.68 10.75h-18.3zm273.48 5.63c-5.88.01-33.92-.3-57.36-12.29a12.802 12.802 0 0 1-7.81-11.91v-10.75c0-8.45 6.2-6.9 8.83-5.89 10.04 4.06 16.48 7.14 28.81 9.6 36.65 7.53 52.77-2.3 56.72-4.48 13.15-7.81 14.19-25.68 5.25-34.95-10.48-8.79-15.48-9.12-53.13-21-4.64-1.29-43.7-13.61-43.79-52.36-.61-28.24 25.05-56.18 69.52-55.95 12.67-.01 46.43 4.13 55.57 15.62 1.35 2.09 2.02 4.55 1.92 7.04v10.11c0 4.44-1.62 6.66-4.87 6.66-7.71-.86-21.39-11.17-49.16-10.75-6.89-.36-39.89.91-38.41 24.97-.43 18.96 26.61 26.07 29.7 26.89 36.46 10.97 48.65 12.79 63.12 29.58 17.14 22.25 7.9 48.3 4.35 55.44-19.08 37.49-68.42 34.44-69.26 34.42zm40.2 104.86c-70.03 51.72-171.69 79.25-258.49 79.25A469.127 469.127 0 0 1 2.83 327.46c-6.53-5.89-.77-13.96 7.17-9.47a637.37 637.37 0 0 0 316.88 84.12 630.22 630.22 0 0 0 241.59-49.55c11.78-5 21.77 7.8 10.12 16.38zm29.19-33.29c-8.96-11.52-59.28-5.38-81.81-2.69-6.79.77-7.94-5.12-1.79-9.47 40.07-28.17 105.88-20.1 113.44-10.63 7.55 9.47-2.05 75.41-39.56 106.91-5.76 4.87-11.27 2.3-8.71-4.1 8.44-21.25 27.39-68.49 18.43-80.02z'}
]}
]}
]}
]});

json2html.component.add('pdf',
{'<>':'div','class':function(){if(!!this.pdf) return("p-2"); else return("visually-hidden");},'html':[
{'<>':'a','rel':'noopener noreferrer nofollow','href':'${pdf}','data-bs-toggle':'tooltip','data-bs-placement':'top','title':'Download PDF','html':[
{'<>':'span','html':[
{'<>':'svg','width':'13.5','height':'18','xmlns':'http://www.w3.org/2000/svg','viewBox':'0 0 384 512','html':[
{'<>':'path','d':'M369.9 97.9L286 14C277 5 264.8-.1 252.1-.1H48C21.5 0 0 21.5 0 48v416c0 26.5 21.5 48 48 48h288c26.5 0 48-21.5 48-48V131.9c0-12.7-5.1-25-14.1-34zM332.1 128H256V51.9l76.1 76.1zM48 464V48h160v104c0 13.3 10.7 24 24 24h104v288H48zm250.2-143.7c-12.2-12-47-8.7-64.4-6.5-17.2-10.5-28.7-25-36.8-46.3 3.9-16.1 10.1-40.6 5.4-56-4.2-26.2-37.8-23.6-42.6-5.9-4.4 16.1-.4 38.5 7 67.1-10 23.9-24.9 56-35.4 74.4-20 10.3-47 26.2-51 46.2-3.3 15.8 26 55.2 76.1-31.2 22.4-7.4 46.8-16.5 68.4-20.1 18.9 10.2 41 17 55.8 17 25.5 0 28-28.2 17.5-38.7zm-198.1 77.8c5.1-13.7 24.5-29.5 30.4-35-19 30.3-30.4 35.7-30.4 35zm81.6-190.6c7.4 0 6.7 32.1 1.8 40.8-4.4-13.9-4.3-40.8-1.8-40.8zm-24.4 136.6c9.7-16.9 18-37 24.7-54.7 8.3 15.1 18.9 27.2 30.1 35.5-20.8 4.3-38.9 13.1-54.8 19.2zm131.6-5s-5 6-37.3-7.8c35.1-2.6 40.9 5.4 37.3 7.8z'}
]}
]}
]}
]});

json2html.component.add('github',
{'<>':'div','class':function(){if(!!this.github) return("p-2"); else return("visually-hidden");},'html':[
{'<>':'a','rel':'noopener noreferrer nofollow','href':'${github}','data-bs-toggle':'tooltip','data-bs-placement':'top','title':'See source code on Github','html':[
{'<>':'span','html':[
{'<>':'svg','width':'17.5','height':'18','xmlns':'http://www.w3.org/2000/svg','viewBox':'0 0 496 512','html':[
{'<>':'path','d':'M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z'}
]}
]}
]}
]});
let template = [
{'[]':'main-header'}];
$('.container-fluid').json2html(data,template);
var tooltipTriggerList = [].slice.call(document.querySelectorAll('[data-bs-toggle="tooltip"]'));
var tooltipList = tooltipTriggerList.map(function (tooltipTriggerEl) {
return new bootstrap.Tooltip(tooltipTriggerEl)
})
});
```

You can view the resulting page here: [https://scalastic.io/en/aws-documentation/](https://scalastic.io/en/aws-documentation/).
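In the rendering script, each accordion header and collapsible body derives its DOM IDs from the item's `id` key, so every `data-bs-target` points at exactly one element. The sketch below shows only that idea, without jQuery, Bootstrap, or json2html: `renderService` and `escapeHtml` are hypothetical helpers, the markup is simplified compared to the script, and the scraped text is escaped explicitly rather than relying on the template engine.

```javascript
// Minimal HTML escaping so scraped text cannot inject markup
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical renderer for one service entry: the header button and the
// collapsible body share the item's `id`, mirroring the json2html template.
function renderService(service) {
  const id = escapeHtml(service.id);
  return [
    `<h3 class="accordion-header" id="heading-${id}">`,
    `<button class="accordion-button collapsed" type="button" data-bs-toggle="collapse"`,
    ` data-bs-target="#collapse-${id}" aria-expanded="false" aria-controls="collapse-${id}">`,
    `<span class="text-muted">${escapeHtml(service.prefix ?? "")}</span> ${escapeHtml(service.name)}`,
    `</button></h3>`,
    `<div id="collapse-${id}" class="accordion-collapse collapse" aria-labelledby="heading-${id}">`,
    `<div class="accordion-body">${escapeHtml(service.href?.abstract ?? "")}</div>`,
    `</div>`,
  ].join("");
}
```

The resulting string could be inserted into the page with `element.innerHTML`; Bootstrap's collapse behaviour then only needs the matching `heading-…` and `collapse-…` IDs.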
## Contributing to aws-documentation-scraper
There may be cases where the tool doesn't scrape all the content, for example when the encapsulated XML is incomplete or when some content exists only as plain HTML, which the tool cannot scrape. If you want to contribute to aws-documentation-scraper, follow these steps:
1. Fork this repository.
2. Create a branch with a clear name: `git checkout -b <branch_name>`.
3. Make your changes and commit them: `git commit -m '<commit_message>'`.
4. Push your branch: `git push origin <branch_name>`.
5. Create a pull request.

For more information, see GitHub's documentation on [creating a pull request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request).
## License
This project is licensed under the [MIT License](https://github.com/scalastic/aws-documentation-scraper/blob/main/LICENSE).