Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/rbren/scrape-to-swagger
Scrape API Documentation from HTML to a JSON Swagger Document
https://github.com/rbren/scrape-to-swagger
Last synced: 14 days ago
JSON representation
Scrape API Documentation from HTML to a JSON Swagger Document
- Host: GitHub
- URL: https://github.com/rbren/scrape-to-swagger
- Owner: rbren
- Created: 2015-12-17T15:22:57.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2019-03-12T11:39:29.000Z (over 5 years ago)
- Last Synced: 2024-08-09T13:16:48.032Z (3 months ago)
- Language: JavaScript
- Homepage:
- Size: 838 KB
- Stars: 28
- Watchers: 4
- Forks: 9
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# scrape-to-swagger
## Usage
See the [config directory](config) for example configurations```bash
npm install -g scrape-to-swagger
scrape-to-swagger --config ./config/product-hunt.js --output ./output/product-hunt.swagger.json --verbose
```## Config
The configuration file tells scrape-to-swagger how to retrieve each piece of the swagger document from the HTML.
#### Required
* `url` - The root page to scrape
* `depth` - The depth to crawl links#### Selectors
Selector fields correspond to parts of the Swagger document. These can be specified as constants or as selectors.
* `host`
* `basePath`
* `title`
* `description`
* `operations` - A group of operations
* `operation` - A single operation (e.g. GET /users)
* `path`
* `method`
* `operationDescription`
* `parameters`
* `parameter`
* `parameterIn`
* `parameterName`
* `parameterRequired`
* `parameterDescription`
* `responses`
* `response`
* `responseStatus`
* `responseDescription`
* `responseSchema`Selectors are passed to [cheerio](https://github.com/cheeriojs/cheerio), which is based on jQuery.
The selector object takes four parameters:
* selector - a CSS selector. This will be applied as parent.find(selector), where the parent is the element in the hierarchy above (or <body> if none)
* regex - A regular expression for extracting the field from the text inside the selector
* regexIndex - The index of the matched items to use (default=1, i.e. it will capture the first thing you put in parens)
* join - If true, join *all* the elements matched by the selector
* split - If true, include all siblings of the element matched by the selector, up until the next match of the selector#### Other
* `fixPathParameters` a function for converting to Swagger-style parameters, e.g. from /users/:username to /users/{username}
```js
var config = module.exports = {
url: 'https://www.quandl.com/docs/api',
depth: 0,
protocols: ['https'],
host: 'www.quandl.com',
basePath: '/api/v3',
title: 'Quandl API',
description: 'The Quandl API',operations: {selector: 'h2', split: true},
operationDescription: {selector: 'h2 ~ p', join: true},
path: {selector: 'pre:contains(Definition) + div + blockquote', regex: /\w+ https:\/\/www.quandl.com\/api\/v3(\/\S*)/},
method: {selector: 'pre:contains(Definition) + div + blockquote', regex: /(\w+) https:\/\/www.quandl.com\/api\/v3\/\S*/},parameters: {selector: 'table'},
parameter: {selector: 'tr'},
parameterIn: 'query',
parameterType: 'string',
parameterName: {selector: 'td:first-of-type', regex: /(\S+)/},
parameterRequired: {selector: 'td:nth-of-type(2)'},
parameterDescription: {selector: 'td:nth-of-type(3)'},responseSchema: {selector: 'pre:contains(Example Response) + div + blockquote', isExample: true},
fixPathParameters: function(path) {
pieces = path.split('/');
pieces = pieces.map(function(p) {
return p.replace(/:(\w+)/g, '{$1}')
})
return pieces.join('/');
}
}config.securityDefinitions = {
'apiKey': {
type: 'apiKey',
name: 'api_key',
in: 'query',
}
}
```