https://github.com/nicojs/html2commonmark
Converts HTML to Markdown using commonmark.js. Compliant with the CommonMark Markdown specification.
- Host: GitHub
- URL: https://github.com/nicojs/html2commonmark
- Owner: nicojs
- Created: 2015-11-30T13:44:33.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2016-02-23T14:47:06.000Z (over 8 years ago)
- Last Synced: 2024-09-14T21:20:35.205Z (about 2 months ago)
- Language: TypeScript
- Homepage: http://nicojs.github.io/html2commonmark/
- Size: 351 KB
- Stars: 18
- Watchers: 7
- Forks: 4
- Open Issues: 3
Metadata Files:
- Readme: readme.md
[![Build Status](https://travis-ci.org/nicojs/html2commonmark.svg)](https://travis-ci.org/nicojs/html2commonmark)
html2commonmark
===============
CommonMark is a rationalized version of Markdown syntax,
with a [spec][the spec] and BSD-licensed reference
implementations in C and JavaScript.

[the spec]: http://spec.commonmark.org

For more information, see [the spec].
This repository contains a JavaScript implementation for converting
HTML back to Markdown using the same specification. It uses the same
Abstract Syntax Tree (AST) as commonmark.js (and thus has a runtime dependency on it).
It even implements all of the specification's 600+ examples as mocha unit tests to verify the conversion.

Runs in the browser or on the server using Node.js.

Live demo
-----------
See a live demo at https://nicojs.github.io/html2commonmark

Installing
-----------
You can install the library using `npm`:

    npm install html2commonmark --save

This package includes a dependency on commonmark.js.

For client-side use, you can include `node_modules/html2commonmark/dist/browser/bundle.js`
on your web page. It exposes the global `html2commonmark` variable.

For server-side use, you can simply require it: `var html2commonmark = require('html2commonmark');`.

As this npm package is written in TypeScript, you can also import the module if you're using `"moduleResolution": "node"`: `import * as html2commonmark from 'html2commonmark'`.
This also works in the browser, but then you'll need a tool like [webpack](https://github.com/webpack/webpack) or [browserify](http://browserify.org/)
to package it (instead of using the bundle `node_modules/html2commonmark/dist/browser/bundle.js`).
However, you will also need the commonmark typings for this to work (using tsd: `tsd install commonmark`).

Usage
-----

Here's a basic example:
```javascript
var converter = new html2commonmark.BrowserConverter();
// From nodejs: var converter = new html2commonmark.JSDomConverter();
var renderer = new html2commonmark.Renderer();
// Input tags inferred from the expected Markdown below.
var ast = converter.convert('<p>This <em>is</em> <strong>awesome!</strong></p>');
var markdown = renderer.render(ast); // "This *is* **awesome\!**"
```

The `html2commonmark` object provides the following constructor functions:
* `html2commonmark.Converter`: can convert HTML DOM nodes to the AST nodes.
* _browser only_ `html2commonmark.BrowserConverter`: can convert HTML to AST nodes using the DOM parser of your browser
* _server only_ `html2commonmark.JSDomConverter`: can convert HTML to AST nodes using the [JSDom parser](https://www.npmjs.com/package/jsdom)
* `html2commonmark.Renderer`: can convert the AST nodes to Markdown.

The converters take an optional `option` parameter for configuring what to do with unknown HTML elements:
```javascript
// In Node.js, use html2commonmark.JSDomConverter instead.
new html2commonmark.BrowserConverter({
  rawHtmlElements: ['div', 'table', 'td', 'tr', 'th', 'tbody', 'thead'],
  ignoredHtmlElements: ['custom-root', 'body'],
  interpretUnknownHtml: true
});
```
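
As a rough mental model, the way these three options interact can be sketched as a small decision function. This is not the library's code: `classifyElement`, `includesIgnoreCase` and `DEFAULTS` are hypothetical names made up for illustration, the sketch only covers elements that have no Markdown equivalent, and the precedence of the ignore list over the raw list is an assumption.

```javascript
// Hypothetical helper, NOT part of html2commonmark: models the documented
// option rules for an HTML element that has no Markdown equivalent.
var DEFAULTS = {
  rawHtmlElements: ['div', 'table', 'td', 'tr', 'th', 'tbody', 'thead'],
  ignoredHtmlElements: ['custom-root', 'body'],
  interpretUnknownHtml: true
};

// Case-insensitive membership test, since both lists are case insensitive.
function includesIgnoreCase(list, tagName) {
  var wanted = tagName.toLowerCase();
  return list.some(function (name) {
    return name.toLowerCase() === wanted;
  });
}

// Returns 'raw' (keep the element as raw HTML) or 'ignore' (drop the tag).
function classifyElement(tagName, options) {
  var opts = Object.assign({}, DEFAULTS, options);
  // Assumption: the ignore list wins when a tag appears in both lists.
  if (includesIgnoreCase(opts.ignoredHtmlElements, tagName)) return 'ignore';
  if (includesIgnoreCase(opts.rawHtmlElements, tagName)) return 'raw';
  // Anything else is "unknown" and follows interpretUnknownHtml.
  return opts.interpretUnknownHtml ? 'raw' : 'ignore';
}

console.log(classifyElement('SPAN', { interpretUnknownHtml: false }));
// prints "ignore"
console.log(classifyElement('span', { interpretUnknownHtml: false, rawHtmlElements: ['span'] }));
// prints "raw"
```

Under these assumptions, a `span` is dropped when `interpretUnknownHtml` is `false`, unless it is explicitly listed in `rawHtmlElements`.
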
The following options are supported:
* `rawHtmlElements`: A (case-insensitive) whitelist of HTML elements which you want to interpret as raw HTML elements. Default: `['div', 'table', 'td', 'tr', 'th', 'tbody', 'thead']`. _Note: when `interpretUnknownHtml` is `true`, all unknown HTML nodes will be preserved._
* `ignoredHtmlElements`: A (case-insensitive) blacklist of HTML elements to ignore (not interpret as raw HTML elements). Default: `['custom-root', 'body']`. _Note: when `interpretUnknownHtml` is `false`, all unknown HTML elements will be ignored._
* `interpretUnknownHtml`: Describes what to do with unknown HTML elements. Default: `true`

A more advanced example:
```javascript
// From nodejs: var converter = new html2commonmark.JSDomConverter();
var converter = new html2commonmark.BrowserConverter({interpretUnknownHtml: false});
var spanConverter = new html2commonmark.BrowserConverter({interpretUnknownHtml: false, rawHtmlElements: ['span']});
var renderer = new html2commonmark.Renderer();
// The <span> markup is inferred from the rawHtmlElements: ['span'] option above.
var input = 'a <span>span</span> of days';
var ast = converter.convert(input);
var spanAst = spanConverter.convert(input);
var markdown = renderer.render(ast); // "a span of days"
var markdownWithSpan = renderer.render(spanAst); // "a <span>span</span> of days"
```

Limitations
----
html2commonmark relies on an HTML parser: the [JSDom parser](https://www.npmjs.com/package/jsdom) on Node.js and the browser's own parser on the client side. As a result, some examples in the CommonMark specification are not implemented.

Take example 141:
* Html
```html
Foo
baz
```
* Markdown
```markdown
Foo
baz
```
This HTML is perfectly valid Markdown output (garbage in, garbage out). However, because html2commonmark depends on an HTML parser such as JSDom, it cannot reproduce the exact same Markdown.