Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/nicojs/html2commonmark

Converts HTML to markdown using commonmark.js. Compliant with the CommonMark markdown specification.

Host: GitHub
URL: https://github.com/nicojs/html2commonmark
Owner: nicojs
Created: 2015-11-30T13:44:33.000Z (almost 9 years ago)
Default Branch: master
Last Pushed: 2016-02-23T14:47:06.000Z (over 8 years ago)
Last Synced: 2024-09-14T21:20:35.205Z (about 2 months ago)
Language: TypeScript
Homepage: http://nicojs.github.io/html2commonmark/
Size: 351 KB
Stars: 18
Watchers: 7
Forks: 4
Open Issues: 3
Metadata Files:
- Readme: readme.md

README

[![Build Status](https://travis-ci.org/nicojs/html2commonmark.svg)](https://travis-ci.org/nicojs/html2commonmark)

html2commonmark
===============

CommonMark is a rationalized version of Markdown syntax,
with a [spec][the spec] and BSD-licensed reference
implementations in C and JavaScript.

  [the spec]: http://spec.commonmark.org

For more information, see [the spec].

This repository contains a JavaScript implementation for converting
HTML back to Markdown using the same specification. It uses the same
Abstract Syntax Tree (AST) as commonmark.js (and thus has a runtime dependency on it).
It also implements all of the spec's 600+ examples as mocha unit tests to verify the conversion against the spec.

Runs in the browser or on the server using Node.js.

Live demo
-----------

See a live demo at https://nicojs.github.io/html2commonmark

Installing
-----------

You can install the library using `npm`:

    npm install html2commonmark --save

This package includes a dependency on commonmark.js.

For client-side use, you can include `node_modules/html2commonmark/dist/browser/bundle.js`
on your web page. It exposes the global `html2commonmark` variable.

For server-side use, you can simply require it: `var html2commonmark = require('html2commonmark');`.

As this npm package is written in TypeScript, you can also import the module if you're using `"moduleResolution": "node"`: `import * as html2commonmark from 'html2commonmark'`.
This also works in the browser, but then you'll need a tool like [webpack](https://github.com/webpack/webpack) or [browserify](http://browserify.org/)
to package it (instead of using the bundle `node_modules/html2commonmark/dist/browser/bundle.js`).
You will also need the commonmark typings for this to work (using tsd: `tsd install commonmark`).

Usage
-----

Here's a basic example:

```javascript
var converter = new html2commonmark.BrowserConverter();
// From nodejs: var converter = new html2commonmark.JSDomConverter();
var renderer = new html2commonmark.Renderer();

// Convert the HTML string to a commonmark.js AST, then render the AST as markdown.
var ast = converter.convert('<p>This <em>is</em> <strong>awesome!</strong></p>');
var markdown = renderer.render(ast); // "This *is* **awesome\!**"
```

The `html2commonmark` object provides the following constructor functions:

* `html2commonmark.Converter`: converts HTML DOM nodes to AST nodes.
    * _browser only_ `html2commonmark.BrowserConverter`: converts HTML to AST nodes using the DOM parser of your browser
    * _server only_ `html2commonmark.JSDomConverter`: converts HTML to AST nodes using the [JSDom parser](https://www.npmjs.com/package/jsdom)
* `html2commonmark.Renderer`: converts the AST nodes to markdown.
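Since the browser and server converters share the same role, the choice between them can be wrapped in a small helper. The following is a minimal sketch, not part of the library: `createConverter` and `htmlToMarkdown` are hypothetical names, the `lib` parameter is assumed to be the `html2commonmark` object, and the environment check via `window` is an assumption about how you distinguish browser from Node.js.

```javascript
// Hypothetical helpers, not part of html2commonmark's API.
// `lib` is assumed to be the html2commonmark object (global or required).

// Pick the converter that matches the current environment.
function createConverter(lib, options) {
  var inBrowser = typeof window !== 'undefined' &&
    typeof window.document !== 'undefined';
  var Converter = inBrowser ? lib.BrowserConverter : lib.JSDomConverter;
  return new Converter(options);
}

// Run both steps: HTML string -> AST -> markdown string.
function htmlToMarkdown(lib, html, options) {
  var ast = createConverter(lib, options).convert(html);
  return new lib.Renderer().render(ast);
}
```

With the library loaded, this would be called as `htmlToMarkdown(html2commonmark, someHtml)`, where `someHtml` is your input string.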

The converters take an optional `options` parameter for configuring what to do with unknown HTML elements:

```javascript
// This should be html2commonmark.JSDomConverter in Node.js
new html2commonmark.BrowserConverter({
    rawHtmlElements: ['div', 'table', 'td', 'tr', 'th', 'tbody', 'thead'],
    ignoredHtmlElements: ['custom-root', 'body'],
    interpretUnknownHtml: true
});
```

The following options are supported:

* `rawHtmlElements`: A (case-insensitive) whitelist of HTML elements to interpret as raw HTML elements. Default: `['div', 'table', 'td', 'tr', 'th', 'tbody', 'thead']`. _Note: when `interpretUnknownHtml` is `true`, all unknown HTML nodes are preserved._
* `ignoredHtmlElements`: A (case-insensitive) blacklist of HTML elements to ignore (not interpreted as raw HTML elements). Default: `['custom-root', 'body']`. _Note: when `interpretUnknownHtml` is `false`, all unknown HTML elements are ignored._
* `interpretUnknownHtml`: Whether unknown HTML elements are preserved (`true`) or ignored (`false`). Default: `true`.
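The interaction between the three options can be easier to follow as code. The sketch below is one plausible reading of the notes above, not the library's actual implementation: `classifyUnknownElement`, `includesTag`, and `DEFAULT_OPTIONS` are hypothetical names, and the precedence shown (the blacklist only matters when unknown HTML is interpreted, the whitelist only when it is not) is an assumption.

```javascript
// Hypothetical sketch, NOT html2commonmark's actual implementation.
// Defaults are copied from the option descriptions above.
var DEFAULT_OPTIONS = {
  rawHtmlElements: ['div', 'table', 'td', 'tr', 'th', 'tbody', 'thead'],
  ignoredHtmlElements: ['custom-root', 'body'],
  interpretUnknownHtml: true
};

// Case-insensitive membership test for a tag name.
function includesTag(list, tagName) {
  var wanted = tagName.toLowerCase();
  return list.some(function (entry) {
    return entry.toLowerCase() === wanted;
  });
}

// Decide whether an element without a markdown equivalent is kept
// as raw HTML ('raw') or dropped ('ignored').
function classifyUnknownElement(tagName, options) {
  var opts = Object.assign({}, DEFAULT_OPTIONS, options);
  if (opts.interpretUnknownHtml) {
    // Assumed: everything is preserved unless it is blacklisted.
    return includesTag(opts.ignoredHtmlElements, tagName) ? 'ignored' : 'raw';
  }
  // Assumed: everything is dropped unless it is whitelisted.
  return includesTag(opts.rawHtmlElements, tagName) ? 'raw' : 'ignored';
}
```

Under these assumptions, `classifyUnknownElement('SPAN', { interpretUnknownHtml: false, rawHtmlElements: ['span'] })` returns `'raw'`, while the same call without the `rawHtmlElements` override returns `'ignored'`.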

A more advanced example:

```javascript
// From nodejs: use html2commonmark.JSDomConverter instead of BrowserConverter
var converter = new html2commonmark.BrowserConverter({interpretUnknownHtml: false});
var spanConverter = new html2commonmark.BrowserConverter({interpretUnknownHtml: false, rawHtmlElements: ['span']});
var renderer = new html2commonmark.Renderer();

var input = '<p>a <span>span</span> of days</p>';
var ast = converter.convert(input);
var spanAst = spanConverter.convert(input);

// Unknown elements are ignored, so the span tags are dropped.
var markdown = renderer.render(ast); // "a span of days"
// 'span' is whitelisted as raw HTML, so the tags are preserved.
var markdownWithSpan = renderer.render(spanAst); // "a <span>span</span> of days"
```

Limitations
----

html2commonmark uses an HTML parser: the [JSDom parser](https://www.npmjs.com/package/jsdom) on Node.js and the browser's own parser on the client side. This means that some examples of the CommonMark specification are not implemented.

Take example 141:

* Html

```html

Foo



baz

```

* Markdown

```markdown

Foo



baz

```

This HTML is a perfectly valid markdown output (garbage in, garbage out). However, because html2commonmark relies on an HTML parser such as JSDom, it cannot reproduce the exact same markdown.