https://github.com/EvitanRelta/htmlarkdown

HTML-to-Markdown converter that adaptively preserves HTML when needed (eg. when center-aligning, or resizing images)
# HTMLarkdown

[![Coverage][coverage-badge]][coverage-link]
[![Version][version-badge]][version-link]
[![License][license-badge]][license-link]

[coverage-badge]: https://badgen.net/codecov/c/github/EvitanRelta/htmlarkdown?color=009900
[coverage-link]: https://codecov.io/gh/EvitanRelta/htmlarkdown
[version-badge]: https://badgen.net/github/release/EvitanRelta/htmlarkdown
[version-link]: https://github.com/EvitanRelta/htmlarkdown/releases/latest
[license-badge]: https://badgen.net/github/license/EvitanRelta/htmlarkdown
[license-link]: https://badgen.net/github/license/EvitanRelta/htmlarkdown


HTMLarkdown is an **HTML-to-Markdown converter** that can output HTML-syntax when required,
such as when center-aligning or resizing images:

(image: switching to HTML showcase)

- Written completely in **TypeScript**.
- Has many Jest [tests](./tests), covering many edge-case conversions.
  > _[Leave an issue/PR](#new-conversions-ideas-features-tests) if you can think of more!_
- [For now](#other-markdown-specs), it's designed for [GFM].
- Try it out at the demo site below!
  https://evitanrelta.github.io/htmlarkdown

[GFM]: https://github.github.com/gfm/


# How is this different?

## Switching to HTML-syntax

Whenever elements **cannot be represented** in markdown-syntax, HTMLarkdown will **switch to HTML-syntax**:



**Input HTML:**

    <h1>Normal-heading is <strong>boring</strong></h1>

    <h1 align="center">
    Centered-heading is <strong>da wae</strong>
    </h1>

    <p><img src="https://image.src" /></p>

    <p><img width="80%" src="https://image.src" /></p>

**Output Markdown:**

    # Normal-heading is **boring**

    <h1 align="center">
    Centered-heading is <b>da wae</b>
    </h1>

    ![](https://image.src)

    <img width="80%" src="https://image.src" />




> _**Note:** The HTML-switching is controlled by the rules' `Rule.toUseHtmlPredicate`._
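
To make the idea concrete, here's a minimal, self-contained sketch of a predicate-driven rule. Only the name `toUseHtmlPredicate` comes from HTMLarkdown. The `SimpleElement` and `SimpleRule` types, the `headingRule` object and the `convertHeading` helper are hypothetical simplifications, and the real signatures may well differ.

```ts
// Hypothetical, simplified stand-ins; NOT HTMLarkdown's real types.
interface SimpleElement {
    tagName: string
    attributes: Record<string, string>
    innerText: string
}

interface SimpleRule {
    /** Returns true when the element must be kept in HTML-syntax. */
    toUseHtmlPredicate: (element: SimpleElement) => boolean
    toMarkdown: (element: SimpleElement) => string
    toHtml: (element: SimpleElement) => string
}

const headingRule: SimpleRule = {
    // Assumption: markdown headings can't carry attributes (eg. align="center").
    toUseHtmlPredicate: (element) => Object.keys(element.attributes).length > 0,
    toMarkdown: (element) => `# ${element.innerText}`,
    toHtml: (element) => {
        const attrs = Object.entries(element.attributes)
            .map(([attrName, value]) => ` ${attrName}="${value}"`)
            .join('')
        return `<h1${attrs}>${element.innerText}</h1>`
    },
}

/** Picks HTML-syntax or markdown-syntax based on the rule's predicate. */
function convertHeading(element: SimpleElement): string {
    return headingRule.toUseHtmlPredicate(element)
        ? headingRule.toHtml(element)
        : headingRule.toMarkdown(element)
}

console.log(convertHeading({ tagName: 'H1', attributes: {}, innerText: 'Normal' }))
// => '# Normal'
console.log(convertHeading({ tagName: 'H1', attributes: { align: 'center' }, innerText: 'Centered' }))
// => '<h1 align="center">Centered</h1>'
```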


But HTMLarkdown tries to use as **little HTML-syntax** as possible, **mixing markdown and HTML** if needed:



**Input HTML:**

    <blockquote>
    <p align="center">
    Centered-paragraph
    </p>
    <p>Below is a horizontal-rule in blockquote:</p>
    <hr>
    </blockquote>

**Output Markdown:**

    > <p align="center">
    > Centered-paragraph
    > </p>
    > Below is a horizontal-rule in blockquote:
    >
    > <hr>




Depending on the situation, HTMLarkdown will switch between markdown's **backslash-escaping** and **HTML-escaping**:



**Input HTML:**

    <!-- In markdown -->
    <p>&lt;TAG&gt;, **NOT BOLD**</p>

    <!-- In in-line HTML -->
    <p>
    <sup>&lt;TAG&gt;, **NOT BOLD**</sup>
    </p>

    <!-- In block HTML -->
    <p align="center">
    &lt;TAG&gt;, **NOT BOLD**
    </p>

**Output Markdown:**

    \<TAG>, \*\*NOT BOLD\*\*

    <sup>\<TAG>, \*\*NOT BOLD\*\*</sup>

    <p align="center">
    &lt;TAG>, **NOT BOLD**
    </p>
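
The two escaping styles above can be sketched roughly as follows. This is only an illustration of the idea, not HTMLarkdown's actual escaping logic. The `EscapeContext` type, the function names and the exact set of escaped characters are all assumptions.

```ts
// Hypothetical sketch; the escaped character sets are simplified assumptions.
type EscapeContext = 'markdown' | 'block-html'

/** Backslash-escapes characters that markdown would otherwise interpret. */
function backslashEscape(text: string): string {
    return text.replace(/[\\*_`<]/g, (char) => `\\${char}`)
}

/** HTML-escapes characters that would otherwise start an entity or a tag. */
function htmlEscape(text: string): string {
    return text.replace(/&/g, '&amp;').replace(/</g, '&lt;')
}

/** Chooses the escaping style based on where the text ends up. */
function escapeText(text: string, context: EscapeContext): string {
    return context === 'markdown' ? backslashEscape(text) : htmlEscape(text)
}

const rawText = '<TAG>, **NOT BOLD**'
console.log(escapeText(rawText, 'markdown'))
// => '\<TAG>, \*\*NOT BOLD\*\*'
console.log(escapeText(rawText, 'block-html'))
// => '&lt;TAG>, **NOT BOLD**'
```

Inside a block of HTML, markdown-syntax isn't parsed, so the asterisks can stay as they are and only the characters that HTML itself cares about need escaping.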





## Handling of edge cases

HTMLarkdown adds separators between adjacent lists, to prevent markdown-renderers from combining them:



**Input HTML:**

    <ul>
    <li>List 1 > item 1</li>
    <li>List 1 > item 2</li>
    </ul>
    <ul>
    <li>List 2 > item 1</li>
    <li>List 2 > item 2</li>
    </ul>

**Output Markdown:**

    - List 1 > item 1
    - List 1 > item 2

    <!-- LIST_SEPARATOR -->

    - List 2 > item 1
    - List 2 > item 2
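
As a rough sketch of that separator trick (the `joinAdjacentLists` helper is hypothetical, and is not how HTMLarkdown implements it internally), adjacent lists can be joined with an HTML comment between them:

```ts
// The separator string is taken from the output above.
const LIST_SEPARATOR = '<!-- LIST_SEPARATOR -->'

/**
 * Hypothetical helper: renders each list as tight markdown bullets,
 * then joins adjacent lists with a separator comment so that
 * markdown-renderers don't merge them into one list.
 */
function joinAdjacentLists(lists: string[][]): string {
    return lists
        .map((items) => items.map((item) => `- ${item}`).join('\n'))
        .join(`\n\n${LIST_SEPARATOR}\n\n`)
}

console.log(
    joinAdjacentLists([
        ['List 1 > item 1', 'List 1 > item 2'],
        ['List 2 > item 1', 'List 2 > item 2'],
    ])
)
```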




And more!
But this section is getting too long so...


# Installation

```bash
npm install htmlarkdown
```


# Usage

## Markdown conversion _(either from `Element` or `string`)_

```js
import { HTMLarkdown } from 'htmlarkdown'

/** Convert an element! */
const htmlarkdown = new HTMLarkdown()
const container = document.getElementById('container')
console.log(container.outerHTML)
// => '<div id="container"><h1>Heading</h1></div>'
htmlarkdown.convert(container)
// => '# Heading'

/**
 * Or an HTML string!
 * Whichever u prefer. It's 2022, I don't judge :^)
 */
const htmlString = `
<h1>Heading</h1>
<p>Paragraph</p>
`
const htmlStrWithContainer = `<div>
${htmlString}
</div>`
htmlarkdown.convert(htmlString)
// Set 2nd param 'hasContainer' to true, for container-wrapped string.
htmlarkdown.convert(htmlStrWithContainer, true)
// Both output => '# Heading\n\nParagraph'
```

>
> **Note:** If an element is given to `convert`, it's deep-cloned before any processing/conversion.
> Thus, you don't have to worry about it mutating the original element :)


## Configuring

```js
/** Configure when creating an instance. */
const htmlarkdown = new HTMLarkdown({
htmlEscapingMode: '&<>',
maxPrettyTableWidth: Number.POSITIVE_INFINITY,
addTrailingLinebreak: true
})

/** Or on an existing instance. */
htmlarkdown.options.maxPrettyTableWidth = -1
```


## Plugins

Plugins are of type `(htmlarkdown: HTMLarkdown) => void`.
They take in a `HTMLarkdown` instance and configure it by **mutating** it.

There are 2 plugin options available in the `options` object: `preloadPlugins` and `plugins`.
The difference is:
- `preloadPlugins` loads the plugins **first**, before your other options _(like "presets")_.
  This allows you to overwrite the plugins' changes:
  ```ts
  const enableTrailingLinebreak: Plugin = (htmlarkdown) => {
      htmlarkdown.options.addTrailingLinebreak = true
  }
  const htmlarkdown = new HTMLarkdown({
      addTrailingLinebreak: false,
      preloadPlugins: [enableTrailingLinebreak],
  })
  htmlarkdown.options.addTrailingLinebreak // false
  ```
- `plugins` loads the plugins **after** your other options.
  Meaning, plugins can overwrite your options:
  ```ts
  const enableTrailingLinebreak: Plugin = (htmlarkdown) => {
      htmlarkdown.options.addTrailingLinebreak = true
  }
  const htmlarkdown = new HTMLarkdown({
      addTrailingLinebreak: false,
      plugins: [enableTrailingLinebreak],
  })
  htmlarkdown.options.addTrailingLinebreak // true
  ```


You can also load plugins on existing instances:
```js
htmlarkdown.loadPlugins([myPlugin])
```
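
The load order described above can be modelled with a tiny stand-in class. This is a hypothetical miniature for illustration only. `MiniConverter`, `MiniOptions` and `MiniPlugin` are made-up names, and HTMLarkdown's real constructor may be implemented differently.

```ts
// Hypothetical miniature of the load order; NOT HTMLarkdown's real implementation.
interface MiniOptions {
    addTrailingLinebreak: boolean
    preloadPlugins: MiniPlugin[]
    plugins: MiniPlugin[]
}
type MiniPlugin = (instance: MiniConverter) => void

class MiniConverter {
    options: MiniOptions = { addTrailingLinebreak: false, preloadPlugins: [], plugins: [] }

    constructor(userOptions: Partial<MiniOptions> = {}) {
        // 1. Preload-plugins run first, like presets.
        for (const plugin of userOptions.preloadPlugins ?? []) plugin(this)
        // 2. The user's options are applied on top, overwriting the presets.
        Object.assign(this.options, userOptions)
        // 3. Regular plugins run last, so they get the final say.
        for (const plugin of userOptions.plugins ?? []) plugin(this)
    }
}

const enableLinebreak: MiniPlugin = (instance) => {
    instance.options.addTrailingLinebreak = true
}

const preloaded = new MiniConverter({
    addTrailingLinebreak: false,
    preloadPlugins: [enableLinebreak],
})
console.log(preloaded.options.addTrailingLinebreak) // false

const pluggedIn = new MiniConverter({
    addTrailingLinebreak: false,
    plugins: [enableLinebreak],
})
console.log(pluggedIn.options.addTrailingLinebreak) // true
```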


## Making a copy of an instance

The conversion done by a `HTMLarkdown` instance depends **solely** on its `options` property.
Meaning, you can create a copy of an instance like this:

```js
const htmlarkdown = new HTMLarkdown()
const copy = new HTMLarkdown(htmlarkdown.options)
```


## Configuring rules/processes

See [this section](#how-it-works) for info on what the rules/processes do.

```js
/**
 * Overwriting default rules/processes.
 * (does NOT include the defaults)
 */
const customOnly = new HTMLarkdown({
    preProcesses: [myPreProcess1, myPreProcess2],
    rules: [myRule1, myRule2],
    textProcesses: [myTextProcess1, myTextProcess2],
    postProcesses: [myPostProcess1, myPostProcess2]
})

/**
 * Adding on to default rules/processes.
 * (includes the defaults)
 */
const withDefaults = new HTMLarkdown()
withDefaults.addPreProcess(myPreProcess)
withDefaults.addRule(myRule)
withDefaults.addTextProcess(myTextProcess)
withDefaults.addPostProcess(myPostProcess)
```


# How it works

HTMLarkdown has 3 distinct phases:

1. **Pre-processing**
   The container-element that's received _(and [deep-cloned](#deep-clone))_ by the `convert` method is passed consecutively to each `PreProcess` in `options.preProcesses`.

2. **Conversion**
   The pre-processed container-element is then recursively converted to markdown.
   Elements are converted by the `Rule`s in `options.rules`.
   Text-nodes are converted by the `TextProcess`es in `options.textProcesses`.
   The output strings of the rules and text-processes are then appended to each other, to give the raw markdown.

3. **Post-processing**
   The raw markdown string is then passed consecutively to each `PostProcess` in `options.postProcesses`, to give the final markdown.





(image: rule-processes flowchart)


(image: the general conversion flow of HTMLarkdown)
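
The 3 phases can be sketched as a tiny pipeline over a toy node tree. Everything here is a simplified assumption made for illustration: `MiniNode`, `MiniRule` and the process types are hypothetical, and HTMLarkdown's real `Rule` / `PreProcess` / `TextProcess` / `PostProcess` types work on actual DOM nodes and carry more information.

```ts
// Hypothetical, simplified types for illustration; NOT HTMLarkdown's real API.
type MiniNode =
    | { kind: 'text'; text: string }
    | { kind: 'element'; tag: string; children: MiniNode[] }
type PreProcess = (container: MiniNode) => MiniNode
type MiniRule = { tag: string; toMarkdown: (inner: string) => string }
type TextProcess = (text: string) => string
type PostProcess = (rawMarkdown: string) => string

// A sample pre-process: drops whitespace-only text-nodes.
const dropBlankText: PreProcess = (node) =>
    node.kind === 'text'
        ? node
        : {
              ...node,
              children: node.children
                  .filter((child) => child.kind !== 'text' || child.text.trim() !== '')
                  .map(dropBlankText),
          }

const preProcesses: PreProcess[] = [dropBlankText]
const rules: MiniRule[] = [
    { tag: 'h1', toMarkdown: (inner) => `# ${inner}\n\n` },
    { tag: 'p', toMarkdown: (inner) => `${inner}\n\n` },
    { tag: 'strong', toMarkdown: (inner) => `**${inner}**` },
]
const textProcesses: TextProcess[] = [(text) => text.replace(/\s+/g, ' ')]
const postProcesses: PostProcess[] = [(markdown) => markdown.trim()]

// Phase 2: elements go through rules, text-nodes through text-processes.
function convertNode(node: MiniNode): string {
    if (node.kind === 'text') {
        return textProcesses.reduce((text, step) => step(text), node.text)
    }
    const { tag, children } = node
    const inner = children.map((child) => convertNode(child)).join('')
    const rule = rules.find((candidate) => candidate.tag === tag)
    // In this sketch, elements without a rule just pass their inner contents along.
    return rule ? rule.toMarkdown(inner) : inner
}

function convert(container: MiniNode): string {
    // Phase 1: pre-processing.
    const preProcessed = preProcesses.reduce((node, step) => step(node), container)
    // Phase 2: conversion to raw markdown.
    const rawMarkdown = convertNode(preProcessed)
    // Phase 3: post-processing.
    return postProcesses.reduce((markdown, step) => step(markdown), rawMarkdown)
}

const sampleContainer: MiniNode = {
    kind: 'element',
    tag: 'div',
    children: [
        { kind: 'text', text: '\n  ' },
        { kind: 'element', tag: 'h1', children: [{ kind: 'text', text: 'Heading' }] },
        {
            kind: 'element',
            tag: 'p',
            children: [
                { kind: 'text', text: 'Some ' },
                { kind: 'element', tag: 'strong', children: [{ kind: 'text', text: 'bold' }] },
                { kind: 'text', text: '   text' },
            ],
        },
    ],
}
console.log(convert(sampleContainer))
// => '# Heading\n\nSome **bold** text'
```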


# Contributing

## Bugs

HTMLarkdown is still under development, so there'll likely be bugs.

So the easiest way to contribute is to submit an issue _(with the `bug` label)_, especially for any incorrect markdown-conversions :)

For any incorrect markdown-conversions, state the:
- input HTML
- current incorrect markdown output
- expected markdown output


## New conversions, ideas, features, tests

If you have any new element-conversions / ideas / features / tests that you think should be added, leave an issue with the `feature` or `improve` label!

> - `feature` label is for new features
> - `improve` label is for improvements on existing features
>
> Understandably, there are gray areas on what is a "feature" and what is an "improvement". So just go with whichever seems more appropriate :)


## Other markdown specs
Currently, HTMLarkdown is designed to output markdown for GitHub specifically _(ie. [GFM])_.
BUT, if there's another markdown spec that you'd like to design for _(maybe as a plugin?)_, do leave an issue/discussion :D

[GFM]: https://github.github.com/gfm/


## Coding-related stuff

Code-formatting is handled by [Prettier], so no need to worry bout it :)

Any new feature should:
- be documented via TSDoc
- come with new unit-tests
- pass all new/existing tests

As for which merging method to use, check out the [discussion][merging-discussion].

[merging-discussion]: https://github.com/EvitanRelta/htmlarkdown/discussions/41


[Prettier]: https://prettier.io/

# Contributors

So far it's just me, so pls send help! :^)


# Roadmap

If you've any new ideas / features, check out the [Contributing section for it](#new-conversions-ideas-features-tests)!


## Element conversions

### Block-elements:
- [x] Headings _([For now][setext-issue], only [ATX-style][atx])_
- [x] Paragraph
- [x] Codeblock
- [x] Blockquote
- [x] Lists
_(ordered, unordered, [tight][tight] and [loose][loose])_
- [x] _([GFM][gfm-table])_ Table
- [ ] _([GFM][gfm-task-list])_ Task-list


_(Below are some planned block-elements that don't have markdown-equivalent)_
- [x] `<span>` _(handled by a [noop-rule](#noop-rule))_
- [x] `<div>` _([For now][div-noop-issue], handled by a [noop-rule](#noop-rule))_
- [ ] Definition list _(ie. `<dl>`, `<dt>`, `<dd>`)_
- [ ] Collapsible section _(ie. `<details>`)_

[setext-issue]: https://github.com/EvitanRelta/htmlarkdown/issues/36
[atx]: https://spec.commonmark.org/0.30/#atx-heading
[div-noop-issue]: https://github.com/EvitanRelta/htmlarkdown/issues/19
[tight]: https://github.github.com/gfm/#tight
[loose]: https://github.github.com/gfm/#loose
[gfm-table]: https://github.github.com/gfm/#tables-extension-
[gfm-task-list]: https://github.github.com/gfm/#task-list-items-extension-


### Text-formattings:
- [x] **Bold** _([For now][underscore-issue], only outputs in asterisks `**BOLD**`)_
- [x] _Italic_ _([For now][underscore-issue], only outputs in asterisks `*ITALIC*`)_
- [x] _([GFM][gfm-strikethrough])_ ~~Strikethrough~~
- [x] `Code`
- [x] [Link][secret] _([For now][ref-link-issue], only [inline links][inline-link])_
- [x] Superscript _(ie. `<sup>`)_
- [x] Subscript _(ie. `<sub>`)_
- [x] Underline _(ie. `<u>`, `<ins>`)_
  _(didn't know underlines were possible till recently)_

[underscore-issue]: https://github.com/EvitanRelta/htmlarkdown/issues/39
[gfm-strikethrough]: https://github.github.com/gfm/#strikethrough-extension-
[secret]: https://www.youtube.com/watch?v=dQw4w9WgXcQ
[ref-link-issue]: https://github.com/EvitanRelta/htmlarkdown/issues/38
[inline-link]: https://spec.commonmark.org/0.30/#inline-link


### Misc:
- [x] Images _([For now][ref-link-issue], only [inline links][inline-link])_
- [x] Horizontal-rule _(ie. `<hr>`)_
- [x] Linebreaks _(ie. `<br>`)_
- [ ] Preserved HTML comments _([Issue \#25][preserve-comment-issue])_
  _(eg. `<!-- COMMENT -->`)_

[preserve-comment-issue]: https://github.com/EvitanRelta/htmlarkdown/issues/25


## Features to be added
- Custom `id` attributes
```html
Go to [section with id](#my-section)

<h1 id="my-section">My section</h1>
```
- Reversing GitHub's Issue/PR autolinks



  **Input HTML:**

      <p>
      Issue autolink:
      <a href="https://github.com/user/repo/issues/7">#7</a>
      </p>

  **Output Markdown:**

      Issue autolink: #7






- Ability to customise how a codeblock's syntax-highlighting language is obtained from the `<pre><code>` elements
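
As a rough idea of what the planned autolink reversal could look like, here's a small sketch. It's not part of HTMLarkdown yet, so the `reverseIssueAutolink` helper, its URL pattern and its fallback behaviour are all assumptions. It also ignores which repo the markdown will live in, which matters because `#7` only autolinks within the same repo.

```ts
/**
 * Hypothetical helper: returns the short `#7` form when a link looks like
 * a GitHub issue/PR autolink, else falls back to an inline markdown link.
 */
function reverseIssueAutolink(href: string, text: string): string {
    const match = /^https:\/\/github\.com\/[^/]+\/[^/]+\/(?:issues|pull)\/(\d+)$/.exec(href)
    const isAutolink = match !== null && text === `#${match[1]}`
    return isAutolink ? text : `[${text}](${href})`
}

console.log(reverseIssueAutolink('https://github.com/user/repo/issues/7', '#7'))
// => '#7'
console.log(reverseIssueAutolink('https://github.com/user/repo/issues/7', 'the bug'))
// => '[the bug](https://github.com/user/repo/issues/7)'
```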



<a name="noop-rule"></a>
**noop-rule:**

Noop-rules only pass on their converted inner-contents to their parents.
They themselves don't have any markdown conversions, not even in HTML-syntax.


# License

The MIT License (MIT).
So it's freeeeeee