https://github.com/EvitanRelta/htmlarkdown
HTML-to-Markdown converter that adaptively preserves HTML when needed (eg. when center-aligning, or resizing images)
commonmark converter gfm html-converter html-to-markdown javascript node node-js nodejs typescript
- Host: GitHub
- URL: https://github.com/EvitanRelta/htmlarkdown
- Owner: EvitanRelta
- License: mit
- Created: 2022-07-29T14:53:56.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2023-02-19T12:04:26.000Z (over 1 year ago)
- Last Synced: 2024-09-20T06:41:48.196Z (about 2 months ago)
- Topics: commonmark, converter, gfm, html-converter, html-to-markdown, javascript, node, node-js, nodejs, typescript
- Language: TypeScript
- Homepage: https://evitanrelta.github.io/htmlarkdown
- Size: 2.01 MB
- Stars: 57
- Watchers: 2
- Forks: 2
- Open Issues: 19
Metadata Files:
- Readme: README.md
- License: LICENSE
[![Coverage][coverage-badge]][coverage-link]
[![Version][version-badge]][version-link]
[![License][license-badge]][license-link]

[coverage-badge]: https://badgen.net/codecov/c/github/EvitanRelta/htmlarkdown?color=009900
[coverage-link]: https://codecov.io/gh/EvitanRelta/htmlarkdown
[version-badge]: https://badgen.net/github/release/EvitanRelta/htmlarkdown
[version-link]: https://github.com/EvitanRelta/htmlarkdown/releases/latest
[license-badge]: https://badgen.net/github/license/EvitanRelta/htmlarkdown
[license-link]: https://badgen.net/github/license/EvitanRelta/htmlarkdown
HTMLarkdown is a **HTML-to-Markdown converter** that's able to output HTML-syntax when required.
Like when center-aligning, or resizing images.

- Written completely in **TypeScript**.
- Has many Jest [tests](./tests), covering many edge-case conversions.
  > _[Leave an issue/PR](#new-conversions-ideas-features-tests) if you can think of more!_
- [For now](#other-markdown-specs), is designed for [GFM].
- Try it out at the demo site below!
  https://evitanrelta.github.io/htmlarkdown

[GFM]: https://github.github.com/gfm/
# How is this different?
## Switching to HTML-syntax
Whenever elements **cannot be represented** in markdown-syntax, HTMLarkdown will **switch to HTML-syntax**:

**Input HTML:**

    <h1>Normal-heading is <strong>boring</strong></h1>

    <h1 align="center">
      Centered-heading is <strong>da wae</strong>
    </h1>

    <p><img src="https://image.src" /></p>

    <p><img width="80%" src="https://image.src" /></p>

**Output Markdown:**

    # Normal-heading is **boring**

    <h1 align="center">
      Centered-heading is <b>da wae</b>
    </h1>

    ![](https://image.src)

    <img width="80%" src="https://image.src" />

> _**Note:** The HTML-switching is controlled by the rules' `Rule.toUseHtmlPredicate`._
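
To give a rough feel for what such a predicate decides, here's a minimal sketch. It does not use HTMLarkdown's actual `Rule` type: `SimpleElement`, `needsHtmlSyntax` and the attribute list are hypothetical names made up for this example.

```ts
// Hypothetical, simplified element shape (NOT HTMLarkdown's real types).
interface SimpleElement {
    tag: string
    attributes: Record<string, string>
}

// Assumption: attributes that markdown-syntax has no way to express.
const UNREPRESENTABLE_ATTRIBUTES = ['align', 'width', 'height']

/** Returns true if the element should be kept in HTML-syntax. */
function needsHtmlSyntax(element: SimpleElement): boolean {
    return UNREPRESENTABLE_ATTRIBUTES.some((attribute) => attribute in element.attributes)
}

const plainHeading: SimpleElement = { tag: 'h1', attributes: {} }
const centeredHeading: SimpleElement = { tag: 'h1', attributes: { align: 'center' } }
const resizedImage: SimpleElement = {
    tag: 'img',
    attributes: { width: '80%', src: 'https://image.src' },
}

const decisions = [plainHeading, centeredHeading, resizedImage].map(needsHtmlSyntax)
// decisions: [false, true, true]
```

The plain heading can stay as `# ...`, while the centered heading and the resized image carry attributes that only HTML-syntax can keep.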
But HTMLarkdown tries to use as **little HTML-syntax** as possible. **Mixing markdown and HTML** if needed:

**Input HTML:**

    <blockquote>
      <p align="center">
        Centered-paragraph
      </p>
      <p>Below is a horizontal-rule in blockquote:</p>
      <hr>
    </blockquote>

**Output Markdown:**

    > <p align="center">
    >   Centered-paragraph
    > </p>
    > Below is a horizontal-rule in blockquote:
    >
    > <hr>

Depending on the situation, HTMLarkdown will switch between markdown's **backslash-escaping** or **HTML-escaping**:

**Input HTML:**

    <!-- In markdown -->
    <p>&lt;TAG&gt;, **NOT BOLD**</p>

    <!-- In in-line HTML -->
    <p>
      <sup>&lt;TAG&gt;, **NOT BOLD**</sup>
    </p>

    <!-- In block HTML -->
    <p align="center">
      &lt;TAG&gt;, **NOT BOLD**
    </p>

**Output Markdown:**

    \<TAG>, \*\*NOT BOLD\*\*

    <sup>\<TAG>, \*\*NOT BOLD\*\*</sup>

    <p align="center">
      &lt;TAG&gt;, **NOT BOLD**
    </p>

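
The two escaping styles can be sketched roughly like this. These helpers are hypothetical and are not HTMLarkdown's API; they only cover the characters from the example above, whereas a real converter has to handle many more cases.

```ts
// Hypothetical helpers (NOT HTMLarkdown's real API), illustrating the two escaping styles.

/** Backslash-escapes characters that markdown would otherwise interpret. */
function backslashEscape(text: string): string {
    // Assumption: only '\', '<' and '*' are handled in this sketch.
    return text.replace(/[\\<*]/g, (char) => '\\' + char)
}

/** HTML-escapes '&', '<' and '>' for text inside block HTML, where markdown isn't parsed. */
function htmlEscape(text: string): string {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
}

const sampleText = '<TAG>, **NOT BOLD**'
const inMarkdown = backslashEscape(sampleText) // \<TAG>, \*\*NOT BOLD\*\*
const inBlockHtml = htmlEscape(sampleText) // &lt;TAG&gt;, **NOT BOLD**
```

Backslash-escaping only works where the renderer still parses markdown, which is why text inside block HTML falls back to HTML entities.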
## Handling of edge cases
Adding separators in-between adjacent lists to prevent them from being combined by markdown-renderers:

**Input HTML:**

    <ul>
      <li>List 1 > item 1</li>
      <li>List 1 > item 2</li>
    </ul>
    <ul>
      <li>List 2 > item 1</li>
      <li>List 2 > item 2</li>
    </ul>

**Output Markdown:**

    - List 1 > item 1
    - List 1 > item 2

    <!-- LIST_SEPARATOR -->

    - List 2 > item 1
    - List 2 > item 2

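
One way to picture the separator logic is the sketch below. It assumes the converted output is available as a list of blocks; `MarkdownBlock` and `joinBlocks` are hypothetical names for this example, not HTMLarkdown internals.

```ts
// Hypothetical block representation (NOT HTMLarkdown's internals).
interface MarkdownBlock {
    kind: 'list' | 'paragraph'
    markdown: string
}

const LIST_SEPARATOR = '<!-- LIST_SEPARATOR -->'

/** Joins blocks with blank lines, inserting a separator comment between adjacent lists. */
function joinBlocks(blocks: MarkdownBlock[]): string {
    const parts: string[] = []
    blocks.forEach((block, index) => {
        const previous = blocks[index - 1]
        if (previous !== undefined && previous.kind === 'list' && block.kind === 'list') {
            parts.push(LIST_SEPARATOR)
        }
        parts.push(block.markdown)
    })
    return parts.join('\n\n')
}

const joinedLists = joinBlocks([
    { kind: 'list', markdown: '- List 1 > item 1\n- List 1 > item 2' },
    { kind: 'list', markdown: '- List 2 > item 1\n- List 2 > item 2' },
])
```

Without the comment in between, a markdown-renderer would treat both lists as one 4-item list.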
And more!
But this section is getting too long so...
# Installation
```bash
npm install htmlarkdown
```
# Usage
## Markdown conversion _(either from `Element` or `string`)_
```js
import { HTMLarkdown } from 'htmlarkdown'

/** Convert an element! */
const htmlarkdown = new HTMLarkdown()
const container = document.getElementById('container')
console.log(container.outerHTML)
// => '<div id="container"><h1>Heading</h1></div>'
htmlarkdown.convert(container)
// => '# Heading'

/**
 * Or a HTML string!
 * Whichever u prefer. It's 2022, I don't judge :^)
 */
const htmlString = `<h1>Heading</h1>
<p>Paragraph</p>
`
const htmlStrWithContainer = `<div>${htmlString}</div>`
htmlarkdown.convert(htmlString)
// Set 2nd param 'hasContainer' to true, for container-wrapped string.
htmlarkdown.convert(htmlStrWithContainer, true)
// Both output => '# Heading\n\nParagraph'
```

> **Note:** If an element is given to `convert`, it's deep-cloned before any processing/conversion.
> Thus, you don't have to worry about it mutating the original element :)
## Configuring
```js
/** Configure when creating an instance. */
const htmlarkdown = new HTMLarkdown({
    htmlEscapingMode: '&<>',
    maxPrettyTableWidth: Number.POSITIVE_INFINITY,
    addTrailingLinebreak: true
})

/** Or on an existing instance. */
htmlarkdown.options.maxPrettyTableWidth = -1
```
## Plugins
Plugins are of type `(htmlarkdown: HTMLarkdown) => void`.
They take in a `HTMLarkdown` instance and configure it by **mutating** it.

There are 2 plugin-options available in the `options` object: `preloadPlugins` and `plugins`.
The difference is:
- `preloadPlugins` loads the plugins **first**, before your other options. _(like "presets")_
  Allowing you to overwrite the plugins' changes:
```ts
const enableTrailingLinebreak: Plugin = (htmlarkdown) => {
    htmlarkdown.options.addTrailingLinebreak = true
}
const htmlarkdown = new HTMLarkdown({
    addTrailingLinebreak: false,
    preloadPlugins: [enableTrailingLinebreak],
})
htmlarkdown.options.addTrailingLinebreak // false
```
- `plugins` loads the plugins **after** your other options.
Meaning, plugins can overwrite your options.
```ts
const enableTrailingLinebreak: Plugin = (htmlarkdown) => {
    htmlarkdown.options.addTrailingLinebreak = true
}
const htmlarkdown = new HTMLarkdown({
    addTrailingLinebreak: false,
    plugins: [enableTrailingLinebreak],
})
htmlarkdown.options.addTrailingLinebreak // true
```
You can also load plugins on existing instances:
```js
htmlarkdown.loadPlugins([myPlugin])
```
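
To make the load order concrete, here's a tiny self-contained sketch of it. `MiniConverter`, `MiniOptions` and `MiniPlugin` are hypothetical stand-ins written for this example; they only mimic the order described above (preloaded plugins, then your options, then plugins) and aren't the real `HTMLarkdown` class.

```ts
// Hypothetical miniature of the load order described above (NOT the real HTMLarkdown class).
interface MiniOptions {
    addTrailingLinebreak: boolean
}
type MiniPlugin = (instance: MiniConverter) => void

interface MiniConstructorOptions extends Partial<MiniOptions> {
    preloadPlugins?: MiniPlugin[]
    plugins?: MiniPlugin[]
}

class MiniConverter {
    // Assumed default, for illustration only.
    options: MiniOptions = { addTrailingLinebreak: false }

    constructor({ preloadPlugins = [], plugins = [], ...userOptions }: MiniConstructorOptions = {}) {
        this.loadPlugins(preloadPlugins) // 1. presets run first
        this.options = { ...this.options, ...userOptions } // 2. explicit options overwrite presets
        this.loadPlugins(plugins) // 3. plugins run last and may overwrite options
    }

    loadPlugins(plugins: MiniPlugin[]): void {
        for (const plugin of plugins) plugin(this)
    }
}

const enableLinebreak: MiniPlugin = (instance) => {
    instance.options.addTrailingLinebreak = true
}

const preloaded = new MiniConverter({
    addTrailingLinebreak: false,
    preloadPlugins: [enableLinebreak],
})
const pluggedIn = new MiniConverter({
    addTrailingLinebreak: false,
    plugins: [enableLinebreak],
})
// preloaded.options.addTrailingLinebreak is false
// pluggedIn.options.addTrailingLinebreak is true
```

The same plugin ends up with opposite results purely because of when it runs relative to the explicit `addTrailingLinebreak: false`.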
## Making a copy of an instance
The conversion of a `HTMLarkdown` instance **solely** depends on its `options` property.
Meaning, you can create a copy of an instance like this:

```js
const htmlarkdown = new HTMLarkdown()
const copy = new HTMLarkdown(htmlarkdown.options)
```
## Configuring rules/processes
See [this section](#how-it-works) for info on what the rules/processes do.
```js
/**
 * Overwriting default rules/processes.
 * (does NOT include the defaults)
 */
const htmlarkdown = new HTMLarkdown({
    preProcesses: [myPreProcess1, myPreProcess2],
    rules: [myRule1, myRule2],
    textProcesses: [myTextProcess1, myTextProcess2],
    postProcesses: [myPostProcess1, myPostProcess2]
})

/**
 * Adding on to default rules/processes.
 * (includes the defaults)
 */
const htmlarkdown = new HTMLarkdown()
htmlarkdown.addPreProcess(myPreProcess)
htmlarkdown.addRule(myRule)
htmlarkdown.addTextProcess(myTextProcess)
htmlarkdown.addPostProcess(myPostProcess)
```
# How it works
HTMLarkdown has 3 distinct phases:
1. **Pre-processing**
   The container-element that's received _(and [deep-cloned](#deep-clone))_ by the `convert` method is passed consecutively to each `PreProcess` in `options.preProcesses`.

2. **Conversion**
   The pre-processed container-element is then recursively converted to markdown.
   Elements are converted by `Rule` in `options.rules`.
   Text-nodes are converted by `TextProcess` in `options.textProcesses`.
   The rule/text-process output strings are then appended to each other, to give the raw markdown.

3. **Post-processing**
   The raw markdown string is then passed consecutively to each `PostProcess` in `options.postProcesses`, to give the final markdown.

(image: the general conversion flow of HTMLarkdown)
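
The three phases can be sketched in miniature like so. Everything here is a hypothetical stand-in: the `Mini*` types, the three example rules and the plain-object nodes are assumptions for illustration, and the real `PreProcess`, `Rule`, `TextProcess` and `PostProcess` types work on actual DOM nodes and are far more involved.

```ts
// Hypothetical miniature of the three phases (NOT HTMLarkdown's real types or rules).
interface MiniNode {
    tag: string // '#text' marks a text-node
    text?: string
    children?: MiniNode[]
}
type MiniPreProcess = (container: MiniNode) => MiniNode
type MiniRule = { tag: string; toMarkdown: (inner: string) => string }
type MiniTextProcess = (text: string) => string
type MiniPostProcess = (markdown: string) => string

const preProcesses: MiniPreProcess[] = [
    // Drop whitespace-only text-nodes directly inside the container.
    (container) => ({
        ...container,
        children: (container.children ?? []).filter(
            (child) => child.tag !== '#text' || (child.text ?? '').trim() !== ''
        ),
    }),
]
const rules: MiniRule[] = [
    { tag: 'h1', toMarkdown: (inner) => `# ${inner}\n\n` },
    { tag: 'p', toMarkdown: (inner) => `${inner}\n\n` },
    { tag: 'strong', toMarkdown: (inner) => `**${inner}**` },
]
const textProcesses: MiniTextProcess[] = [(text) => text.replace(/\*/g, '\\*')]
const postProcesses: MiniPostProcess[] = [(markdown) => markdown.trim()]

/** Phase 2: recursively convert elements via rules, and text-nodes via text-processes. */
function convertNode(node: MiniNode): string {
    if (node.tag === '#text') {
        return textProcesses.reduce((text, step) => step(text), node.text ?? '')
    }
    const inner = (node.children ?? []).map((child) => convertNode(child)).join('')
    const rule = rules.find((candidate) => candidate.tag === node.tag)
    // In this sketch, elements without a rule just pass on their inner contents.
    return rule ? rule.toMarkdown(inner) : inner
}

function convert(container: MiniNode): string {
    const processed = preProcesses.reduce((node, step) => step(node), container) // phase 1
    const rawMarkdown = convertNode(processed) // phase 2
    return postProcesses.reduce((markdown, step) => step(markdown), rawMarkdown) // phase 3
}

const pipelineOutput = convert({
    tag: 'div',
    children: [
        { tag: 'h1', children: [{ tag: '#text', text: 'Heading' }] },
        { tag: '#text', text: '\n' },
        { tag: 'p', children: [{ tag: '#text', text: 'Paragraph' }] },
    ],
})
// pipelineOutput: '# Heading\n\nParagraph'
```

Each phase is just a list of functions applied in order, which is why swapping or adding rules/processes changes the conversion without touching the rest.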
# Contributing
## Bugs
HTMLarkdown is still under development, so there'll likely be bugs.
So the easiest way to contribute is to submit an issue _(with the `bug` label)_, especially for any incorrect markdown-conversions :)
For any incorrect markdown-conversions, state the:
- input HTML
- current incorrect markdown output
- expected markdown output
## New conversions, ideas, features, tests
If you have any new element-conversions / ideas / features / tests that you think should be added, leave an issue with the `feature` or `improve` label!
> - `feature` label is for new features
> - `improve` label is for improvements on existing features
>
> Understandably, there are gray areas on what is a "feature" and what is an "improvement". So just go with whichever seems more appropriate :)
## Other markdown specs
Currently, HTMLarkdown has been designed to output markdown for GitHub specifically _(ie. [GFM])_.
BUT, if there's another markdown spec. that you'd like to design for _(maybe as a plugin?)_, do leave an issue/discussion :D

[GFM]: https://github.github.com/gfm/
## Coding-related stuff
Code-formatting is handled by [Prettier], so no need to worry bout it :)
Any new feature should
- be documented via TSDoc
- come with new unit-tests
- and pass all new/existing tests

As for which merging method to use, check out the [discussion][merging-discussion].

[merging-discussion]: https://github.com/EvitanRelta/htmlarkdown/discussions/41
[Prettier]: https://prettier.io/
# Contributors
So far it's just me, so pls send help! :^)
# Roadmap
If you've any new ideas / features, check out the [Contributing section for it](#new-conversions-ideas-features-tests)!
## Element conversions
### Block-elements:
- [x] Headings _([For now][setext-issue], only [ATX-style][atx])_
- [x] Paragraph
- [x] Codeblock
- [x] Blockquote
- [x] Lists
_(ordered, unordered, [tight][tight] and [loose][loose])_
- [x] _([GFM][gfm-table])_ Table
- [ ] _([GFM][gfm-task-list])_ Task-list
_(Below are some planned block-elements that don't have markdown-equivalent)_
- [x] `` _(handled by a [noop-rule](#noop-rule))_
- [x] `<div>` _([For now][div-noop-issue], handled by a [noop-rule](#noop-rule))_
- [ ] Definition list _(ie. `<dl>`, `<dt>`, `<dd>`)_
- [ ] Collapsible section _(ie. `<details>`)_

[setext-issue]: https://github.com/EvitanRelta/htmlarkdown/issues/36
[atx]: https://spec.commonmark.org/0.30/#atx-heading
[div-noop-issue]: https://github.com/EvitanRelta/htmlarkdown/issues/19
[tight]: https://github.github.com/gfm/#tight
[loose]: https://github.github.com/gfm/#loose
[gfm-table]: https://github.github.com/gfm/#tables-extension-
[gfm-task-list]: https://github.github.com/gfm/#task-list-items-extension-
### Text-formattings:
- [x] **Bold** _([For now][underscore-issue], only outputs in asterisks `**BOLD**`)_
- [x] _Italic_ _([For now][underscore-issue], only outputs in asterisks `*ITALIC*`)_
- [x] _([GFM][gfm-strikethrough])_ ~~Strikethrough~~
- [x] `Code`
- [x] [Link][secret] _([For now][ref-link-issue], only [inline links][inline-link])_
- [x] Superscript _(ie. `<sup>`)_
- [x] Subscript _(ie. `<sub>`)_
- [x] Underline _(ie. `<u>`, `<ins>`)_
  _(didn't know underlines were possible till recently)_

[underscore-issue]: https://github.com/EvitanRelta/htmlarkdown/issues/39
[gfm-strikethrough]: https://github.github.com/gfm/#strikethrough-extension-
[secret]: https://www.youtube.com/watch?v=dQw4w9WgXcQ
[ref-link-issue]: https://github.com/EvitanRelta/htmlarkdown/issues/38
[inline-link]: https://spec.commonmark.org/0.30/#inline-link
### Misc:
- [x] Images _([For now][ref-link-issue], only [inline links][inline-link])_
- [x] Horizontal-rule _(ie. `<hr>`)_
- [x] Linebreaks _(ie. `<br>`)_
- [ ] Preserved HTML comments _([Issue \#25][preserve-comment-issue])_
  _(eg. ``)_

[preserve-comment-issue]: https://github.com/EvitanRelta/htmlarkdown/issues/25
Features to be added:
- Custom `id` attributes
```html
Go to [section with id](#my-section)
My section
```
- Reversing GitHub's Issue/PR autolinks

  **Input HTML:**

      <p>
        Issue autolink:
        <a href="https://github.com/user/repo/issues/7">#7</a>
      </p>

  **Output Markdown:**

      Issue autolink: #7

- Ability to customise how codeblock's syntax-highlighting language is obtained from the `` elements

**noop-rule:**
Elements handled by a noop-rule only pass on their converted inner-contents to their parents.
They themselves don't have any markdown conversions, not even in HTML-syntax.
# License
The MIT License (MIT).
So it's freeeeeee