https://github.com/taufik-nurrohman/markdown

Obviously, a Markdown parser.
Host: GitHub
URL: https://github.com/taufik-nurrohman/markdown
Owner: taufik-nurrohman
License: mit
Created: 2023-08-05T12:08:24.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-04-28T15:45:39.000Z (10 months ago)
Last Synced: 2024-05-02T00:23:31.792Z (10 months ago)
Topics: commonmark, converter, extra, markdown, parsedown, parser, php
Language: PHP
Homepage: https://github.com/mecha-cms/x.markdown
Size: 788 KB
Stars: 7
Watchers: 1
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

PHP Markdown Parser
===================

![from.php] ![to.php]

[from.php]: https://img.shields.io/github/size/taufik-nurrohman/markdown/from.php?branch=main&color=%234f5d95&label=from.php&labelColor=%231f2328&style=flat-square

[to.php]: https://img.shields.io/github/size/taufik-nurrohman/markdown/to.php?branch=main&color=%234f5d95&label=to.php&labelColor=%231f2328&style=flat-square

With 90% compliance with the [CommonMark 0.31.2](https://spec.commonmark.org/0.31.2) specification.

Motivation
----------
I appreciate the [Parsedown](https://github.com/erusev/parsedown) project for its simplicity and speed. It uses only a
single class file to convert Markdown syntax to HTML. However, given the decrease in Parsedown project activity over
time, I assume that it is now in a “feature complete” state. It still has some bugs to fix, and with
[the recent release of PHP version 8.1](https://www.php.net/releases/8.1/en.php), some of the PHP syntax there has
become obsolete.

There is actually [a draft for Parsedown version 2.0](https://github.com/erusev/parsedown/tree/2.0.x), but it is no
longer made as a single class file. It’s broken down into components. The goal, I think, is to make it easy to add
functionality without breaking what’s already in the core. For others, it may be of great use, but to me it looks much
like the extension features already provided by
[CommonMark](https://github.com/thephpleague/commonmark/blob/2.4/docs/2.4/customization/extensions.md). Because of that,
if I wanted to upgrade, it might be better to just switch to CommonMark.

I’m not into things like that. As someone who needs a function to convert Markdown syntax to HTML, that kind of
flexibility is completely unnecessary to me. I just want to convert Markdown syntax to HTML once and then move on.
That need was fulfilled by [Parsedown version 1.8](https://github.com/erusev/parsedown/tree/1.8.x-beta), but it seems
that it is no longer being actively maintained.

The goal of this project is to use it in my [Markdown extension for Mecha](https://github.com/mecha-cms/x.markdown) in
the future. Previously, I wanted to develop this converter directly in the extension, but my friend advised me to
create this project separately, as it might be useful to developers beyond the
[Mecha CMS](https://github.com/mecha-cms) developers.

Usage
-----

This converter can be installed using [Composer](https://packagist.org/packages/taufik-nurrohman/markdown), but it
doesn’t need any other dependencies and just uses Composer’s ability to automatically include files. Those of you who
don’t use Composer should be able to include the `from.php` and `to.php` files directly into your application without
any problems.

### Using Composer

From the command line interface, navigate to your project folder then run this command:

~~~ sh
composer require taufik-nurrohman/markdown
~~~

Require the generated auto-loader file in your application:

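~~~ php
<?php

// Assumption: the converter functions are declared in the `x\markdown` namespace
use function x\markdown\from as from_markdown;

require 'vendor/autoload.php';

echo from_markdown('asdf'); // Returns `'<p>asdf</p>'`
~~~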

### Using File

Require the `from.php` and `to.php` files in your application:

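~~~ php
<?php

// Assumption: the converter functions are declared in the `x\markdown` namespace
use function x\markdown\from as from_markdown;

require 'from.php';
require 'to.php'; // Optional

echo from_markdown('asdf'); // Returns `'<p>asdf</p>'`
~~~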

The `to.php` file is optional and is used to convert HTML to Markdown. If you just want to convert Markdown to HTML, you
don’t need to include this file. This feature is experimental and is provided as a complement, just as there is a
`json_encode()` function alongside `json_decode()`. The Markdown result may not satisfy everyone, but it can be
discussed further.

Options
-------

~~~ php
/**
 * Convert Markdown string to HTML string.
 *
 * @param null|string $value Your Markdown string.
 * @param bool $block If this option is set to `false`, Markdown block syntax will be ignored.
 * @return null|string
 */
from(?string $value, bool $block = true): ?string;
~~~

~~~ php
/**
 * Convert HTML string to Markdown string.
 *
 * @param null|string $value Your HTML string.
 * @param bool $block If this option is set to `false`, HTML block syntax will be stripped out.
 * @return null|string
 */
to(?string $value, bool $block = true): ?string;
~~~

Dialect
-------

Over time, the history of Mecha has slowly shaped my Markdown writing style. The Markdown extension used by Mecha
[was first](https://github.com/mecha-cms/mecha/tree/v1.2.2) built with
[Michel Fortin’s Markdown converter](https://michelf.ca/projects/php-markdown) (which I believe is the very first PHP
port of the Markdown converter originally written in Perl by
[John Gruber](https://daringfireball.net/projects/markdown)). With the release of
[Mecha version 1.2.3](https://github.com/mecha-cms/mecha/tree/v1.2.3), I decided to switch to
[Parsedown](https://github.com/erusev/parsedown) because it was quite popular at the time. It can also do the conversion
process much faster. Emanuil Rusev’s way of detecting the block type
[by reading the first character](https://github.com/erusev/parsedown/tree/1.7.4#questions) is, in my opinion, very
clever and efficient.

### Attributes

My Markdown converter supports a more extensive attribute syntax, including a mix of `.class` and `#id` attribute
syntax, and a mix of `key=value` attribute syntax:

| Markdown | HTML |
| --- | --- |
| `# asdf {#asdf}` | `<h1 id="asdf">asdf</h1>` |
| `# asdf {#asdf.asdf}` | `<h1 class="asdf" id="asdf">asdf</h1>` |
| `# asdf {#asdf.asdf asdf=asdf}` | `<h1 asdf="asdf" class="asdf" id="asdf">asdf</h1>` |

Inline attributes always win over native syntax attributes and pre-defined attributes:

| Markdown | HTML |
| --- | --- |
| `[asdf](asdf) {href=x}` | `<p><a href="x">asdf</a></p>` |
| `[asdf]`<br><br>`[asdf]: asdf {href=x}` | `<p><a href="x">asdf</a></p>` |
| `[asdf] {.x href=x}`<br><br>`[asdf]: asdf {.asdf}` | `<p><a class="x" href="x">asdf</a></p>` |

### Emphasis

CommonMark’s [emphasis (and strong emphasis) specifications][commonmark/em] almost drove me crazy! 🤯

Implementing that level of strictness would slow the project down even more on its way to a stable release. I actually
understand [the parsing strategy][commonmark/appendix] very well, but turning it into minimal PHP code just feels so
hard for me. In order to speed up the completion of the project, I decided to reduce the strictness of the emphasis (and
strong emphasis) specifications.

They will not completely follow CommonMark’s emphasis (and strong emphasis) specifications, but I promise that the
HTML results will still make sense, especially for those who have never read the specifications.

[commonmark/appendix]: https://spec.commonmark.org/0.31.2#appendix-a-parsing-strategy

[commonmark/em]: https://spec.commonmark.org/0.31.2#emphasis-and-strong-emphasis

**Rule 1:** The same type of emphasis can be nested only if one or both sides of the child emphasis begin and/or end
with white-space or punctuation.

This will create nested emphasis:

| Markdown | HTML |
| --- | --- |
| `*asdf *asdf* asdf*` | `<p><em>asdf <em>asdf</em> asdf</em></p>` |
| `**asdf* asdf asdf*` | `<p><em><em>asdf</em> asdf asdf</em></p>` |
| `*asdf asdf *asdf**` | `<p><em>asdf asdf <em>asdf</em></em></p>` |

| Markdown | HTML |
| --- | --- |
| `**asdf **asdf** asdf**` | `<p><strong>asdf <strong>asdf</strong> asdf</strong></p>` |
| `****asdf** asdf asdf**` | `<p><strong><strong>asdf</strong> asdf asdf</strong></p>` |
| `**asdf asdf **asdf****` | `<p><strong>asdf asdf <strong>asdf</strong></strong></p>` |

This will not:

| Markdown | HTML |
| --- | --- |
| `*asdf*asdf*asdf*` | `<p><em>asdf</em>asdf<em>asdf</em></p>` |
| `**asdf*asdf asdf*` | `<p>**asdf<em>asdf asdf</em></p>` |
| `*asdf asdf*asdf**` | `<p><em>asdf asdf</em>asdf**</p>` |

| Markdown | HTML |
| --- | --- |
| `**asdf**asdf**asdf**` | `<p><strong>asdf</strong>asdf<strong>asdf</strong></p>` |
| `****asdf**asdf asdf**` | `<p>****asdf<strong>asdf asdf</strong></p>` |
| `**asdf asdf**asdf****` | `<p><strong>asdf asdf</strong>asdf****</p>` |

**Rule 2:** For conditions where the emphasis types are different, **Rule 1** does not apply.

| Markdown | HTML |
| --- | --- |
| `*asdf**asdf**asdf*` | `<p><em>asdf<strong>asdf</strong>asdf</em></p>` |
| `*asdf **asdf** asdf*` | `<p><em>asdf <strong>asdf</strong> asdf</em></p>` |
| `***asdf**asdf asdf*` | `<p><em><strong>asdf</strong>asdf asdf</em></p>` |
| `***asdf** asdf asdf*` | `<p><em><strong>asdf</strong> asdf asdf</em></p>` |
| `*asdf asdf**asdf***` | `<p><em>asdf asdf<strong>asdf</strong></em></p>` |
| `*asdf asdf **asdf***` | `<p><em>asdf asdf <strong>asdf</strong></em></p>` |

| Markdown | HTML |
| --- | --- |
| `**asdf*asdf*asdf**` | `<p><strong>asdf<em>asdf</em>asdf</strong></p>` |
| `**asdf *asdf* asdf**` | `<p><strong>asdf <em>asdf</em> asdf</strong></p>` |
| `***asdf*asdf asdf**` | `<p><strong><em>asdf</em>asdf asdf</strong></p>` |
| `***asdf* asdf asdf**` | `<p><strong><em>asdf</em> asdf asdf</strong></p>` |
| `**asdf asdf*asdf***` | `<p><strong>asdf asdf<em>asdf</em></strong></p>` |
| `**asdf asdf *asdf***` | `<p><strong>asdf asdf <em>asdf</em></strong></p>` |

**Rule 3:** For conditions where the emphasis markers are different, **Rule 1** does not apply.

| Markdown | HTML |
| --- | --- |
| `_asdf*asdf*asdf_` | `<p><em>asdf<em>asdf</em>asdf</em></p>` |
| `*asdf_asdf_asdf*` | `<p><em>asdf_asdf_asdf</em></p>` |
| `*asdf _asdf_ asdf*` | `<p><em>asdf <em>asdf</em> asdf</em></p>` |
| `_*asdf*asdf asdf_` | `<p><em><em>asdf</em>asdf asdf</em></p>` |
| `*_asdf_asdf asdf*` | `<p><em>_asdf_asdf asdf</em></p>` |
| `*_asdf_ asdf asdf*` | `<p><em><em>asdf</em> asdf asdf</em></p>` |
| `_asdf asdf*asdf*_` | `<p><em>asdf asdf<em>asdf</em></em></p>` |
| `*asdf asdf_asdf_*` | `<p><em>asdf asdf_asdf_</em></p>` |
| `*asdf asdf _asdf_*` | `<p><em>asdf asdf <em>asdf</em></em></p>` |

**Rule 4:** The opening delimiter must not be followed by a white-space and the closing delimiter must not be preceded
by a white-space in order for it to be a valid emphasis token.

| Markdown | HTML |
| --- | --- |
| `*asdf*` | `<p><em>asdf</em></p>` |
| `* asdf *` | `<ul><li>asdf *</li></ul>` |
| `* asdf*` | `<ul><li>asdf*</li></ul>` |
| `*asdf *` | `<p>*asdf *</p>` |

**Rule 5:** The emphasis token cannot be empty.

| Markdown | HTML |
| --- | --- |
| `**` | `<p>**</p>` |
| `****` | `<hr />` |

### Links

Relative links and absolute links with the server’s host name will be treated as internal links, otherwise they will be
treated as external links and will automatically get `rel="nofollow"` and `target="_blank"` attributes.
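The rule above can be sketched as a small check on the link’s host part. This is only an illustration under stated
assumptions, not the converter’s actual code: the function names `is_external_link()` and `link_attributes()` are
hypothetical, and the current host name is passed in explicitly (in a real application it might come from something like
`$_SERVER['HTTP_HOST']`). Links without a host part, such as `mailto:` links, are treated as internal here, which may
differ from what the converter does.

~~~ php
<?php

// Hypothetical helper, not part of the converter’s API.
// Decides whether a link target counts as external for the given host name.
function is_external_link(string $link, string $host): bool {
    // Protocol-relative links (`//example.com/asdf`) carry a host but no scheme
    if (0 === strpos($link, '//')) {
        $link = 'http:' . $link;
    }
    $parts = parse_url($link);
    // No host part means a relative link (`asdf`, `/asdf`, `#asdf`, `?asdf`)
    if (!is_array($parts) || empty($parts['host'])) {
        return false;
    }
    // Host names are compared case-insensitively
    return 0 !== strcasecmp($parts['host'], $host);
}

// Hypothetical helper: builds the attributes of the generated `<a>` element.
function link_attributes(string $link, string $host): array {
    $attributes = ['href' => $link];
    if (is_external_link($link, $host)) {
        $attributes['rel'] = 'nofollow';
        $attributes['target'] = '_blank';
    }
    return $attributes;
}
~~~

With `example.com` as the host, `/asdf` and `https://example.com/asdf` stay internal, while `https://github.com` gets
the two extra attributes.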

### Notes

Notes follow [Markdown Extra’s notes syntax](https://michelf.ca/projects/php-markdown/extra#footnotes) but with
slightly different HTML output to match [Mecha](https://github.com/mecha-cms)’s common naming style. Multi-line notes
don’t have to be indented by four spaces as required by Markdown Extra. A space or tab is enough to continue the note.

| Markdown | HTML |
| --- | --- |
| `asdf [^1]`<br><br>`[^1]: asdf` | `<p>asdf <sup id="from:1"><a href="#to:1" role="doc-noteref">1</a></sup></p><div role="doc-endnotes"><hr /><ol><li id="to:1" role="doc-endnote"><p>asdf&#160;<a href="#from:1" role="doc-backlink">&#8617;</a></p></li></ol></div>` |
| `asdf [^1]`<br><br>`[^1]:`<br><br>`  asdf`<br>`  ====`<br><br>`  asdf`<br>`  asdf`<br><br>`      asdf`<br><br>`  asdf`<br>`  asdf`<br><br>`asdf` | `<p>asdf <sup id="from:1"><a href="#to:1" role="doc-noteref">1</a></sup></p><p>asdf</p><div role="doc-endnotes"><hr /><ol><li id="to:1" role="doc-endnote"><h1>asdf</h1><p>asdf asdf</p><pre><code>asdf</code></pre><p>asdf asdf&#160;<a href="#from:1" role="doc-backlink">&#8617;</a></p></li></ol></div>` |

    

  

### Soft Break

Soft breaks are collapsed to spaces in non-critical parts such as in paragraphs and list items:

| Markdown | HTML |
| --- | --- |
| `asdf asdf asdf asdf`<br>`asdf asdf asdf asdf`<br><br>`asdf asdf asdf asdf` | `<p>asdf asdf asdf asdf asdf asdf asdf asdf</p><p>asdf asdf asdf asdf</p>` |

    

  

### Code Block

I try to avoid conflict between different Markdown dialects and try to support whatever dialect you are using. For
example, since I originally used Markdown Extra, I am used to adding an info string with a dot prefix to the fenced code
block syntax. This is not supported by Parsedown (or rather, Parsedown doesn’t care about the pattern of the given info
string and simply prepends the `language-` prefix to it, since CommonMark also doesn’t give implementors special rules
for processing the info string in fenced code block syntax).

Here’s how the code block results compare across each Markdown converter:

#### Markdown Extra

| Markdown | HTML |
| --- | --- |
| `~~~ asdf`<br>`asdf`<br>`~~~` | `<pre><code class="asdf">asdf`<br>`</code></pre>` |
| `~~~ .asdf`<br>`asdf`<br>`~~~` | `<pre><code class="asdf">asdf`<br>`</code></pre>` |
| `~~~ asdf asdf`<br>`asdf`<br>`~~~` | Invalid. |
| `~~~ .asdf.asdf`<br>`asdf`<br>`~~~` | Invalid. |
| `~~~ {#asdf.asdf}`<br>`asdf`<br>`~~~` | `<pre><code class="asdf" id="asdf">asdf`<br>`</code></pre>` |
| `~~~ {#asdf.asdf asdf=asdf}`<br>`asdf`<br>`~~~` | Invalid. |

    

  

#### Parsedown Extra

| Markdown | HTML |
| --- | --- |
| `~~~ asdf`<br>`asdf`<br>`~~~` | `<pre><code class="language-asdf">asdf</code></pre>` |
| `~~~ .asdf`<br>`asdf`<br>`~~~` | `<pre><code class="language-.asdf">asdf</code></pre>` |
| `~~~ asdf asdf`<br>`asdf`<br>`~~~` | `<pre><code class="language-asdf">asdf</code></pre>` |
| `~~~ .asdf.asdf`<br>`asdf`<br>`~~~` | `<pre><code class="language-.asdf.asdf">asdf</code></pre>` |
| `~~~ {#asdf.asdf}`<br>`asdf`<br>`~~~` | `<pre><code class="language-{#asdf.asdf}">asdf</code></pre>` |
| `~~~ {#asdf.asdf asdf=asdf}`<br>`asdf`<br>`~~~` | `<pre><code class="language-{#asdf.asdf">asdf</code></pre>` |

  

#### Mine

| Markdown | HTML |
| --- | --- |
| `~~~ asdf`<br>`asdf`<br>`~~~` | `<pre><code class="language-asdf">asdf</code></pre>` |
| `~~~ .asdf`<br>`asdf`<br>`~~~` | `<pre><code class="asdf">asdf</code></pre>` |
| `~~~ asdf asdf`<br>`asdf`<br>`~~~` | `<pre><code class="language-asdf">asdf</code></pre>` |
| `~~~ .asdf.asdf`<br>`asdf`<br>`~~~` | `<pre><code class="asdf">asdf</code></pre>` |
| `~~~ {#asdf.asdf}`<br>`asdf`<br>`~~~` | `<pre><code class="asdf" id="asdf">asdf</code></pre>` |
| `~~~ {#asdf.asdf asdf=asdf}`<br>`asdf`<br>`~~~` | `<pre><code asdf="asdf" class="asdf" id="asdf">asdf</code></pre>` |
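The six results listed for this converter can be reproduced by a small info string parser. What follows is a minimal
sketch under stated assumptions, not the converter’s real implementation: the function name `info_to_attributes()` is
hypothetical, duplicate class names are assumed to be merged (which would explain why `.asdf.asdf` yields
`class="asdf"`), and quoted attribute values that contain spaces are not handled.

~~~ php
<?php

// Hypothetical helper: turns a fenced code block info string into a list of
// HTML attributes, sorted by name.
function info_to_attributes(string $info): array {
    $info = trim($info);
    if ("" === $info) {
        return [];
    }
    if ('{' === $info[0] && '}' === substr($info, -1)) {
        // `{#asdf.asdf asdf=asdf}` → `#asdf.asdf` and `asdf=asdf`
        $tokens = preg_split('/\s+/', trim(substr($info, 1, -1)), -1, PREG_SPLIT_NO_EMPTY);
    } else {
        // Only the first word matters: `asdf asdf` → `asdf`
        $first = preg_split('/\s+/', $info)[0];
        if ('.' !== $first[0]) {
            // A bare word is taken as a language name
            return ['class' => 'language-' . $first];
        }
        $tokens = [$first];
    }
    $attributes = [];
    $classes = [];
    foreach ($tokens as $token) {
        // `key=value`
        if (false !== strpos($token, '=')) {
            [$k, $v] = explode('=', $token, 2);
            $attributes[$k] = trim($v, '"\'');
            continue;
        }
        // Split `#asdf.asdf` into `#asdf` and `.asdf`
        preg_match_all('/[#.][^#.]+/', $token, $m);
        foreach ($m[0] as $part) {
            if ('#' === $part[0]) {
                $attributes['id'] = substr($part, 1);
            } else {
                $classes[] = substr($part, 1);
            }
        }
    }
    if ($classes) {
        // Assumption: repeated class names are merged
        $attributes['class'] = implode(' ', array_unique($classes));
    }
    ksort($attributes);
    return $attributes;
}
~~~

Returning the attributes sorted by name matches the attribute order shown in the HTML results, for example
`asdf`, `class`, then `id` for `{#asdf.asdf asdf=asdf}`.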

  

### HTML Block

CommonMark doesn’t care about the DOM and therefore also doesn’t care whether an HTML element is perfectly balanced or
not. Unlike the original Markdown syntax specification, which doesn’t allow you to convert Markdown syntax inside an
HTML block, the CommonMark specification doesn’t limit such a case. It cares about blank lines around the lines that
look like an HTML block tag, as specified in [Section 4.6](https://spec.commonmark.org/0.31.2#html-blocks), type 6.

Any text that comes after the opening and/or closing of an HTML block is treated as raw text and is not processed as
Markdown syntax. A blank line is required to end the raw HTML block state:

| Markdown | HTML |
| --- | --- |
| `<div> asdf asdf *asdf* asdf`<br>`</div> asdf asdf *asdf* asdf` | `<div> asdf asdf *asdf* asdf`<br>`</div> asdf asdf *asdf* asdf` |
| `<div>`<br>`asdf asdf *asdf* asdf`<br>`</div>`<br>`asdf asdf *asdf* asdf` | `<div>`<br>`asdf asdf *asdf* asdf</div>`<br>`asdf asdf *asdf* asdf` |
| `<div>`<br><br>`asdf asdf *asdf* asdf`<br><br>`</div>`<br><br>`asdf asdf *asdf* asdf` | `<div><p>asdf asdf <em>asdf</em> asdf</p></div><p>asdf asdf <em>asdf</em> asdf</p>` |

    

  

Types 1, 2, 3, 4, and 5 are the exception. A line break is enough to end the raw HTML block state:

| Markdown | HTML |
| --- | --- |
| `<!-- asdf asdf *asdf* asdf --> asdf asdf *asdf* asdf` | `<!-- asdf asdf *asdf* asdf --> asdf asdf *asdf* asdf` |
| `<!-- asdf asdf *asdf* asdf -->`<br>`asdf asdf *asdf* asdf` | `<!-- asdf asdf *asdf* asdf --><p>asdf asdf <em>asdf</em> asdf</p>` |
| `<!-- asdf asdf *asdf* asdf -->`<br><br>`asdf asdf *asdf* asdf` | `<!-- asdf asdf *asdf* asdf --><p>asdf asdf <em>asdf</em> asdf</p>` |

    

  

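The examples below will generate predictable HTML code, but not because this converter cares about the existing HTML
tag balance:

| Markdown | HTML |
| --- | --- |
| `<nav>`<br>`<ul>`<br>`<li>`<br>`<a>asdf</a>`<br>`</li>`<br>`<li>`<br>`<a>asdf</a>`<br>`</li>`<br>`<li>`<br>`<a>asdf</a>`<br>`</li>`<br>`</ul>`<br>`</nav>`<br><br>`asdf asdf *asdf* asdf` | `<nav>`<br>`<ul>`<br>`<li>`<br>`<a>asdf</a>`<br>`</li>`<br>`<li>`<br>`<a>asdf</a>`<br>`</li>`<br>`<li>`<br>`<a>asdf</a>`<br>`</li>`<br>`</ul>`<br>`</nav><p>asdf asdf <em>asdf</em> asdf</p>` |
| `<nav>`<br>`  <ul>`<br>`    <li>`<br>`      <a>asdf</a>`<br>`    </li>`<br>`    <li>`<br>`      <a>asdf</a>`<br>`    </li>`<br>`    <li>`<br>`      <a>asdf</a>`<br>`    </li>`<br>`  </ul>`<br>`</nav>`<br><br>`asdf asdf *asdf* asdf` | `<nav>`<br>`  <ul>`<br>`    <li>`<br>`      <a>asdf</a>`<br>`    </li>`<br>`    <li>`<br>`      <a>asdf</a>`<br>`    </li>`<br>`    <li>`<br>`      <a>asdf</a>`<br>`    </li>`<br>`  </ul>`<br>`</nav><p>asdf asdf <em>asdf</em> asdf</p>` |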

    

  

You will understand why when you add a number of blank lines at any point in the HTML block:

| Markdown | HTML |
| --- | --- |
| `<nav>`<br>`<ul>`<br>`<li>`<br>`<a>`<br><br>`asdf</a>`<br>`</li>`<br>`<li>`<br>`<a>asdf</a>`<br>`</li>`<br>`<li>`<br>`<a>asdf</a>`<br>`</li>`<br>`</ul>`<br>`</nav>`<br><br>`asdf asdf *asdf* asdf` | `<nav>`<br>`<ul>`<br>`<li>`<br>`<a><p>asdf</a></p></li><li>`<br>`<a>asdf</a>`<br>`</li>`<br>`<li>`<br>`<a>asdf</a>`<br>`</li>`<br>`</ul>`<br>`</nav><p>asdf asdf <em>asdf</em> asdf</p>` |
| `<nav>`<br>`  <ul>`<br>`    <li>`<br>`      <a>`<br><br>`      asdf</a>`<br>`    </li>`<br>`    <li>`<br>`      <a>asdf</a>`<br>`    </li>`<br>`    <li>`<br>`      <a>asdf</a>`<br>`    </li>`<br>`  </ul>`<br>`</nav>`<br><br>`asdf asdf *asdf* asdf` | `<nav>`<br>`  <ul>`<br>`    <li>`<br>`      <a><pre><code>  asdf&lt;/a&gt;`<br>`&lt;/li&gt;`<br>`&lt;li&gt;`<br>`  &lt;a&gt;asdf&lt;/a&gt;`<br>`&lt;/li&gt;`<br>`&lt;li&gt;`<br>`  &lt;a&gt;asdf&lt;/a&gt;`<br>`&lt;/li&gt;</code></pre></ul>`<br>`</nav><p>asdf asdf <em>asdf</em> asdf</p>` |

    

  

Markdown Extra features the `markdown` attribute on HTML to allow you to convert Markdown syntax to HTML in an HTML
block. In this converter, the feature will not work. For now, I have no plans to add such a feature, to avoid DOM
parsing tasks as much as possible. This also lets me avoid using [PHP `dom`](https://www.php.net/book.dom).

However, if you add a blank line, it’s as if the feature works (although the `markdown` attribute is still there, it
doesn’t affect the HTML when rendered in the browser window). If you’re used to adding a blank line after the opening
HTML block tag and before the closing HTML block tag, you should be okay.

| Markdown | HTML |
| --- | --- |
| `<div markdown="1">`<br>`asdf asdf *asdf* asdf`<br>`</div>` | `<div markdown="1">`<br>`asdf asdf *asdf* asdf`<br>`</div>` |
| `<div markdown="1">`<br><br>`asdf asdf *asdf* asdf`<br><br>`</div>` | `<div markdown="1"><p>asdf asdf <em>asdf</em> asdf</p></div>` |

    

  

Opening an inline HTML element will not trigger the raw HTML block state unless an opening or closing tag stands alone
on its own line. This is explained in [Section 4.6](https://spec.commonmark.org/0.31.2#html-blocks), type 7:

| Markdown | HTML |
| --- | --- |
| `<span>asdf *asdf*</span> asdf *asdf* asdf` | `<p><span>asdf <em>asdf</em></span> asdf <em>asdf</em> asdf</p>` |
| `<span>`<br>`asdf *asdf*`<br>`</span>`<br>`asdf *asdf* asdf` | `<span>`<br>`asdf *asdf*`<br>`</span>`<br>`asdf *asdf* asdf` |

    

  

Since CommonMark doesn’t care about HTML structure, the examples below will also conform to the specification, even if
they result in broken HTML. However, these are very rarely intentionally written by hand, so such cases are very
unlikely to occur:

| Markdown | HTML |
| --- | --- |
| `<h1>`<br><br>`asdf asdf *asdf* asdf`<br><br>`</h1>` | `<h1><p>asdf asdf <em>asdf</em> asdf</p></h1>` |
| `<p>`<br><br>`asdf asdf *asdf* asdf`<br><br>`</p>` | `<p><p>asdf asdf <em>asdf</em> asdf</p></p>` |

    

  

### Image Block

Markdown was created before the HTML5 era. When the `<figure>` element was introduced, people started using it to
display an image with a caption. Most Markdown converters will convert image syntax that stands alone on a single line
to an image element wrapped in a paragraph element in the output. My converter instead wraps it in a figure element,
because a figure element seems more desirable in this situation.

Paragraphs that appear below it will be taken as the image caption if you indent them by one to three spaces.

| Markdown | HTML |
| --- | --- |
| `![asdf](asdf.jpg)` | `<figure><img alt="asdf" src="asdf.jpg" /></figure>` |
| `![asdf](asdf.jpg)`<br>` asdf` | `<figure><img alt="asdf" src="asdf.jpg" /><figcaption>asdf</figcaption></figure>` |
| `![asdf](asdf.jpg)`<br>` asdf`<br><br>` asdf`<br><br>`asdf` | `<figure><img alt="asdf" src="asdf.jpg" /><figcaption><p>asdf</p><p>asdf</p></figcaption></figure><p>asdf</p>` |
| `![asdf](asdf.jpg) asdf` | `<p><img alt="asdf" src="asdf.jpg" /> asdf</p>` |

FYI, this pattern is also valid in ordinary Markdown files, so it degrades gracefully when parsed by other Markdown
converters.

### List Block

List blocks follow the CommonMark specifications with one exception: if the next ordered list item uses a number that is
less than the number of the previous ordered list item, a new list block will be created. This is different from the
original specification, which does not care about the literal value of the number.

| Markdown | HTML |
| --- | --- |
| `1. asdf`<br>`2. asdf`<br>`3. asdf` | `<ol><li>asdf</li><li>asdf</li><li>asdf</li></ol>` |
| `1. asdf`<br>`1. asdf`<br>`1. asdf` | `<ol><li>asdf</li><li>asdf</li><li>asdf</li></ol>` |
| `1. asdf`<br>`2. asdf`<br>`1. asdf` | `<ol><li>asdf</li><li>asdf</li></ol><ol><li>asdf</li></ol>` |
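The list-splitting exception can be illustrated by grouping the item numbers alone. This is a minimal sketch, not the
converter’s code; the function name `split_ordered_lists()` is hypothetical and it only looks at the numbers in front of
each item.

~~~ php
<?php

// Hypothetical helper: groups ordered list item numbers into separate lists.
// A new list starts whenever a number is lower than the one before it.
function split_ordered_lists(array $numbers): array {
    $lists = [];
    $previous = null;
    foreach ($numbers as $number) {
        if (null === $previous || $number < $previous) {
            $lists[] = [];
        }
        $lists[count($lists) - 1][] = $number;
        $previous = $number;
    }
    return $lists;
}
~~~

Equal numbers (`1.`, `1.`, `1.`) stay in one list because only a strictly lower number starts a new one, while
`1.`, `2.`, `1.` gives two lists.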

    

  

### Table Block

Table blocks follow [Markdown Extra’s table block syntax](https://michelf.ca/projects/php-markdown/extra#table).
However, there are a few additional features and rules:

 - The actual number of columns follows the number of columns in the table header separator. If you have columns in
   table header and/or table data with a number that exceeds the actual number of columns, the excess columns will be
   discarded. If you have columns in table header and/or table data with a number that is less than the actual number of
   columns, several empty columns will be added automatically to the right side.
 - Literal pipe characters in table columns must be escaped. Exceptions are those that appear in code span and attribute
   values of raw HTML tags.
 - Header-less table is supported, but may not be compatible with other Markdown converters. Consider using this feature
   as rarely as possible, unless you have no plans to switch to other Markdown converters in the future.
 - Table caption is supported and can be created using the same syntax as the image block’s caption syntax.

| Markdown | HTML |
| --- | --- |
| `asdf \| asdf`<br>`---- \| ----`<br>`asdf \| asdf` | `<table><thead><tr><th>asdf</th><th>asdf</th></tr></thead><tbody><tr><td>asdf</td><td>asdf</td></tr></tbody></table>` |
| `asdf \| asdf`<br>`---- \| ----` | `<table><thead><tr><th>asdf</th><th>asdf</th></tr></thead></table>` |
| `---- \| ----`<br>`asdf \| asdf` | `<table><tbody><tr><td>asdf</td><td>asdf</td></tr></tbody></table>` |
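The first two rules for table blocks (column count and escaped pipes) can be sketched with two small helpers. The names
`split_row()` and `fit_row()` are hypothetical, and the sketch makes simplifying assumptions: rows are written without
outer pipes, as in the examples for this section, and the exceptions for code spans and raw HTML attribute values are
not handled.

~~~ php
<?php

// Hypothetical helper: splits a table row on pipe characters that are not
// escaped with a backslash, then trims each cell.
function split_row(string $row): array {
    return array_map('trim', preg_split('/(?<!\\\\)[|]/', $row));
}

// Hypothetical helper: fits a row to the column count given by the table
// header separator.
function fit_row(array $cells, int $columns): array {
    // Discard the excess cells, then pad the right side with empty cells
    return array_pad(array_slice($cells, 0, $columns), $columns, "");
}
~~~

For a separator with two columns, a row of three cells loses its last cell, and a row of one cell gains an empty cell on
the right.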

    

  

XSS
---

This converter is intended only to convert Markdown syntax to HTML based on the
[CommonMark](https://spec.commonmark.org/0.31.2) specification. It doesn’t care about your user input. I have no
intention of adding any special security features in the future, sorry. The attribute syntax feature may be a security
risk for you if you want to use this converter on your comment entries, for example:

| Markdown | HTML |
| --- | --- |
| `![asdf](asdf.asdf) {onerror="alert('Yo!')"}` | `<img alt="asdf" onerror="alert(&apos;Yo!&apos;)" src="asdf.asdf" />` |

There should already be many specialized PHP applications dedicated to dealing with XSS, so consider post-processing the
generated HTML markup before putting it out to the web:

 - [ezyang/htmlpurifier](https://github.com/ezyang/htmlpurifier)
 - [voku/anti-xss](https://github.com/voku/anti-xss)

Tests
-----

Clone this repository into the root of your web server that supports PHP and then you can open the `test/from.php` and
`test/to.php` files with your browser to see the results and the performance of this converter in various cases.

Tweaks
------

Not all Markdown dialects are supported for various reasons. Some of the modification methods below can be implemented
to add features that you might find in other Markdown converters.

Your Markdown content is represented as variable `$value`. If you modify the content before the function
`from_markdown()` is called, you modify the Markdown content before it is converted. If you modify the content after
the function `from_markdown()` is called, you modify the results of the Markdown conversion.

### Globally Reusable Functions

To make `from_markdown()` and `to_markdown()` functions reusable globally, use this method:

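~~~ php
<?php

require 'from.php';
require 'to.php';

// Assumption: the converter functions are declared in the `x\markdown` namespace
function from_markdown(...$v) {
    return x\markdown\from(...$v);
}

function to_markdown(...$v) {
    return x\markdown\to(...$v);
}
~~~

### XHTML to HTML5

To get HTML5-style void elements, replace `' />'` with `'>'` directly from the results of the Markdown conversion:

~~~ php
$value = from_markdown($value);
$value = strtr($value, [' />' => '>']);
echo $value;
~~~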

### Strike

This method allows you to add strike-through syntax, as you may have already noticed in the
[GFM specification](https://github.github.com/gfm):

~~~ php
$value = from_markdown($value);
// One or two tildes on each side, not adjacent to another tilde
$value = preg_replace('/((?<![~])[~]{1,2}(?![~]))([^~]+)\1/', '<del>$2</del>', $value);
echo $value;
~~~

### Task List

I am against the task list feature because it promotes the bad practice of abusing the form input element. Although from
the presentation side it displays a check box interface correctly, I still believe that input elements should ideally be
used inside a form element. There are several Unicode symbols that are more suitable and easier to read from the
Markdown source, like ☐ and ☒, which means that this feature can actually be made using the existing list
feature:

~~~ md
- ☒ asdf
- ☐ asdf
- ☐ asdf
~~~

In case you need it, or don’t want to update your existing task list syntax in your Markdown files, here’s the hack:

~~~ php
$value = from_markdown($value);
$value = strtr($value, [
    '<li><p>[ ] ' => '<li><p>☐ ',
    '<li><p>[x] ' => '<li><p>☒ ',
    '<li>[ ] ' => '<li>☐ ',
    '<li>[x] ' => '<li>☒ '
]);
echo $value;
~~~

### Pre-Defined Abbreviations, Notes, and References

By appending abbreviations, notes, and references to the end of the Markdown content, you get the effect of a
pre-defined abbreviations, notes, and references feature. They should be placed at the end of the Markdown content
because, according to the [link reference definitions](https://spec.commonmark.org/0.31.2#example-204) specification,
the first declared reference always takes precedence:

~~~ php
$abbreviations = [
    'CSS' => 'Cascading Style Sheet',
    'HTML' => 'Hyper Text Markup Language',
    'JS' => 'JavaScript'
];

$references = [
    'mecha-cms' => ['https://github.com/mecha-cms', 'Mecha CMS', []],
    'taufik-nurrohman' => ['https://github.com/taufik-nurrohman', 'Taufik Nurrohman', []],
];

$suffix = "";

if (!empty($abbreviations)) {
    foreach ($abbreviations as $k => $v) {
        $k = strtr($k, [
            '[' => '\[',
            ']' => '\]'
        ]);
        $v = trim(preg_replace('/\s+/', ' ', $v));
        $suffix .= "\n*[" . $k . ']: ' . $v;
    }
}

if (!empty($references)) {
    foreach ($references as $k => $v) {
        [$link, $title, $attributes] = $v;
        $k = strtr($k, [
            '[' => '\[',
            ']' => '\]'
        ]);
        if ("" === $link || false !== strpos($link, ' ')) {
            $link = '<' . $link . '>';
        }
        $reference = '[' . $k . ']: ' . $link;
        if (!empty($title)) {
            $reference .= " '" . strtr($title, ["'" => "\\'"]) . "'";
        }
        if (!empty($attributes)) {
            foreach ($attributes as $kk => &$vv) {
                // `{.asdf}`
                if ('class' === $kk) {
                    $vv = '.' . trim(preg_replace('/\s+/', '.', $vv));
                    continue;
                }
                // `{#asdf}`
                if ('id' === $kk) {
                    $vv = '#' . $vv;
                    continue;
                }
                // `{asdf}`
                if (true === $vv) {
                    $vv = $kk;
                    continue;
                }
                // `{asdf=""}`
                if ("" === $vv) {
                    $vv = $kk . '=""';
                    continue;
                }
                // `{asdf='asdf'}`
                $vv = $kk . "='" . strtr($vv, ["'" => "\\'"]) . "'";
            }
            unset($vv);
            sort($attributes);
            $attributes = trim(strtr(implode(' ', $attributes), [
                ' #' => '#',
                ' .' => '.'
            ]));
            $reference .= ' {' . $attributes . '}';
        }
        $suffix .= "\n" . $reference;
    }
}

$value = from_markdown($value . "\n" . $suffix);

echo $value;
~~~

### Pre-Defined Header’s ID

Add an automatic `id` attribute to headers level 2 through 6 if it’s not set, and then prepend an anchor element that

points to it:

~~~ php

$value = from_markdown($value);

if ($value && false !== strpos($value, '</h')) {

    $value = preg_replace_callback('/<(h[2-6])(\s(?:"[^"]*"|\'[^\']*\'|[^>])*)?>([\s\S]+?)<\/\1>/', static function ($m) {

        // The header already has an `id` attribute, so reuse it

        if (!empty($m[2]) && false !== strpos($m[2], 'id=') && preg_match('/\bid=("[^"]+"|\'[^\']+\'|[^\/>\s]+)/', $m[2], $n)) {

            if ('"' === $n[1][0] && '"' === substr($n[1], -1)) {

                $id = substr($n[1], 1, -1);

            } else if ("'" === $n[1][0] && "'" === substr($n[1], -1)) {

                $id = substr($n[1], 1, -1);

            } else {

                $id = $n[1];

            }

            $m[3] = '<a href="#' . htmlspecialchars($id) . '">⚓</a> ' . $m[3];

            return '<' . $m[1] . $m[2] . '>' . $m[3] . '</' . $m[1] . '>';

        }

        // Otherwise, generate the `id` from the header’s text (inline tags are removed first)

        $id = trim(preg_replace('/[^a-z\x{4e00}-\x{9fa5}\d]+/u', '-', strtolower(strip_tags($m[3]))), '-');

        $m[3] = '<a href="#' . htmlspecialchars($id) . '">⚓</a> ' . $m[3];

        return '<' . $m[1] . ($m[2] ?? "") . ' id="' . htmlspecialchars($id) . '">' . $m[3] . '</' . $m[1] . '>';

    }, $value);

}

echo $value;

~~~
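
The snippet above does not guard against two headers that produce the same `id`. A minimal sketch of one way to handle
that, assuming two hypothetical helpers named `header_slug()` and `unique_slug()` (they are not part of this library):

~~~ php
// Hypothetical helper: turn header content into a slug with the same pattern as above
function header_slug(string $content): string {
    // Remove inline tags first so that tag names do not leak into the slug
    $text = strip_tags($content);
    return trim(preg_replace('/[^a-z\x{4e00}-\x{9fa5}\d]+/u', '-', strtolower($text)), '-');
}

// Hypothetical helper: append `-2`, `-3`, and so on when a slug has been used before
function unique_slug(string $slug, array &$seen): string {
    $n = $seen[$slug] = ($seen[$slug] ?? 0) + 1;
    return 1 === $n ? $slug : $slug . '-' . $n;
}

$seen = [];

echo unique_slug(header_slug('Hello, <em>World</em>!'), $seen), "\n"; // `hello-world`
echo unique_slug(header_slug('Hello, World!'), $seen), "\n"; // `hello-world-2`
~~~

Inside the callback, `$seen` would need to be shared between calls, e.g. by declaring it before the
`preg_replace_callback()` call and adding `use (&$seen)` to the closure.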

### Idea: Embed Syntax

The [CommonMark specification for automatic links](https://spec.commonmark.org/0.31.2#autolinks) doesn’t restrict the
URL scheme; it only specifies the pattern. We can therefore take advantage of the automatic link syntax as a kind of
“embed” syntax, which you can then turn into a chunk of HTML elements.

As far as I know, this idea has not been done before, which is why I want to be the first to mention it. To keep my
converter slim, I’m not going to integrate this feature into it directly. I just want to give you a couple of ideas.

Be aware that these tweaks are very naive, as they convert the “embed” syntax directly without taking the block type
into account. You may need to use [this filter](https://github.com/taufik-nurrohman/markdown-filter) to replace the
“embed” syntax only in certain block types, e.g. to ignore the “embed” syntax inside a fenced code block.
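
If you only need to skip fenced code blocks, a single pattern that matches either a fenced code block or an “embed”
line can be enough. The following is a rough sketch and not a replacement for the filter linked above. The function
name `embed_outside_fences()` and its callback signature are made up for this example. It does not handle indented
code blocks, block quotes, or a fence that is never closed:

~~~ php
// Hypothetical helper: run `$embed` on every `<scheme:path>` line that is not inside a fenced code block
function embed_outside_fences(string $value, callable $embed): string {
    // A fenced code block, from the opening fence to a closing fence of the same kind
    $fence = '^[ ]{0,3}(`{3,}|~{3,})[^\n]*\n[\s\S]*?^[ ]{0,3}\1[`~]*[ \t]*$';
    // A line that consists of a single automatic link
    $link = '^[ ]{0,3}<([a-z][a-z\d.+-]{1,31}):([^<>\s]*)>[ \t]*$';
    return preg_replace_callback('/' . $fence . '|' . $link . '/mi', static function ($m) use ($embed) {
        if ("" !== $m[1]) {
            return $m[0]; // Fenced code block: keep as-is
        }
        // Keep the original line if the callback returns `null` (unknown scheme)
        return $embed(strtolower($m[2]), $m[3]) ?? $m[0];
    }, $value);
}

$value = embed_outside_fences($value ?? "", static function ($scheme, $path) {
    if ('youtube' === $scheme) {
        return '<iframe src="https://www.youtube.com/embed/' . htmlspecialchars($path) . '"></iframe>';
    }
    return null;
});
~~~

Returning `null` for unknown schemes leaves ordinary automatic links such as `<https://example.com>` untouched, so
they are still converted to links later by the Markdown converter.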

#### YouTube Video Embed

An embed syntax to display a YouTube video by video ID.

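~~~ md

<youtube:VIDEO_ID>

~~~

~~~ php

// Assumes the embed is written as `<youtube:VIDEO_ID>`; adjust the `<iframe>` markup to your needs

$value = preg_replace('/^[ ]{0,3}<youtube:([^>]+)>\s*$/m', '<iframe src="https://www.youtube.com/embed/$1"></iframe>', $value);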

$value = from_markdown($value);

echo $value;

~~~

#### GitHub Gist Embed

An embed syntax to display a GitHub gist by gist ID.

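~~~ md

<gist:GIST_ID>

~~~

~~~ php

// Assumes the embed is written as `<gist:GIST_ID>`; adjust the script URL to your needs

$value = preg_replace('/^[ ]{0,3}<gist:([^>]+)>\s*$/m', '<script src="https://gist.github.com/$1.js"></script>', $value);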

$value = from_markdown($value);

echo $value;

~~~

#### Form Embed

An embed syntax to display an HTML form generated on the server side, with a reference ID of `18a4596d42c` and a
`title` parameter to customize the HTML form title.

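~~~ md

<form:18a4596d42c?title=Form+Title>

~~~

~~~ php

// Assumes the embed is written as `<form:ID?title=…>`; the `/form/…` end-point below is only a placeholder

$value = preg_replace_callback('/^[ ]{0,3}<form:([^#>?]+)([?][^#>]*)?([#][^>]*)?>\s*$/m', static function ($m) {

    $path = $m[1];

    $value = "";

    parse_str(substr($m[2] ?? "", 1), $state);

    $value .= '<form action="/form/' . htmlspecialchars($path) . '" method="post">';

    if (!empty($state['title'])) {

        // Escape the title, because it comes straight from the Markdown source

        $value .= '<h1>' . htmlspecialchars($state['title']) . '</h1>';

    }

    // … etc.

    // Be careful not to include blank line(s), or the raw HTML block state will end before the HTML form is complete!

    $value .= '</form>';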

    return $value;

}, $value);

$value = from_markdown($value);

echo $value;

~~~
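
To make the three capture groups of the pattern less abstract, here is a small sketch of how a single embed line is
taken apart. It assumes the `<form:…>` form of the syntax, and the `#top` fragment is only added for illustration:

~~~ php
$line = '<form:18a4596d42c?title=Form+Title#top>';

if (preg_match('/^[ ]{0,3}<form:([^#>?]+)([?][^#>]*)?([#][^>]*)?>\s*$/m', $line, $m)) {
    // Group 1 is the reference ID
    $path = $m[1]; // `'18a4596d42c'`
    // Group 2 is the query string, including the leading `?`
    parse_str(substr($m[2] ?? "", 1), $state); // `['title' => 'Form Title']`
    // Group 3 is the fragment, including the leading `#`
    $hash = substr($m[3] ?? "", 1); // `'top'`
}
~~~

Note that `parse_str()` decodes the values, so `Form+Title` becomes `Form Title`. Escape them again with
`htmlspecialchars()` before putting them into the HTML output.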

### Idea: Note Block

Several people have discussed this feature, and I like [this answer](https://stackoverflow.com/a/41449789/1163000) the
most. Its syntax is plain Markdown, so it reads well both in the raw Markdown source and when rendered to HTML:

~~~ md

------------------------------

  **NOTE:** asdf asdf asdf

------------------------------

~~~

~~~ md

------------------------------

  **NOTE:**

  asdf asdf asdf asdf

  asdf asdf asdf asdf

  asdf asdf asdf asdf

------------------------------

~~~

Most Markdown converters will render the syntax above to this HTML, which is still acceptable as a note block in terms
of presentation, despite its broken semantics:

~~~ html

<hr />

<p><strong>NOTE:</strong> asdf asdf asdf</p>

<hr />

~~~

~~~ html

<hr />

<p><strong>NOTE:</strong></p>

<p>asdf asdf asdf asdf asdf asdf asdf asdf</p>

<p>asdf asdf asdf asdf</p>

<hr />

~~~

With regular expressions, you can improve its [semantics](https://w3c.github.io/aria#note):

~~~ php

$value = from_markdown($value);

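// Accepts both `<hr>` and `<hr />`, with optional white-space between the blocks

$value = preg_replace_callback('/<hr\s*\/?>\s*(<p><strong>NOTE:<\/strong>[\s\S]*?<\/p>)\s*<hr\s*\/?>/', static function ($m) {

    return '<div role="note">' . $m[1] . '</div>';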

}, $value);

echo $value;

~~~

License

-------

This library is licensed under the [MIT License](LICENSE). Please consider

[donating 💰](https://github.com/sponsors/taufik-nurrohman) if you benefit financially from this library.

Links

-----

 - Autumn image sample by [@blmiers2](https://www.flickr.com/photos/41304517@N00/6250498399)

 - Emoticon image sample by [@emoticons4u](https://web.archive.org/web/20090117060451/http://emoticons4u.com) (web archive)