https://github.com/taufik-nurrohman/markdown
Obviously, a Markdown parser.
- Host: GitHub
- URL: https://github.com/taufik-nurrohman/markdown
- Owner: taufik-nurrohman
- License: mit
- Created: 2023-08-05T12:08:24.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-28T15:45:39.000Z (10 months ago)
- Last Synced: 2024-05-02T00:23:31.792Z (10 months ago)
- Topics: commonmark, converter, extra, markdown, parsedown, parser, php
- Language: PHP
- Homepage: https://github.com/mecha-cms/x.markdown
- Size: 788 KB
- Stars: 7
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
PHP Markdown Parser
===================

![from.php] ![to.php]

[from.php]: https://img.shields.io/github/size/taufik-nurrohman/markdown/from.php?branch=main&color=%234f5d95&label=from.php&labelColor=%231f2328&style=flat-square
[to.php]: https://img.shields.io/github/size/taufik-nurrohman/markdown/to.php?branch=main&color=%234f5d95&label=to.php&labelColor=%231f2328&style=flat-square

With 90% compliance to [CommonMark 0.31.2](https://spec.commonmark.org/0.31.2) specifications.

Motivation
----------
I appreciate the [Parsedown](https://github.com/erusev/parsedown) project for its simplicity and speed. It uses only a
single class file to convert Markdown syntax to HTML. However, given the decrease in Parsedown project activity over
time, I assume that it is now in the state of “feature complete”. It still has some bugs to fix, and with
[the recent release of PHP version 8.1](https://www.php.net/releases/8.1/en.php), some of the PHP syntax there has
become obsolete.

There is actually [a draft for Parsedown version 2.0](https://github.com/erusev/parsedown/tree/2.0.x), but it is no
longer made as a single class file. It’s broken down into components. The goal, I think, is to make it easy to add
functionality without breaking what’s already in the core. For others, it may be of great use, but I see it as
similar to the extension features provided by
[CommonMark](https://github.com/thephpleague/commonmark/blob/2.4/docs/2.4/customization/extensions.md). Because of that,
if I want to update, it might be more optimal to just switch to CommonMark.

I’m not into things like that. As someone who needs a function to convert Markdown syntax to HTML, that kind of
flexibility is completely unnecessary to me. I just want to convert Markdown syntax to HTML for once and then move on.
It was fulfilled by [Parsedown version 1.8](https://github.com/erusev/parsedown/tree/1.8.x-beta), but it seems that it
is no longer being actively maintained.

The goal of this project is to use it in my [Markdown extension for Mecha](https://github.com/mecha-cms/x.markdown) in
the future. Previously, I wanted to develop this converter directly into the extension, but my friend advised me to
create this project separately as it might have potential to be used by other developers beyond the
[Mecha CMS](https://github.com/mecha-cms) developers.

Usage
-----

This converter can be installed using [Composer](https://packagist.org/packages/taufik-nurrohman/markdown), but it
doesn’t need any other dependencies and just uses Composer’s ability to automatically include files. Those of you who
don’t use Composer should be able to include the `from.php` and `to.php` files directly into your application without
any problems.

### Using Composer

From the command line interface, navigate to your project folder then run this command:
~~~ sh
composer require taufik-nurrohman/markdown
~~~

Require the generated auto-loader file in your application:

~~~ php
<?php

require 'vendor/autoload.php';
~~~

### Using File

Require the `from.php` and `to.php` files in your application:
~~~ php
<?php

require 'from.php';
require 'to.php';
~~~

The `to.php` file is optional and is used to convert HTML to Markdown. If you just want to convert Markdown to HTML, you
don’t need to include this file. This feature is experimental and is provided as a complementary feature, just as
`json_encode()` exists alongside `json_decode()`. The Markdown result may not satisfy everyone, but it can be
discussed further.

Options
-------

~~~ php
/**
 * Convert Markdown string to HTML string.
 *
 * @param null|string $value Your Markdown string.
 * @param bool $block If this option is set to `false`, Markdown block syntax will be ignored.
 * @return null|string
 */
from(?string $value, bool $block = true): ?string;
~~~

~~~ php
/**
 * Convert HTML string to Markdown string.
 *
 * @param null|string $value Your HTML string.
 * @param bool $block If this option is set to `false`, HTML block syntax will be stripped out.
 * @return null|string
 */
to(?string $value, bool $block = true): ?string;
~~~

Dialect
-------

Over time, the history of Mecha has slowly shaped my Markdown writing style. The Markdown extension used by Mecha
[was first](https://github.com/mecha-cms/mecha/tree/v1.2.2) built with
[Michel Fortin’s Markdown converter](https://michelf.ca/projects/php-markdown) (which I believe is the very first PHP
port of the Markdown converter originally written in Perl by
[John Gruber](https://daringfireball.net/projects/markdown)). With the release of
[Mecha version 1.2.3](https://github.com/mecha-cms/mecha/tree/v1.2.3), I decided to switch to
[Parsedown](https://github.com/erusev/parsedown) because it was quite popular at the time. It can also do the conversion
process much faster. Emanuil Rusev’s way of detecting the block type
[by reading the first character](https://github.com/erusev/parsedown/tree/1.7.4#questions) is, in my opinion, very
clever and efficient.

### Attributes

My Markdown converter supports a more extensive attribute syntax, including a mix of `.class` and `#id` attribute
syntax, and a mix of `key=value` attribute syntax:

| Markdown | HTML |
| --- | --- |
| `# asdf {#asdf}` | `<h1 id="asdf">asdf</h1>` |
| `# asdf {#asdf.asdf}` | `<h1 class="asdf" id="asdf">asdf</h1>` |
| `# asdf {#asdf.asdf asdf=asdf}` | `<h1 asdf="asdf" class="asdf" id="asdf">asdf</h1>` |

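The attribute syntax above can be sketched in a few lines of PHP. This is only an illustration, not the converter’s actual code: `parse_attributes()` and `render_attributes()` are hypothetical names, quoted values are not handled, and the alphabetical attribute order is an assumption based on the sample output.

~~~ php
<?php

// Hypothetical helper: turn `{#id.class key=value}` into an attribute array.
// Quoted values and escape sequences are deliberately not handled here.
function parse_attributes(string $syntax): array {
    $out = [];
    $classes = [];
    $syntax = trim($syntax, "{} \t");
    foreach (preg_split('/\s+/', $syntax, -1, PREG_SPLIT_NO_EMPTY) as $token) {
        if (false !== strpos($token, '=')) {
            // `key=value`
            [$k, $v] = explode('=', $token, 2);
            $out[$k] = $v;
            continue;
        }
        // A mix of `#id` and `.class`, such as `#asdf.asdf`
        preg_match_all('/([#.])([^#.]+)/', $token, $m, PREG_SET_ORDER);
        foreach ($m as [, $type, $name]) {
            if ('#' === $type) {
                $out['id'] = $name;
            } else {
                $classes[] = $name;
            }
        }
    }
    if ($classes) {
        $out['class'] = implode(' ', array_unique($classes));
    }
    // Assumption: attributes are emitted in alphabetical order
    ksort($out);
    return $out;
}

// Hypothetical helper: serialize the attribute array.
function render_attributes(array $attributes): string {
    $out = "";
    foreach ($attributes as $k => $v) {
        $out .= ' ' . $k . '="' . htmlspecialchars($v) . '"';
    }
    return $out;
}

echo '<h1' . render_attributes(parse_attributes('{#asdf.asdf asdf=asdf}')) . '>asdf</h1>';
// <h1 asdf="asdf" class="asdf" id="asdf">asdf</h1>
~~~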
Inline attributes always win over native syntax attributes and pre-defined attributes:

| Markdown | HTML |
| --- | --- |
| `[asdf](asdf) {href=x}` | `<p><a href="x">asdf</a></p>` |
| `[asdf]`<br><br>`[asdf]: asdf {href=x}` | `<p><a href="x">asdf</a></p>` |
| `[asdf] {.x href=x}`<br><br>`[asdf]: asdf {.asdf}` | `<p><a class="x" href="x">asdf</a></p>` |

### Emphasis
CommonMark’s [emphasis (and strong emphasis) specifications][commonmark/em] almost drove me crazy! 🤯
Implementing that level of strictness would delay a stable release even further. I actually
understand [the parsing strategy][commonmark/appendix] very well, but turning it into minimal PHP code is just too
hard for me. In order to speed up the completion of the project, I decided to reduce the strictness of the emphasis (and
strong emphasis) specifications.

They will not completely follow CommonMark’s emphasis (and strong emphasis) specifications, but I promise that the
HTML results will still make sense, especially for those who have never read the specifications.

[commonmark/appendix]: https://spec.commonmark.org/0.31.2#appendix-a-parsing-strategy
[commonmark/em]: https://spec.commonmark.org/0.31.2#emphasis-and-strong-emphasis

**Rule 1:** The same type of emphasis can be nested only if one or both sides of the child emphasis begin and/or end
with white-space or punctuation.

This will create nested emphasis:

| Markdown | HTML |
| --- | --- |
| `*asdf *asdf* asdf*` | `<p><em>asdf <em>asdf</em> asdf</em></p>` |
| `**asdf* asdf asdf*` | `<p><em><em>asdf</em> asdf asdf</em></p>` |
| `*asdf asdf *asdf**` | `<p><em>asdf asdf <em>asdf</em></em></p>` |

| Markdown | HTML |
| --- | --- |
| `**asdf **asdf** asdf**` | `<p><strong>asdf <strong>asdf</strong> asdf</strong></p>` |
| `****asdf** asdf asdf**` | `<p><strong><strong>asdf</strong> asdf asdf</strong></p>` |
| `**asdf asdf **asdf****` | `<p><strong>asdf asdf <strong>asdf</strong></strong></p>` |

This will not:

| Markdown | HTML |
| --- | --- |
| `*asdf*asdf*asdf*` | `<p><em>asdf</em>asdf<em>asdf</em></p>` |
| `**asdf*asdf asdf*` | `<p>**asdf<em>asdf asdf</em></p>` |
| `*asdf asdf*asdf**` | `<p><em>asdf asdf</em>asdf**</p>` |

| Markdown | HTML |
| --- | --- |
| `**asdf**asdf**asdf**` | `<p><strong>asdf</strong>asdf<strong>asdf</strong></p>` |
| `****asdf**asdf asdf**` | `<p>****asdf<strong>asdf asdf</strong></p>` |
| `**asdf asdf**asdf****` | `<p><strong>asdf asdf</strong>asdf****</p>` |

**Rule 2:** For conditions where the emphasis types are different, **Rule 1** does not apply.

| Markdown | HTML |
| --- | --- |
| `*asdf**asdf**asdf*` | `<p><em>asdf<strong>asdf</strong>asdf</em></p>` |
| `*asdf **asdf** asdf*` | `<p><em>asdf <strong>asdf</strong> asdf</em></p>` |
| `***asdf**asdf asdf*` | `<p><em><strong>asdf</strong>asdf asdf</em></p>` |
| `***asdf** asdf asdf*` | `<p><em><strong>asdf</strong> asdf asdf</em></p>` |
| `*asdf asdf**asdf***` | `<p><em>asdf asdf<strong>asdf</strong></em></p>` |
| `*asdf asdf **asdf***` | `<p><em>asdf asdf <strong>asdf</strong></em></p>` |

| Markdown | HTML |
| --- | --- |
| `**asdf*asdf*asdf**` | `<p><strong>asdf<em>asdf</em>asdf</strong></p>` |
| `**asdf *asdf* asdf**` | `<p><strong>asdf <em>asdf</em> asdf</strong></p>` |
| `***asdf*asdf asdf**` | `<p><strong><em>asdf</em>asdf asdf</strong></p>` |
| `***asdf* asdf asdf**` | `<p><strong><em>asdf</em> asdf asdf</strong></p>` |
| `**asdf asdf*asdf***` | `<p><strong>asdf asdf<em>asdf</em></strong></p>` |
| `**asdf asdf *asdf***` | `<p><strong>asdf asdf <em>asdf</em></strong></p>` |

**Rule 3:** For conditions where the emphasis markers are different, **Rule 1** does not apply.

| Markdown | HTML |
| --- | --- |
| `_asdf*asdf*asdf_` | `<p><em>asdf<em>asdf</em>asdf</em></p>` |
| `*asdf_asdf_asdf*` | `<p><em>asdf_asdf_asdf</em></p>` |
| `*asdf _asdf_ asdf*` | `<p><em>asdf <em>asdf</em> asdf</em></p>` |
| `_*asdf*asdf asdf_` | `<p><em><em>asdf</em>asdf asdf</em></p>` |
| `*_asdf_asdf asdf*` | `<p><em>_asdf_asdf asdf</em></p>` |
| `*_asdf_ asdf asdf*` | `<p><em><em>asdf</em> asdf asdf</em></p>` |
| `_asdf asdf*asdf*_` | `<p><em>asdf asdf<em>asdf</em></em></p>` |
| `*asdf asdf_asdf_*` | `<p><em>asdf asdf_asdf_</em></p>` |
| `*asdf asdf _asdf_*` | `<p><em>asdf asdf <em>asdf</em></em></p>` |

**Rule 4:** The opening delimiter must not be followed by a white-space and the closing delimiter must not be preceded
by a white-space in order for it to be a valid emphasis token.

| Markdown | HTML |
| --- | --- |
| `*asdf*` | `<p><em>asdf</em></p>` |
| `* asdf *` | `<ul><li>asdf *</li></ul>` |
| `* asdf*` | `<ul><li>asdf*</li></ul>` |
| `*asdf *` | `<p>*asdf *</p>` |

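As a rough illustration of Rule 4 (and the empty-token case covered by Rule 5), here is a minimal check for a single, non-nested emphasis token. The function name `is_valid_emphasis()` is hypothetical, and the pattern is a simplification for illustration only, not the converter’s actual logic: it rejects content that starts or ends with a marker character, so nested emphasis is out of its scope.

~~~ php
<?php

// Hypothetical check for a single emphasis token such as `*asdf*` or `**asdf**`.
// The content must start and end with a character that is neither white-space
// nor an emphasis marker, and the closing delimiter must equal the opening one.
function is_valid_emphasis(string $token): bool {
    return 1 === preg_match('/^([*_]{1,2})(?=[^\s*_])(.+?)(?<=[^\s*_])\1$/s', $token);
}

var_dump(is_valid_emphasis('*asdf*')); // bool(true)
var_dump(is_valid_emphasis('*asdf *')); // bool(false)
~~~

Note that `* asdf *` is rejected here as an emphasis token, while the converter itself goes on to treat it as a list item.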
**Rule 5:** The emphasis token cannot be empty.

| Markdown | HTML |
| --- | --- |
| `**` | `<p>**</p>` |
| `****` | `<hr />` |

### Links
Relative links and absolute links with the server’s host name will be treated as internal links; otherwise, they will be
treated as external links and will automatically get `rel="nofollow"` and `target="_blank"` attributes.

### Notes

Notes follow the [Markdown Extra’s notes syntax](https://michelf.ca/projects/php-markdown/extra#footnotes) but with
slightly different HTML output to match [Mecha](https://github.com/mecha-cms)’s common naming style. Multi-line notes
don’t have to be indented by four spaces as required by Markdown Extra. A space or tab is enough to continue the note.
Markdown
HTML
asdf [^1]
[^1]: asdf
<p>asdf <sup id="from:1"><a href="#to:1" role="doc-noteref">1</a></sup></p><div role="doc-endnotes"><hr /><ol><li id="to:1" role="doc-endnote"><p>asdf <a href="#from:1" role="doc-backlink">↩</a></p></li></ol></div>
asdf [^1]
[^1]:
asdf
====asdf
asdfasdf
asdf
asdfasdf
<p>asdf <sup id="from:1"><a href="#to:1" role="doc-noteref">1</a></sup></p><p>asdf</p><div role="doc-endnotes"><hr /><ol><li id="to:1" role="doc-endnote"><h1>asdf</h1><p>asdf asdf</p><pre><code>asdf</code></pre><p>asdf asdf <a href="#from:1" role="doc-backlink">↩</a></p></li></ol></div>
### Soft Break
Soft breaks are collapsed to spaces in non-critical parts such as in paragraphs and list items:

| Markdown | HTML |
| --- | --- |
| `asdf asdf asdf asdf`<br>`asdf asdf asdf asdf`<br><br>`asdf asdf asdf asdf` | `<p>asdf asdf asdf asdf asdf asdf asdf asdf</p><p>asdf asdf asdf asdf</p>` |

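The collapsing behaviour can be approximated with a single regular expression. This is a minimal sketch with a hypothetical `collapse_soft_breaks()` helper; it assumes the input is one paragraph and it ignores hard breaks (two trailing spaces or a trailing backslash), which a real converter has to detect first.

~~~ php
<?php

// Hypothetical helper: replace every soft break in a paragraph with one space.
function collapse_soft_breaks(string $paragraph): string {
    return preg_replace('/[ \t]*\n[ \t]*/', ' ', trim($paragraph));
}

echo '<p>' . collapse_soft_breaks("asdf asdf asdf asdf\nasdf asdf asdf asdf") . '</p>';
// <p>asdf asdf asdf asdf asdf asdf asdf asdf</p>
~~~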
### Code Block
I try to avoid conflict between different Markdown dialects and try to support whatever dialect you are using. For
example, since I originally used Markdown Extra, I am used to adding an info string with a dot prefix to the fenced code
block syntax. This is not supported by Parsedown (or rather, Parsedown doesn’t care about the pattern of the given info
string and simply prepends a `language-` prefix to it, since CommonMark also doesn’t give implementors special rules for
processing the info string in fenced code block syntax).

Here’s how the code block results compare across each Markdown converter:

#### Markdown Extra

| Markdown | HTML |
| --- | --- |
| `~~~ asdf`<br>`asdf`<br>`~~~` | `<pre><code class="asdf">asdf`<br>`</code></pre>` |
| `~~~ .asdf`<br>`asdf`<br>`~~~` | `<pre><code class="asdf">asdf`<br>`</code></pre>` |
| `~~~ asdf asdf`<br>`asdf`<br>`~~~` | Invalid. |
| `~~~ .asdf.asdf`<br>`asdf`<br>`~~~` | Invalid. |
| `~~~ {#asdf.asdf}`<br>`asdf`<br>`~~~` | `<pre><code class="asdf" id="asdf">asdf`<br>`</code></pre>` |
| `~~~ {#asdf.asdf asdf=asdf}`<br>`asdf`<br>`~~~` | Invalid. |

#### Parsedown Extra

| Markdown | HTML |
| --- | --- |
| `~~~ asdf`<br>`asdf`<br>`~~~` | `<pre><code class="language-asdf">asdf</code></pre>` |
| `~~~ .asdf`<br>`asdf`<br>`~~~` | `<pre><code class="language-.asdf">asdf</code></pre>` |
| `~~~ asdf asdf`<br>`asdf`<br>`~~~` | `<pre><code class="language-asdf">asdf</code></pre>` |
| `~~~ .asdf.asdf`<br>`asdf`<br>`~~~` | `<pre><code class="language-.asdf.asdf">asdf</code></pre>` |
| `~~~ {#asdf.asdf}`<br>`asdf`<br>`~~~` | `<pre><code class="language-{#asdf.asdf}">asdf</code></pre>` |
| `~~~ {#asdf.asdf asdf=asdf}`<br>`asdf`<br>`~~~` | `<pre><code class="language-{#asdf.asdf">asdf</code></pre>` |

#### Mine

| Markdown | HTML |
| --- | --- |
| `~~~ asdf`<br>`asdf`<br>`~~~` | `<pre><code class="language-asdf">asdf</code></pre>` |
| `~~~ .asdf`<br>`asdf`<br>`~~~` | `<pre><code class="asdf">asdf</code></pre>` |
| `~~~ asdf asdf`<br>`asdf`<br>`~~~` | `<pre><code class="language-asdf">asdf</code></pre>` |
| `~~~ .asdf.asdf`<br>`asdf`<br>`~~~` | `<pre><code class="asdf">asdf</code></pre>` |
| `~~~ {#asdf.asdf}`<br>`asdf`<br>`~~~` | `<pre><code class="asdf" id="asdf">asdf</code></pre>` |
| `~~~ {#asdf.asdf asdf=asdf}`<br>`asdf`<br>`~~~` | `<pre><code asdf="asdf" class="asdf" id="asdf">asdf</code></pre>` |

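To make the difference concrete, here is a small sketch of how an info string could be mapped to a `class` value so that it matches the plain and dot-prefixed rows of the last table. The `info_to_class()` function is hypothetical, the merging of duplicate class names is an assumption drawn from the `.asdf.asdf` row, and info strings wrapped in curly braces are out of scope here.

~~~ php
<?php

// Hypothetical helper: map a fenced code block info string to a `class` value.
// Info strings in `{…}` attribute syntax are not handled in this sketch.
function info_to_class(string $info): ?string {
    $info = trim($info);
    if ("" === $info) {
        return null;
    }
    // Only the first word of the info string is taken into account
    $word = strtok($info, " \t");
    if ('.' === $word[0]) {
        // `.asdf.asdf` becomes `asdf` (assumption: duplicate class names are merged)
        $classes = array_unique(array_filter(explode('.', $word), 'strlen'));
        return implode(' ', $classes);
    }
    // Plain info strings get the `language-` prefix
    return 'language-' . $word;
}

echo info_to_class('asdf asdf'); // language-asdf
~~~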
### HTML Block
CommonMark doesn’t care about the DOM and therefore also doesn’t care if a HTML element is perfectly balanced or not.
Unlike the original Markdown syntax specification which doesn’t allow you to convert Markdown syntax inside a HTML
block, the CommonMark specification doesn’t limit such a case. It cares about blank lines around the lines that look
like a HTML block tag, as specified in [Section 4.6](https://spec.commonmark.org/0.31.2#html-blocks), type 6.

Any text that comes after the opening and/or closing of a HTML block is treated as raw text and is not processed as
Markdown syntax. A blank line is required to end the raw HTML block state:
Markdown
HTML
<div> asdf asdf *asdf* asdf
</div> asdf asdf *asdf* asdf
<div> asdf asdf *asdf* asdf
</div> asdf asdf *asdf* asdf
<div>
asdf asdf *asdf* asdf</div>
asdf asdf *asdf* asdf
<div>
asdf asdf *asdf* asdf</div>
asdf asdf *asdf* asdf
<div>
asdf asdf *asdf* asdf
</div>
asdf asdf *asdf* asdf
<div><p>asdf asdf <em>asdf</em> asdf</p></div><p>asdf asdf <em>asdf</em> asdf</p>
Types 1, 2, 3, 4, and 5 are exceptions. For them, a line break is enough to end the raw HTML block state:
Markdown
HTML
<!-- asdf asdf *asdf* asdf --> asdf asdf *asdf* asdf
<!-- asdf asdf *asdf* asdf --> asdf asdf *asdf* asdf
<!-- asdf asdf *asdf* asdf -->
asdf asdf *asdf* asdf
<!-- asdf asdf *asdf* asdf --><p>asdf asdf <em>asdf</em> asdf</p>
<!-- asdf asdf *asdf* asdf -->
asdf asdf *asdf* asdf
<!-- asdf asdf *asdf* asdf --><p>asdf asdf <em>asdf</em> asdf</p>
The examples below will generate a predictable HTML code, but not because this converter cares about the existing HTML
tag balance:
Markdown
HTML
<nav>
<ul>
<li>
<a>asdf</a>
</li>
<li>
<a>asdf</a>
</li>
<li>
<a>asdf</a>
</li>
</ul>
</nav>asdf asdf *asdf* asdf
<nav>
<ul>
<li>
<a>asdf</a>
</li>
<li>
<a>asdf</a>
</li>
<li>
<a>asdf</a>
</li>
</ul>
</nav><p>asdf asdf <em>asdf</em> asdf</p>
<nav>
<ul>
<li>
<a>asdf</a>
</li>
<li>
<a>asdf</a>
</li>
<li>
<a>asdf</a>
</li>
</ul>
</nav>asdf asdf *asdf* asdf
<nav>
<ul>
<li>
<a>asdf</a>
</li>
<li>
<a>asdf</a>
</li>
<li>
<a>asdf</a>
</li>
</ul>
</nav><p>asdf asdf <em>asdf</em> asdf</p>
You will understand why when you add a number of blank lines at any point in the HTML block:
Markdown
HTML
<nav>
<ul>
<li>
<a>asdf</a>
</li><li>
<a>asdf</a>
</li>
<li>
<a>asdf</a>
</li>
</ul>
</nav>asdf asdf *asdf* asdf
<nav>
<ul>
<li>
<a><p>asdf</a></p></li><li>
<a>asdf</a>
</li>
<li>
<a>asdf</a>
</li>
</ul>
</nav><p>asdf asdf <em>asdf</em> asdf</p>
<nav>
<ul>
<li>
<a>asdf</a>
</li><li>
<a>asdf</a>
</li>
<li>
<a>asdf</a>
</li>
</ul>
</nav>asdf asdf *asdf* asdf
<nav>
<ul>
<li>
<a><pre><code> asdf</a>
</li><li>
<a>asdf</a>
</li>
<li>
<a>asdf</a>
</li></code></pre></ul>
</nav><p>asdf asdf <em>asdf</em> asdf</p>
Markdown Extra features the `markdown` attribute on HTML to allow you to convert Markdown syntax to HTML in a HTML
block. In this converter, the feature will not work. For now, I have no plans to add such a feature, to avoid DOM parsing
tasks as much as possible. This also lets me avoid using [PHP `dom`](https://www.php.net/book.dom).

However, if you add a blank line, it’s as if the feature works (although the `markdown` attribute is still there, it
doesn’t affect the HTML when rendered in the browser window). If you’re used to adding a blank line after the opening
HTML block tag and before the closing HTML block tag, you should be okay.
Markdown
HTML
<div markdown="1">
asdf asdf *asdf* asdf
</div>
<div markdown="1">
asdf asdf *asdf* asdf
</div>
<div markdown="1">
asdf asdf *asdf* asdf
</div>
<div markdown="1"><p>asdf asdf <em>asdf</em> asdf</p></div>
Opening an inline HTML element will not trigger the raw HTML block state unless the opening and closing tags stand alone
on a single line. This is explained in [Section 4.6](https://spec.commonmark.org/0.31.2#html-blocks), type 7:
Markdown
HTML
<span>asdf *asdf*</span> asdf *asdf* asdf
<p><span>asdf <em>asdf</em></span> asdf <em>asdf</em> asdf</p>
<span>
asdf *asdf*
</span>
asdf *asdf* asdf
<span>
asdf *asdf*
</span>
asdf *asdf* asdf
Since CommonMark doesn’t care about HTML structure, the examples below will also conform to the specification, even if
they result in broken HTML. However, these are very rarely intentionally written by hand, so such cases are very
unlikely to occur:
Markdown
HTML
<h1>
asdf asdf *asdf* asdf
</h1>
<h1><p>asdf asdf <em>asdf</em> asdf</p></h1>
<p>
asdf asdf *asdf* asdf
</p>
<p><p>asdf asdf <em>asdf</em> asdf</p></p>
### Image Block
Markdown was created before the HTML5 era. When the `<figure>` element was introduced, people started using it as a
feature to display an image with a caption. Most Markdown converters will convert image syntax that stands alone on a
single line as an image element wrapped in a paragraph element in the output. My converter instead wraps it in a
figure element, because for now, it seems like a figure element would be more desirable in this situation.

Paragraphs that appear below it will be taken as the image caption if you prepend a number of spaces less than 4.
Markdown
HTML
![asdf](asdf.jpg)
<figure><img alt="asdf" src="asdf.jpg" /></figure>
![asdf](asdf.jpg)
asdf
<figure><img alt="asdf" src="asdf.jpg" /><figcaption>asdf</figcaption></figure>
![asdf](asdf.jpg)
asdfasdf
asdf
<figure><img alt="asdf" src="asdf.jpg" /><figcaption><p>asdf</p><p>asdf</p></figcaption></figure><p>asdf</p>
![asdf](asdf.jpg) asdf
<p><img alt="asdf" src="asdf.jpg" /> asdf</p>
This pattern is also valid in ordinary Markdown files, so it degrades gracefully when parsed by
other Markdown converters.

### List Block

List blocks follow the CommonMark specifications with one exception: if the next ordered list item uses a number that is
less than the number of the previous ordered list item, a new list block will be created. This is different from the
original specification, which does not care about the literal value of the number.

| Markdown | HTML |
| --- | --- |
| `1. asdf`<br>`2. asdf`<br>`3. asdf` | `<ol><li>asdf</li><li>asdf</li><li>asdf</li></ol>` |
| `1. asdf`<br>`1. asdf`<br>`1. asdf` | `<ol><li>asdf</li><li>asdf</li><li>asdf</li></ol>` |
| `1. asdf`<br>`2. asdf`<br>`1. asdf` | `<ol><li>asdf</li><li>asdf</li></ol><ol><li>asdf</li></ol>` |

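The splitting rule can be expressed as a short grouping loop. This is a sketch under the assumption that the list markers have already been reduced to their numeric values; `split_ordered_lists()` is a hypothetical name, not a function of this converter.

~~~ php
<?php

// Hypothetical helper: group ordered list item numbers into list blocks.
// A new block starts whenever a number is less than the previous one.
function split_ordered_lists(array $numbers): array {
    $lists = [];
    $previous = null;
    foreach ($numbers as $number) {
        if (null === $previous || $number < $previous) {
            $lists[] = []; // Start a new list block
        }
        $lists[count($lists) - 1][] = $number;
        $previous = $number;
    }
    return $lists;
}

echo json_encode(split_ordered_lists([1, 2, 1])); // [[1,2],[1]]
~~~

Equal numbers, as in `1.`, `1.`, `1.`, stay in the same list block because the rule only reacts to a decrease.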
### Table Block
Table blocks follow [Markdown Extra’s table block syntax](https://michelf.ca/projects/php-markdown/extra#table).
However, there are a few additional features and rules:

- The actual number of columns follows the number of columns in the table header separator. If you have columns in
table header and/or table data with a number that exceeds the actual number of columns, the excess columns will be
discarded. If you have columns in table header and/or table data with a number that is less than the actual number of
columns, several empty columns will be added automatically to the right side.
- Literal pipe characters in table columns must be escaped. Exceptions are those that appear in code span and attribute
values of raw HTML tags.
- Header-less table is supported, but may not be compatible with other Markdown converters. Consider using this feature
as rarely as possible, unless you have no plans to switch to other Markdown converters in the future.
- Table caption is supported and can be created using the same syntax as the image block’s caption syntax.
Markdown
HTML
asdf | asdf
---- | ----
asdf | asdf
<table><thead><tr><th>asdf</th><th>asdf</th></tr></thead><tbody><tr><td>asdf</td><td>asdf</td></tr></tbody></table>
asdf | asdf
---- | ----
<table><thead><tr><th>asdf</th><th>asdf</th></tr></thead></table>
---- | ----
asdf | asdf
<table><tbody><tr><td>asdf</td><td>asdf</td></tr></tbody></table>
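The first rule in the list above (excess columns are discarded, missing columns are padded on the right) can be sketched as follows. The `fit_columns()` helper is hypothetical and assumes that each row has already been split into an array of cells.

~~~ php
<?php

// Hypothetical helper: force a table row to have exactly `$count` columns,
// where `$count` comes from the table header separator.
function fit_columns(array $row, int $count): array {
    // Discard the excess columns, then pad the right side with empty columns
    return array_pad(array_slice($row, 0, $count), $count, "");
}

echo json_encode(fit_columns(['asdf', 'asdf', 'asdf'], 2)); // ["asdf","asdf"]
echo json_encode(fit_columns(['asdf'], 2)); // ["asdf",""]
~~~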
XSS
---

This converter is intended only to convert Markdown syntax to HTML based on the
[CommonMark](https://spec.commonmark.org/0.31.2) specification. It doesn’t care about your user input. I have no
intention of adding any special security features in the future, sorry. The attribute syntax feature may be a security
risk for you if you want to use this converter on your comment entries, for example:

| Markdown | HTML |
| --- | --- |
| `![asdf](asdf.asdf) {onerror="alert('Yo!')"}` | `<img alt="asdf" onerror="alert('Yo!')" src="asdf.asdf" />` |

There are already many specialized PHP libraries dedicated to dealing with XSS, so consider
post-processing the generated HTML markup before putting it out to the web:

- [ezyang/htmlpurifier](https://github.com/ezyang/htmlpurifier)
- [voku/anti-xss](https://github.com/voku/anti-xss)

Tests
-----

Clone this repository into the root of your web server that supports PHP and then you can open the `test/from.php` and
`test/to.php` files with your browser to see the result and the performance of this converter in various cases.

Tweaks
------

Not all Markdown dialects are supported for various reasons. Some of the modification methods below can be implemented
to add features that you might find in other Markdown converters.

Your Markdown content is represented as variable `$value`. If you modify the content before the function
`from_markdown()` is called, it means that you modify the Markdown content before it is converted. If you modify the
content after the function `from_markdown()` is called, it means that you modify the results of the Markdown conversion.

### Globally Reusable Functions

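To make `from_markdown()` and `to_markdown()` functions reusable globally, use this method:

~~~ php
<?php

require 'from.php';
require 'to.php';

// Assumption: the converter functions live in the `x\markdown` namespace.
// Check `from.php` and `to.php` if in doubt.
function from_markdown(...$lot) {
    return x\markdown\from(...$lot);
}

function to_markdown(...$lot) {
    return x\markdown\to(...$lot);
}
~~~

### Convert XHTML to HTML5

Replace `' />'` with `'>'` directly from the results of the Markdown conversion:

~~~ php
$value = from_markdown($value);

$value = strtr($value, [' />' => '>']);

echo $value;
~~~

### Strike
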
This method allows you to add strike-through syntax, as you may have already noticed in the
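[GFM specification](https://github.github.com/gfm):

~~~ php
$value = from_markdown($value);

// Wrap `~asdf~` and `~~asdf~~` in a `<del>` element
$value = preg_replace('/((?<![~])[~]{1,2}(?![~]))(.+?)\1/', '<del>$2</del>', $value);

echo $value;
~~~

### Task List
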
I am against the task list feature because it promotes bad practices to abuse the form input element. Although from the
presentation side it displays a check box interface correctly, I still believe that input elements should ideally be
used inside a form element. There are several Unicode symbols that are more suitable and easier to read from the
Markdown source like ☐ and ☒, which means that this feature can actually be made using the existing list
feature:

~~~ md
- ☒ asdf
- ☐ asdf
- ☐ asdf
~~~

In case you need it, or don’t want to update your existing task list syntax in your Markdown files, here’s the hack:

~~~ php
$value = from_markdown($value);

$value = strtr($value, [
    '<li>[ ] ' => '<li>☐ ',
    '<li>[x] ' => '<li>☒ '
]);

echo $value;
~~~
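To try the replacement without the converter, it can be run against a hand-written HTML sample. The sample string below is an assumption about what a converter typically produces for tight and loose list items, and the extra `<li><p>` keys are an addition for the loose case rather than something taken from the snippet above.

~~~ php
<?php

// Sample HTML: one tight list item and one loose list item (wrapped in `<p>`).
$html = '<ul><li>[x] asdf</li><li><p>[ ] asdf</p></li></ul>';

// `strtr()` with an array always tries the longest key first,
// so the `<li><p>` variants win over the plain `<li>` variants.
$html = strtr($html, [
    '<li><p>[ ] ' => '<li><p>☐ ',
    '<li><p>[x] ' => '<li><p>☒ ',
    '<li>[ ] ' => '<li>☐ ',
    '<li>[x] ' => '<li>☒ '
]);

echo $html;
// <ul><li>☒ asdf</li><li><p>☐ asdf</p></li></ul>
~~~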
### Pre-Defined Abbreviations, Notes, and References
By inserting abbreviations, notes, and references at the end of the Markdown content, you effectively get a
pre-defined abbreviations, notes, and references feature. They should be placed at the end of the Markdown content,
because according to the [link reference definitions](https://spec.commonmark.org/0.31.2#example-204) specification, the
first declared reference always takes precedence:
~~~ php
$abbreviations = [
    'CSS' => 'Cascading Style Sheet',
    'HTML' => 'Hyper Text Markup Language',
    'JS' => 'JavaScript'
];

$references = [
    'mecha-cms' => ['https://github.com/mecha-cms', 'Mecha CMS', []],
    'taufik-nurrohman' => ['https://github.com/taufik-nurrohman', 'Taufik Nurrohman', []],
];

$suffix = "";

if (!empty($abbreviations)) {
    foreach ($abbreviations as $k => $v) {
        $k = strtr($k, [
            '[' => '\[',
            ']' => '\]'
        ]);
        $v = trim(preg_replace('/\s+/', ' ', $v));
        $suffix .= "\n*[" . $k . ']: ' . $v;
    }
}

if (!empty($references)) {
    foreach ($references as $k => $v) {
        [$link, $title, $attributes] = $v;
        $k = strtr($k, [
            '[' => '\[',
            ']' => '\]'
        ]);
        if ("" === $link || false !== strpos($link, ' ')) {
            $link = '<' . $link . '>';
        }
        $reference = '[' . $k . ']: ' . $link;
        if (!empty($title)) {
            $reference .= " '" . strtr($title, ["'" => "\\'"]) . "'";
        }
        if (!empty($attributes)) {
            foreach ($attributes as $kk => &$vv) {
                // `{.asdf}`
                if ('class' === $kk) {
                    $vv = '.' . trim(preg_replace('/\s+/', '.', $vv));
                    continue;
                }
                // `{#asdf}`
                if ('id' === $kk) {
                    $vv = '#' . $vv;
                    continue;
                }
                // `{asdf}`
                if (true === $vv) {
                    $vv = $kk;
                    continue;
                }
                // `{asdf=""}`
                if ("" === $vv) {
                    $vv = $kk . '=""';
                    continue;
                }
                // `{asdf='asdf'}`
                $vv = $kk . "='" . strtr($vv, ["'" => "\\'"]) . "'";
            }
            unset($vv);
            sort($attributes);
            $attributes = trim(strtr(implode(' ', $attributes), [
                ' #' => '#',
                ' .' => '.'
            ]));
            $reference .= ' {' . $attributes . '}';
        }
        $suffix .= "\n" . $reference;
    }
}

$value = from_markdown($value . "\n" . $suffix);

echo $value;
~~~
### Pre-Defined Header’s ID
Add an automatic `id` attribute to headers level 2 through 6 if it’s not set, and then prepend an anchor element that
points to it:
~~~ php
$value = from_markdown($value);

if ($value && false !== strpos($value, '</h')) {
    $value = preg_replace_callback('/<(h[2-6])(\s(?>"[^"]*"|\'[^\']*\'|[^>])*)?>([\s\S]+?)<\/\1>/', static function ($m) {
        // The header already has an `id` attribute, so only the anchor is added
        if (!empty($m[2]) && false !== strpos($m[2], 'id=') && preg_match('/\bid=("[^"]+"|\'[^\']+\'|[^\/>\s]+)/', $m[2], $n)) {
            if ('"' === $n[1][0] && '"' === substr($n[1], -1)) {
                $id = substr($n[1], 1, -1);
            } else if ("'" === $n[1][0] && "'" === substr($n[1], -1)) {
                $id = substr($n[1], 1, -1);
            } else {
                $id = $n[1];
            }
            $m[3] = '<a href="#' . htmlspecialchars($id) . '">⚓</a> ' . $m[3];
            return '<' . $m[1] . $m[2] . '>' . $m[3] . '</' . $m[1] . '>';
        }
        // Otherwise, generate the `id` attribute from the header content
        $id = trim(preg_replace('/[^a-z\x{4e00}-\x{9fa5}\d]+/u', '-', strtolower($m[3])), '-');
        $m[3] = '<a href="#' . htmlspecialchars($id) . '">⚓</a> ' . $m[3];
        return '<' . $m[1] . ($m[2] ?? "") . ' id="' . htmlspecialchars($id) . '">' . $m[3] . '</' . $m[1] . '>';
    }, $value);
}

echo $value;
~~~
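The `id` generation step can be looked at in isolation. The sketch below wraps the same character-class pattern in a hypothetical `to_id()` helper; the `strip_tags()` call is an addition of this sketch so that inline HTML in the header content does not leak into the `id` value.

~~~ php
<?php

// Hypothetical helper: turn header content into a value for the `id` attribute.
// Every run of characters other than `a-z`, digits, and the CJK range
// U+4E00 to U+9FA5 is replaced with a single hyphen.
function to_id(string $content): string {
    $content = strtolower(strip_tags($content)); // `strip_tags()` is an addition here
    return trim(preg_replace('/[^a-z\x{4e00}-\x{9fa5}\d]+/u', '-', $content), '-');
}

echo to_id('Asdf <em>Asdf</em> 123!'); // asdf-asdf-123
~~~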
### Idea: Embed Syntax
The [CommonMark specification for automatic links](https://spec.commonmark.org/0.31.2#autolinks) doesn’t limit specific
types of URL protocols. It just specifies the pattern so we can take advantage of the automatic link syntax to render it
as a kind of “embed” syntax, which you can then turn into a chunk of HTML elements.
I’m sure this idea has never been done before and that’s why I want to be the first to mention it. But I’m not going to
integrate this feature directly into my converter to keep it slim. I just want to give you a couple of ideas.
Be aware that these tweaks are very naive, as they will directly convert the “embed” syntax without taking the block
type into account. You may need to use [this filter](https://github.com/taufik-nurrohman/markdown-filter) to replace the
“embed” syntax only in certain block types, e.g. to ignore the “embed” syntax inside a fenced code block syntax.
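As a starting point, the detection step can be separated from the rendering step. The sketch below uses a hypothetical `find_embeds()` function to collect every automatic link that stands alone on a line and splits it into its scheme and value. The scheme pattern follows the CommonMark definition (a letter followed by 1 to 31 letters, digits, `+`, `.`, or `-`); like the tweaks in this section, it is naive and does not know whether the line sits inside a code block. It also matches ordinary URLs, so a real implementation would filter by scheme.

~~~ php
<?php

// Hypothetical helper: list `<scheme:value>` automatic links that stand alone on a line.
function find_embeds(string $markdown): array {
    preg_match_all('/^[ ]{0,3}<([a-z][a-z\d+.-]{1,31}):([^\s<>]*)>\s*$/mi', $markdown, $m, PREG_SET_ORDER);
    $out = [];
    foreach ($m as [, $scheme, $value]) {
        $out[] = [$scheme, $value];
    }
    return $out;
}

echo json_encode(find_embeds("asdf\n\n<asdf:123>\n\nasdf")); // [["asdf","123"]]
~~~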
#### YouTube Video Embed
An embed syntax to display a YouTube video by video ID.
~~~ md
<youtube:VIDEO_ID>
~~~

~~~ php
$value = preg_replace('/^[ ]{0,3}<youtube:([^>]+)>\s*$/m', '<iframe src="https://www.youtube.com/embed/$1"></iframe>', $value);

$value = from_markdown($value);

echo $value;
~~~
#### GitHub Gist Embed
An embed syntax to display a GitHub gist by gist ID.
~~~ md
<gist:GIST_ID>
~~~

~~~ php
$value = preg_replace('/^[ ]{0,3}<gist:([^>]+)>\s*$/m', '<script src="https://gist.github.com/$1.js"></script>', $value);

$value = from_markdown($value);

echo $value;
~~~
#### Form Embed
An embed syntax to display a HTML form that was generated from the server side with a reference ID of `18a4596d42c` and
a `title` parameter to customize the HTML form title.
~~~ md
<form:18a4596d42c?title=asdf>
~~~

~~~ php
$value = preg_replace_callback('/^[ ]{0,3}<form:([^#>?]+)([?][^#>]*)?([#][^>]*)?>\s*$/m', static function ($m) {
    $path = $m[1];
    $value = "";
    parse_str(substr($m[2] ?? "", 1), $state);
    $value .= '<form>';
    if (!empty($state['title'])) {
        $value .= '<p>' . $state['title'] . '</p>';
    }
    // … etc.
    // Be careful not to include blank line(s), or the raw HTML block state will end before the HTML form is complete!
    $value .= '</form>';
    return $value;
}, $value);

$value = from_markdown($value);

echo $value;
~~~
### Idea: Note Block
Several people have discussed this feature, and I think I like
[this answer](https://stackoverflow.com/a/41449789/1163000) the most. The syntax is compatible with native Markdown
syntax, which is nice to look at directly through the Markdown source, even when it gets rendered to HTML:
~~~ md
------------------------------

**NOTE:** asdf asdf asdf

------------------------------
~~~

~~~ md
------------------------------

**NOTE:**

asdf asdf asdf asdf
asdf asdf asdf asdf

asdf asdf asdf asdf

------------------------------
~~~
Most Markdown converters will render the syntax above to this HTML, which is still acceptable to be treated as a note
block from its presentation, despite its broken semantics:

~~~ html
<hr />
<p><strong>NOTE:</strong> asdf asdf asdf</p>
<hr />
~~~

~~~ html
<hr />
<p><strong>NOTE:</strong></p>
<p>asdf asdf asdf asdf asdf asdf asdf asdf</p>
<p>asdf asdf asdf asdf</p>
<hr />
~~~

With regular expressions, you can improve its [semantics](https://w3c.github.io/aria#note):
~~~ php
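$value = from_markdown($value);

// Replace the pair of `<hr />` elements around a “NOTE:” paragraph with a note container
$value = preg_replace_callback('/<hr(?:\s[^>]*)?>\s*(<p(?:\s[^>]*)?><strong>NOTE:<\/strong>[\s\S]*?<\/p>)\s*<hr(?:\s[^>]*)?>/', static function ($m) {
    return '<div role="note">' . $m[1] . '</div>';
}, $value);

echo $value;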
~~~
License
-------
This library is licensed under the [MIT License](LICENSE). Please consider
[donating 💰](https://github.com/sponsors/taufik-nurrohman) if you benefit financially from this library.
Links
-----
- Autumn image sample by [@blmiers2](https://www.flickr.com/photos/41304517@N00/6250498399)
- Emoticon image sample by [@emoticons4u](https://web.archive.org/web/20090117060451/http://emoticons4u.com) (web archive)