https://github.com/gjtorikian/commonmarker

Ruby wrapper for the comrak (CommonMark parser) Rust crate

# Commonmarker

Ruby wrapper for Rust's [comrak](https://github.com/kivikakk/comrak) crate.

It passes all of the CommonMark test suite, and is therefore spec-complete. It also includes extensions to the CommonMark spec as documented in the [GitHub Flavored Markdown spec](http://github.github.com/gfm/), such as support for tables, strikethroughs, and autolinking.

> [!NOTE]
> By default, several extensions not in any spec have been enabled, for the sake of end user convenience when generating HTML.
>
> For more information on the available options and extensions, see [the documentation below](#options-and-plugins).

## Installation

Add this line to your application's Gemfile:

    gem 'commonmarker'

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install commonmarker

## Usage

### Converting to HTML

Call `to_html` on a string to convert it to HTML:

```ruby
require 'commonmarker'
Commonmarker.to_html('"Hi *there*"', options: {
  parse: { smart: true }
})
# => <p>“Hi <em>there</em>”</p>\n
```

(The second argument is optional--[see below](#options-and-plugins) for more information.)

### Generating a document

You can also parse a string to receive a `:document` node. You can then print that node to HTML, iterate over the children, and do other fun node stuff. For example:

```ruby
require 'commonmarker'

doc = Commonmarker.parse("*Hello* world", options: {
  parse: { smart: true }
})
puts(doc.to_html) # => <p><em>Hello</em> world</p>\n

doc.walk do |node|
  puts node.type # prints, in order: document, paragraph, emph, text, text
end
```

(The second argument is optional--[see below](#options-and-plugins) for more information.)

When it comes to modifying the document, you can perform the following operations:

- `insert_before`
- `insert_after`
- `prepend_child`
- `append_child`
- `delete`

You can also get the source position of a node by calling `source_position`:

```ruby
doc = Commonmarker.parse("*Hello* world")
puts doc.first_child.first_child.source_position
# => {:start_line=>1, :start_column=>1, :end_line=>1, :end_column=>7}
```
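A position hash like the one above can be mapped back onto the original string. The following is a minimal sketch, not part of Commonmarker: `slice_source` is a hypothetical helper, and it assumes lines and columns are 1-based, inclusive, and counted in characters (which matches the example output for ASCII input, but may differ for multi-byte text).

```ruby
# Hypothetical helper: returns the part of `source` covered by a position
# hash shaped like the one returned by `source_position`.
# Assumes 1-based, inclusive line and column numbers.
def slice_source(source, pos)
  selected = source.lines[(pos[:start_line] - 1)..(pos[:end_line] - 1)]
  return "" if selected.nil? || selected.empty?

  if selected.length == 1
    # Start and end are on the same line: take one column range.
    selected[0][(pos[:start_column] - 1)..(pos[:end_column] - 1)]
  else
    # Spans several lines: tail of the first, all middle lines, head of the last.
    first = selected.first[(pos[:start_column] - 1)..]
    last = selected.last[0..(pos[:end_column] - 1)]
    ([first] + selected[1...-1] + [last]).join
  end
end

position = { start_line: 1, start_column: 1, end_line: 1, end_column: 7 }
puts slice_source("*Hello* world", position)
# prints *Hello*
```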

You can also modify the following attributes:

- `url`
- `title`
- `header_level`
- `list_type`
- `list_start`
- `list_tight`
- `fence_info`
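As an illustration of attribute modification, here is a small sketch that upgrades `http://` link targets to `https://`. It assumes that a link node exposes `type`, `url`, and a matching `url=` setter, as the list above implies. `upgrade_link!` and `FakeNode` are hypothetical names; `FakeNode` is only a stand-in so the snippet runs without the gem.

```ruby
# Stand-in for a link node so this sketch runs on its own. A real
# Commonmarker node is assumed to respond to `type`, `url`, and `url=`.
FakeNode = Struct.new(:type, :url)

# Rewrites an http:// link target to https:// in place.
# Returns true if the node was changed, false otherwise.
def upgrade_link!(node)
  return false unless node.type == :link
  return false unless node.url.start_with?("http://")

  node.url = node.url.sub("http://", "https://")
  true
end

link = FakeNode.new(:link, "http://example.com/docs")
upgrade_link!(link)
puts link.url
# prints https://example.com/docs
```

With the gem loaded, the same helper could be applied to every node of a parsed document, for example `doc.walk { |node| upgrade_link!(node) }`.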

#### Example: Walking the AST

You can use `walk` or `each` to iterate over nodes:

- `walk` will iterate on a node and recursively iterate on a node's children.
- `each` will iterate on a node's direct children, but no further.

```ruby
require 'commonmarker'

# parse some string
doc = Commonmarker.parse("# The site\n\n [GitHub](https://www.github.com)")

# Walk tree and print out URLs for links
doc.walk do |node|
  if node.type == :link
    printf("URL = %s\n", node.url)
  end
end
# => URL = https://www.github.com

# Transform links to regular text
doc.walk do |node|
  if node.type == :link
    node.insert_before(node.first_child)
    node.delete
  end
end
# The link is now plain text: the document renders as the "The site" heading
# followed by a paragraph containing just "GitHub".
```
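The difference between `walk` and `each` can be seen on a toy tree. This sketch does not use Commonmarker at all: `ToyNode` is a hypothetical stand-in class, and the depth-first, parent-before-children order is an assumption inferred from the node type sequence shown earlier for `"*Hello* world"`.

```ruby
# Stand-in node type for illustration only; NOT Commonmarker's Node class.
class ToyNode
  attr_reader :type, :children

  def initialize(type, children = [])
    @type = type
    @children = children
  end

  # Yields only the direct children, mirroring the description of `each`.
  def each(&block)
    children.each(&block)
  end

  # Yields this node first, then recurses into each child (depth-first),
  # mirroring the description of `walk`.
  def walk(&block)
    block.call(self)
    children.each { |child| child.walk(&block) }
  end
end

# Tree shaped like the parse of "*Hello* world"
emph = ToyNode.new(:emph, [ToyNode.new(:text)])
para = ToyNode.new(:paragraph, [emph, ToyNode.new(:text)])
doc = ToyNode.new(:document, [para])

walked = []
doc.walk { |node| walked << node.type }
direct = []
doc.each { |node| direct << node.type }

p walked # prints [:document, :paragraph, :emph, :text, :text]
p direct # prints [:paragraph]
```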

#### Example: Converting a document back into raw CommonMark

You can use `to_commonmark` on a node to render it as raw text:

```ruby
require 'commonmarker'

# parse some string
doc = Commonmarker.parse("# The site\n\n [GitHub](https://www.github.com)")

# Transform links to regular text
doc.walk do |node|
  if node.type == :link
    node.insert_before(node.first_child)
    node.delete
  end
end

doc.to_commonmark
# => # The site\n\nGitHub\n
```

## Options and plugins

### Options

Commonmarker accepts the same parse, render, and extension options that comrak does, passed as a hash with symbol keys:

```ruby
Commonmarker.to_html('"Hi *there*"', options: {
  parse: { smart: true },
  render: { hardbreaks: false }
})
```

Note that comrak distinguishes between "parse" options and "render" options; each group is listed in its own table below. To disable any non-boolean option, pass in `nil`.

### Parse options

| Name | Description | Default |
| --------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- | ------- |
| `smart` | Punctuation (quotes, full stops, and hyphens) is converted into 'smart' punctuation. | `false` |
| `default_info_string` | The default info string for fenced code blocks. | `""` |
| `relaxed_tasklist_matching` | Enables relaxing of the tasklist extension matching, allowing any non-space to be used for the "checked" state instead of only `x` and `X`. | `false` |
| `relaxed_autolinks` | Enable relaxing of the autolink extension parsing, allowing links to be recognized when in brackets, as well as permitting any url scheme. | `false` |

### Render options

| Name | Description | Default |
| -------------------- | ------------------------------------------------------------------------------------------------------ | ------- |
| `hardbreaks` | [Soft line breaks](http://spec.commonmark.org/0.27/#soft-line-breaks) translate into hard line breaks. | `true` |
| `github_pre_lang` | GitHub-style `<pre lang="xyz">` is used for fenced code blocks with info tags. | `true` |
| `full_info_string` | Gives info string data after a space in a `data-meta` attribute on code blocks. | `false` |
| `width` | The wrap column when outputting CommonMark. | `80` |
| `unsafe` | Allow rendering of raw HTML and potentially dangerous links. | `false` |
| `escape` | Escape raw HTML instead of clobbering it. | `false` |
| `sourcepos` | Include source position attribute in HTML and XML output. | `false` |
| `escaped_char_spans` | Wrap escaped characters in span tags. | `true` |
| `ignore_setext` | Ignores setext-style headings. | `false` |
| `ignore_empty_links` | Ignores empty links, leaving the Markdown text in place. | `false` |
| `gfm_quirks` | Outputs HTML with GFM-style quirks; namely, not nesting `<strong>` inlines. | `false` |
| `prefer_fenced` | Always output fenced code blocks, even where an indented one could be used. | `false` |

As well, there are several extensions which you can toggle in the same manner:

```ruby
Commonmarker.to_html('"Hi *there*"', options: {
extension: { footnotes: true, description_lists: true },
render: { hardbreaks: false }
})
```

### Extension options

| Name | Description | Default |
| ----------------------------- | ------------------------------------------------------------------------------------------------------------------- | ------- |
| `strikethrough` | Enables the [strikethrough extension](https://github.github.com/gfm/#strikethrough-extension-) from the GFM spec. | `true` |
| `tagfilter` | Enables the [tagfilter extension](https://github.github.com/gfm/#disallowed-raw-html-extension-) from the GFM spec. | `true` |
| `table` | Enables the [table extension](https://github.github.com/gfm/#tables-extension-) from the GFM spec. | `true` |
| `autolink` | Enables the [autolink extension](https://github.github.com/gfm/#autolinks-extension-) from the GFM spec. | `true` |
| `tasklist` | Enables the [task list extension](https://github.github.com/gfm/#task-list-items-extension-) from the GFM spec. | `true` |
| `superscript` | Enables the superscript Comrak extension. | `false` |
| `header_ids` | Enables the header IDs Comrak extension. | `""` |
| `footnotes` | Enables the footnotes extension per `cmark-gfm`. | `false` |
| `description_lists` | Enables the description lists extension. | `false` |
| `front_matter_delimiter` | Enables the front matter extension. | `""` |
| `multiline_block_quotes` | Enables the multiline block quotes extension. | `false` |
| `math_dollars`, `math_code` | Enables the math extension. | `false` |
| `shortcodes` | Enables the shortcodes extension. | `true` |
| `wikilinks_title_before_pipe` | Enables the wikilinks extension, placing the title before the dividing pipe. | `false` |
| `wikilinks_title_after_pipe` | Enables the wikilinks extension, placing the title after the dividing pipe. | `false` |
| `underline` | Enables the underline extension. | `false` |
| `spoiler` | Enables the spoiler extension. | `false` |
| `greentext` | Enables the greentext extension. | `false` |

For more information on these options, see [the comrak documentation](https://github.com/kivikakk/comrak#usage).
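As noted earlier, a non-boolean option is disabled by passing `nil`. The sketch below assembles an options hash that does this for `header_ids`, whose default is `""`. `commonmarker_options` is a hypothetical convenience helper, not part of the gem; it only builds a plain Ruby hash in the shape the tables above describe.

```ruby
# Hypothetical helper: groups settings under the :parse, :render, and
# :extension keys and drops any group that was left empty.
def commonmarker_options(parse: {}, render: {}, extension: {})
  { parse: parse, render: render, extension: extension }
    .reject { |_group, settings| settings.empty? }
end

options = commonmarker_options(
  render: { hardbreaks: false },
  # nil switches off a non-boolean option such as `header_ids`
  extension: { header_ids: nil, footnotes: true }
)

p options.keys # prints [:render, :extension]
```

With the gem loaded, the resulting hash would be passed along as `Commonmarker.to_html(text, options: options)`.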

### Plugins

In addition to the possibilities provided by generic CommonMark rendering, Commonmarker also supports plugins as a means of
providing further niceties.

#### Syntax Highlighter Plugin

The library comes with [a set of pre-existing themes](https://docs.rs/syntect/5.0.0/syntect/highlighting/struct.ThemeSet.html#implementations) for highlighting code:

- `"base16-ocean.dark"`
- `"base16-eighties.dark"`
- `"base16-mocha.dark"`
- `"base16-ocean.light"`
- `"InspiredGitHub"`
- `"Solarized (dark)"`
- `"Solarized (light)"`

````ruby
code = <<~CODE
  ```ruby
  def hello
    puts "hello"
  end
  ```
CODE

# pass in a theme name from a pre-existing set
puts Commonmarker.to_html(code, plugins: { syntax_highlighter: { theme: "InspiredGitHub" } })

# Prints a <pre> block in which the tokens of the Ruby snippet are wrapped in
# <span> elements carrying inline `style` attributes; the colors depend on the theme.
````

By default, the plugin uses the `"base16-ocean.dark"` theme to syntax highlight code.

To disable this plugin, set the value to `nil`:

````ruby
code = <<~CODE
  ```ruby
  def hello
    puts "hello"
  end
  ```
CODE

Commonmarker.to_html(code, plugins: { syntax_highlighter: nil })

# <pre lang="ruby"><code>def hello
#   puts &quot;hello&quot;
# end
# </code></pre>
````

To output CSS classes instead of `style` attributes, set the `theme` key to `""`:

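````ruby
code = <<~CODE
  ```ruby
  def hello
    puts "hello"
  end
  ```
CODE

Commonmarker.to_html(code, plugins: { syntax_highlighter: { theme: "" } })

# Returns a <pre><code> block in which the tokens are wrapped in <span>
# elements with CSS class names instead of inline `style` attributes.
````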

To use a custom theme, you can provide a `path` to a directory containing `.tmtheme` files to load:

```ruby
Commonmarker.to_html(code, plugins: { syntax_highlighter: { theme: "Monokai", path: "./themes" } })
```
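The highlighter modes described above (built-in theme, disabled, CSS classes, custom theme directory) can be collected into one place. This is a minimal sketch: `highlighter_plugins` and `BUILT_IN_THEMES` are hypothetical names, and the validation against the built-in list is an added convenience rather than something the gem is known to require.

```ruby
# Theme names copied from the list of pre-existing themes above.
BUILT_IN_THEMES = [
  "base16-ocean.dark", "base16-eighties.dark", "base16-mocha.dark",
  "base16-ocean.light", "InspiredGitHub", "Solarized (dark)", "Solarized (light)"
].freeze

# Hypothetical helper that builds the `plugins:` hash for each mode.
def highlighter_plugins(theme, path: nil)
  return { syntax_highlighter: nil } if theme.nil?              # plugin disabled
  return { syntax_highlighter: { theme: "" } } if theme.empty?  # CSS classes
  return { syntax_highlighter: { theme: theme, path: path } } if path # custom themes

  unless BUILT_IN_THEMES.include?(theme)
    raise ArgumentError, "unknown built-in theme: #{theme.inspect}"
  end
  { syntax_highlighter: { theme: theme } }
end

p highlighter_plugins("InspiredGitHub")
p highlighter_plugins("Monokai", path: "./themes")
```

With the gem loaded, the result would be passed as `Commonmarker.to_html(code, plugins: highlighter_plugins("InspiredGitHub"))`.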

## Output formats

Commonmarker can currently only generate output in one format: HTML.

### HTML

```ruby
puts Commonmarker.to_html('*Hello* world!')

# <p><em>Hello</em> world!</p>
```

## Developing locally

After cloning the repo:

```
script/bootstrap
bundle exec rake compile
```

If there were no errors, you're done! Otherwise, make sure to follow the comrak dependency instructions.

## Benchmarks

```
❯ bundle exec rake benchmark
input size = 11064832 bytes

ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
Warming up --------------------------------------
Markly.render_html 1.000 i/100ms
Markly::Node#to_html 1.000 i/100ms
Commonmarker.to_html 1.000 i/100ms
Commonmarker::Node.to_html
1.000 i/100ms
Kramdown::Document#to_html
1.000 i/100ms
Calculating -------------------------------------
Markly.render_html 15.606 (±25.6%) i/s - 71.000 in 5.047132s
Markly::Node#to_html 15.692 (±25.5%) i/s - 72.000 in 5.095810s
Commonmarker.to_html 4.482 (± 0.0%) i/s - 23.000 in 5.137680s
Commonmarker::Node.to_html
5.092 (±19.6%) i/s - 25.000 in 5.072220s
Kramdown::Document#to_html
0.379 (± 0.0%) i/s - 2.000 in 5.277770s

Comparison:
Markly::Node#to_html: 15.7 i/s
Markly.render_html: 15.6 i/s - same-ish: difference falls within error
Commonmarker::Node.to_html: 5.1 i/s - 3.08x slower
Commonmarker.to_html: 4.5 i/s - 3.50x slower
Kramdown::Document#to_html: 0.4 i/s - 41.40x slower
```
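The "x slower" figures in the comparison appear to be the fastest entry's iterations per second divided by each other entry's. The following sketch reproduces them from the rates in the "Calculating" section; the variable names are illustrative.

```ruby
# Iterations per second, copied from the "Calculating" section above.
rates = {
  "Markly::Node#to_html" => 15.692,
  "Markly.render_html" => 15.606,
  "Commonmarker::Node.to_html" => 5.092,
  "Commonmarker.to_html" => 4.482,
  "Kramdown::Document#to_html" => 0.379
}

fastest = rates.values.max
# Slowdown factor relative to the fastest entry, rounded to two decimals.
slowdowns = rates.transform_values { |ips| (fastest / ips).round(2) }

slowdowns.each do |name, factor|
  puts format("%-28s %.2fx", name, factor)
end
```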