https://github.com/pragdave/earmark

Markdown parser for Elixir
Host: GitHub
URL: https://github.com/pragdave/earmark
Owner: pragdave
License: other
Created: 2014-07-05T02:53:38.000Z (almost 10 years ago)
Default Branch: master
Last Pushed: 2024-03-04T17:19:50.000Z (3 months ago)
Last Synced: 2024-04-25T17:05:21.998Z (about 2 months ago)
Language: Elixir
Size: 1.85 MB
Stars: 841
Watchers: 12
Forks: 133
Open Issues: 14
Metadata Files:
- Readme: README.md
- License: LICENSE
Lists

awesome-elixir - earmark - Markdown parser for Elixir. (Markdown)
freaking_awesome_elixir - Elixir - Markdown parser for Elixir. (Markdown)
awesome - earmark - Markdown parser for Elixir (Elixir)
fucking-awesome-elixir - earmark - Markdown parser for Elixir. (Markdown)
README

        

# Earmark—A Pure Elixir Markdown Processor

[![CI](https://github.com/pragdave/earmark/actions/workflows/elixir.yml/badge.svg)](https://github.com/pragdave/earmark/actions/workflows/ci.yml)

[![Coverage Status](https://coveralls.io/repos/github/pragdave/earmark/badge.svg?branch=master)](https://coveralls.io/github/pragdave/earmark?branch=master)

[![Hex.pm](https://img.shields.io/hexpm/v/earmark.svg)](https://hex.pm/packages/earmark)

[![Hex.pm](https://img.shields.io/hexpm/dw/earmark.svg)](https://hex.pm/packages/earmark)

[![Hex.pm](https://img.shields.io/hexpm/dt/earmark.svg)](https://hex.pm/packages/earmark)

**N.B.**

This README contains the docstrings and doctests from the code by means of [extractly](https://hex.pm/packages/extractly),
and the following code examples are therefore verified with `ExUnit` doctests.

## Table Of Content

- [Table Of Content](#table-of-content)

- [Options](#options)

  - [Earmark.Cli.Implementation](#earmarkcliimplementation)

  - [Earmark.Options](#earmarkoptions)

  - [Earmark.Options.make_options/1](#earmarkoptionsmake_options1)

  - [Earmark.Options.relative_filename/2](#earmarkoptionsrelative_filename2)

  - [Earmark.Options.with_postprocessor/2](#earmarkoptionswith_postprocessor2)

  - [Earmark.Internal](#earmarkinternal)

  - [Earmark.Internal.as_ast!/2](#earmarkinternalas_ast2)

  - [Earmark.Internal.from_file!/2](#earmarkinternalfrom_file2)

  - [Earmark.Internal.include/2](#earmarkinternalinclude2)

  - [Earmark.Transform](#earmarktransform)

    - [Structure Conserving Transformers](#structure-conserving-transformers)

    - [Postprocessors and Convenience Functions](#postprocessors-and-convenience-functions)

    - [Structure Modifying Transformers](#structure-modifying-transformers)

    - [Earmark.Restructure.walk_and_modify_ast/4](#earmarkrestructurewalk_and_modify_ast4)

    - [Earmark.Restructure.split_by_regex/3](#earmarkrestructuresplit_by_regex3)

- [Contributing](#contributing)

- [Author](#author)

## Options

### Earmark.Cli.Implementation

Functional interface to the CLI (with the exception of reading input files with `Earmark.File`),
returning the device and the string to be output.

### Earmark.Options

This is a superset of the options that need to be passed into `Earmark.Parser.as_ast/2`.

The following options are specific to `Earmark` and are therefore explained in detail:

- `compact_output`: boolean indicating to avoid indentation and minimize whitespace

- `eex`: Allows usage of an `EEx` template to be expanded to markdown before conversion

- `file`: Name of file passed in from the CLI

- `line`: defaults to 1, but may be set to an offset for better error messages in some integration cases

- `smartypants`: boolean indicating whether to use [Smarty Pants](https://daringfireball.net/projects/smartypants/) in the output

- `ignore_strings`, `postprocessor` and `registered_processors`: processors that modify the AST returned from
   `Earmark.Parser.as_ast/2` before rendering (`post` because preprocessing is done on the markdown, e.g. `eex`).
   Refer to the moduledoc of `Earmark.Transform` for details.

All other options are passed on to `Earmark.Parser.as_ast/2`.

### Earmark.Options.make_options/1

Makes a legal and normalized `Options` struct from maps or keyword lists.

Without a param or with an empty input we just get a new `Options` struct

    iex(1)> { make_options(), make_options(%{}) }
    { {:ok, %Earmark.Options{}}, {:ok, %Earmark.Options{}} }

The same holds for the bang version of course

    iex(2)> { make_options!(), make_options!(%{}) }
    { %Earmark.Options{}, %Earmark.Options{} }

We check for keys that are not allowed

    iex(3)> make_options(no_such_option: true)
    {:error, [{:warning, 0, "Unrecognized option no_such_option: true"}]}

Of course we do not let our users discover one error after another; all unrecognized options are reported at once

    iex(4)> make_options(no_such_option: true, gfm: false, still_not_an_option: 42)
    {:error, [{:warning, 0, "Unrecognized option no_such_option: true"}, {:warning, 0, "Unrecognized option still_not_an_option: 42"}]}

And the bang version will raise an `Earmark.Error` as excepted (sic)

    iex(5)> make_options!(no_such_option: true, gfm: false, still_not_an_option: 42)
    ** (Earmark.Error) [{:warning, 0, "Unrecognized option no_such_option: true"}, {:warning, 0, "Unrecognized option still_not_an_option: 42"}]

Some values need to be numeric

    iex(6)> make_options(line: "42")
    {:error, [{:error, 0, "line option must be numeric"}]}

    iex(7)> make_options(%Earmark.Options{footnote_offset: "42"})
    {:error, [{:error, 0, "footnote_offset option must be numeric"}]}

    iex(8)> make_options(%{line: "42", footnote_offset: nil})
    {:error, [{:error, 0, "footnote_offset option must be numeric"}, {:error, 0, "line option must be numeric"}]}
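
The "report everything at once" behaviour shown above can be sketched in plain Elixir. This is only an illustration, not Earmark's implementation: the module `OptionCheckSketch`, its `check/1` function and the short lists of known and numeric keys are assumptions made up for this example.

```elixir
defmodule OptionCheckSketch do
  # Hypothetical whitelist; the real `Earmark.Options` struct has many more fields.
  @known [:line, :footnote_offset, :gfm, :breaks]
  @numeric [:line, :footnote_offset]

  # Returns {:ok, map} or {:error, messages}, reporting every problem in one pass.
  def check(options) do
    options = Map.new(options)

    # One warning per key that is not in the whitelist.
    unknown =
      for {key, value} <- options, key not in @known do
        {:warning, 0, "Unrecognized option #{key}: #{inspect(value)}"}
      end

    # One error per numeric option that holds a non-numeric value.
    not_numeric =
      for {key, value} <- options, key in @numeric, not is_number(value) do
        {:error, 0, "#{key} option must be numeric"}
      end

    # Sorted so that the order of the messages is deterministic.
    case Enum.sort(unknown ++ not_numeric) do
      [] -> {:ok, options}
      messages -> {:error, messages}
    end
  end
end

IO.inspect(OptionCheckSketch.check(gfm: false, line: 42))
IO.inspect(OptionCheckSketch.check(no_such_option: true, line: "42"))
```

Collecting the messages in a list before deciding between `:ok` and `:error` is what lets a caller fix all mistakes in one go.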

### Earmark.Options.relative_filename/2

Computes the path of a relative file name (starting with `"./"`) from the file in options
and returns an updated options struct

    iex(9)> options = %Earmark.Options{file: "some/path/xxx.md"}
    ...(9)> options_ = relative_filename(options, "./local.md")
    ...(9)> options_.file
    "some/path/local.md"

For your convenience you can just use a keyword list

    iex(10)> options = relative_filename([file: "some/path/_.md", breaks: true], "./local.md")
    ...(10)> {options.file, options.breaks}
    {"some/path/local.md", true}

If the filename does not start with `"./"` it just replaces the file in options

    iex(11)> options = %Earmark.Options{file: "some/path/xxx.md"}
    ...(11)> options_ = relative_filename(options, "local.md")
    ...(11)> options_.file
    "local.md"

There is a special case when processing stdin, that is when `file: nil`; in that case the file is
replaced verbatim

    iex(12)> options = %Earmark.Options{}
    ...(12)> options_ = relative_filename(options, "./local.md")
    ...(12)> options_.file
    "./local.md"

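The three cases documented for `relative_filename/2` can be mirrored with a few function clauses. The sketch below is a stand-alone illustration and not the library code: `RelativeFileSketch.resolve/2` is a hypothetical helper that works on plain strings instead of an options struct.

```elixir
defmodule RelativeFileSketch do
  # No current file (e.g. reading from stdin): take the new name verbatim.
  def resolve(nil, filename), do: filename

  # A name starting with "./" is resolved against the directory of the current file.
  def resolve(current, "./" <> rest), do: Path.join(Path.dirname(current), rest)

  # Any other name simply replaces the current file.
  def resolve(_current, filename), do: filename
end

IO.inspect(RelativeFileSketch.resolve("some/path/xxx.md", "./local.md"))
# => "some/path/local.md"
IO.inspect(RelativeFileSketch.resolve("some/path/xxx.md", "local.md"))
# => "local.md"
IO.inspect(RelativeFileSketch.resolve(nil, "./local.md"))
# => "./local.md"
```
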
### Earmark.Options.with_postprocessor/2

A convenience constructor

### Earmark.Internal

All public functions that are internal to Earmark, so that **only** external API

functions are public in `Earmark`

### Earmark.Internal.as_ast!/2

A wrapper to extract the AST from a call to `Earmark.Parser.as_ast` if a tuple `{:ok, result, []}` is returned;
it raises errors otherwise

```elixir

    iex(1)> as_ast!(["Hello %% annotated"], annotations: "%%")

    [{"p", [], ["Hello "], %{annotation: "%% annotated"}}]

```

```elixir

    iex(2)> as_ast!("===")

    ** (Earmark.Error) [{:warning, 1, "Unexpected line ==="}]

```

### Earmark.Internal.from_file!/2

This is a convenience function that reads a file, or passes it to `EEx.eval_file` if its name
ends in `.eex`.

The returned string is then passed to `as_html`. This is now used in the escript and allows
for a simple inclusion mechanism; as a matter of fact, an `include` function is passed in.

### Earmark.Internal.include/2

A utility function that will be passed as a partial capture to `EEx.eval_file` by

providing a value for the `options` parameter

```elixir

    EEx.eval(..., include: &include(&1, options))

```

thusly allowing

```eex

  <%= include.(some file) %>

```

where `some file`  can be a relative path starting with `"./"`

Here is an example using [these fixtures](https://github.com/pragdave/earmark/tree/master/test/fixtures)

```elixir

    iex(3)> include("./include/basic.md.eex", file: "test/fixtures/does_not_matter")

    "# Headline Level 1\n"

```

And here is how it is used inside a template

```elixir

    iex(4)> options = [file: "test/fixtures/does_not_matter"]

    ...(4)> EEx.eval_string(~s{<%= include.("./include/basic.md.eex") %>}, include: &include(&1, options))

    "# Headline Level 1\n"

```

### Earmark.Transform

#### Structure Conserving Transformers

For the convenience of processing the output of `Earmark.Parser.as_ast` we expose two structure conserving mappers.

##### `map_ast`

Traverses an AST using a mapper function.

The mapper function will be called for each node, including text elements, unless `map_ast` is called with
the optional third positional parameter `ignore_strings` (which defaults to `false`) set to `true`.

Depending on the return value of the mapper function, the traversal proceeds as follows:

- `{new_tag, new_atts, ignored, new_meta}`

  just replace the `tag`, `attribute` and `meta` values of the current node with the values of the returned

  quadruple (ignoring `ignored` for facilitating nodes w/o transformation)

  and then descend into the **original** content of the node.

- `{:replace, node}`

  replaces the current node with `node` and does not descend anymore, but continues traversal on siblings.

- `{new_function, {new_tag, new_atts, ignored, new_meta}}`

  just replace the `tag`, `attribute` and `meta` values of the current node with the values of the returned

  quadruple (ignoring `ignored` for facilitating nodes w/o transformation)

  and then descend into the **original** content of the node but with the mapper function `new_function`

  used for transformation of the AST.

  **N.B.** The original mapper function will be used for transforming the sibling nodes though.

`map_ast` takes a function that will be called for each node of the AST, where a leaf node is either a quadruple
like `{"code", [{"class", "inline"}], ["some code"], %{}}` or a text leaf like `"some code"`.

The result of the function call must be

- for nodes → as described above

- for strings → strings or nodes

As an example, let us transform an AST so that its tags are atoms instead of strings

```elixir

      iex(1)> input = [

      ...(1)> {"h1", [], ["Hello"], %{title: true}},

      ...(1)> {"ul", [], [{"li", [], ["alpha"], %{}}, {"li", [], ["beta"], %{}}], %{}}]

      ...(1)> map_ast(input, fn {t, a, _, m} -> {String.to_atom(t), a, nil, m} end, true)

      [ {:h1, [], ["Hello"], %{title: true}},

        {:ul, [], [{:li, [], ["alpha"], %{}}, {:li, [], ["beta"], %{}}], %{}} ]

```

**N.B.** If this returning convention is not respected `map_ast` might not complain, but the resulting
transformation might not be suitable for `Earmark.Transform.transform` anymore. From this it follows that
any function passed in as value of the `postprocessor:` option must obey these conventions.
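
To make the three return conventions concrete, here is a minimal stand-alone traversal that honours them. It is a sketch under assumptions, not the code of `Earmark.Transform`: the module name `MapAstSketch` is made up, and the `ignore_strings` parameter is left out, so the mapper is always called for text nodes too.

```elixir
defmodule MapAstSketch do
  # Maps over a list of nodes; `fun` sees every quadruple and every string.
  def map_ast(ast, fun) when is_list(ast), do: Enum.map(ast, &map_node(&1, fun))

  defp map_node(text, fun) when is_binary(text), do: fun.(text)

  defp map_node({_tag, _atts, content, _meta} = node, fun) do
    case fun.(node) do
      # Replace the node wholesale and do not descend into it.
      {:replace, new_node} ->
        new_node

      # Switch to `new_fun` for the children only; siblings keep `fun`.
      {new_fun, {tag, atts, _ignored, meta}} when is_function(new_fun) ->
        {tag, atts, map_ast(content, new_fun), meta}

      # Take tag, attributes and meta from the result, then descend into
      # the ORIGINAL content of the node.
      {tag, atts, _ignored, meta} ->
        {tag, atts, map_ast(content, fun), meta}
    end
  end
end

upcase_tags = fn
  {"code", _, _, _} = node -> {:replace, node}
  {tag, atts, _, meta} -> {String.upcase(tag), atts, nil, meta}
  text -> text
end

ast = [{"p", [], ["see ", {"code", [], ["x"], %{}}], %{}}]
IO.inspect(MapAstSketch.map_ast(ast, upcase_tags))
# => [{"P", [], ["see ", {"code", [], ["x"], %{}}], %{}}]
```

Because the third element of the returned quadruple is ignored, a mapper can return `nil` there without losing the children.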

##### `map_ast_with`

This is like `map_ast`, but, as with a reducer, an accumulator can also be passed through.

For that reason the function is called with two arguments, the first being the same value
as in `map_ast` and the second the accumulator. The return values need to be equally augmented
tuples.

A simple example annotates the traversal order in the meta map's `:count` key. As we are not
interested in text nodes, we use the fourth parameter `ignore_strings`, which defaults to `false`

```elixir

       iex(2)>  input = [

       ...(2)>  {"ul", [], [{"li", [], ["one"], %{}}, {"li", [], ["two"], %{}}], %{}},

       ...(2)>  {"p", [], ["hello"], %{}}]

       ...(2)>  counter = fn {t, a, _, m}, c -> {{t, a, nil, Map.put(m, :count, c)}, c+1} end

       ...(2)>  map_ast_with(input, 0, counter, true)

       {[ {"ul", [], [{"li", [], ["one"], %{count: 1}}, {"li", [], ["two"], %{count: 2}}], %{count: 0}},

         {"p", [], ["hello"], %{count: 3}}], 4}

```
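
The counts in the result above follow from a pre-order walk that threads the accumulator through every node. A minimal sketch of such a walk, assuming text nodes are always skipped (as with `ignore_strings` set to `true`) and using the made-up module name `MapAstWithSketch`, could look like this; it is not Earmark's own implementation.

```elixir
defmodule MapAstWithSketch do
  # Returns {new_ast, final_acc}; text nodes are left untouched.
  def map_ast_with(ast, acc, fun) when is_list(ast) do
    Enum.map_reduce(ast, acc, &map_node(&1, &2, fun))
  end

  defp map_node(text, acc, _fun) when is_binary(text), do: {text, acc}

  defp map_node({_tag, _atts, content, _meta} = node, acc, fun) do
    # Visit the node first (pre-order), then its original children.
    {{tag, atts, _ignored, meta}, acc1} = fun.(node, acc)
    {new_content, acc2} = map_ast_with(content, acc1, fun)
    {{tag, atts, new_content, meta}, acc2}
  end
end

input = [
  {"ul", [], [{"li", [], ["one"], %{}}, {"li", [], ["two"], %{}}], %{}},
  {"p", [], ["hello"], %{}}
]

counter = fn {t, a, _, m}, c -> {{t, a, nil, Map.put(m, :count, c)}, c + 1} end
IO.inspect(MapAstWithSketch.map_ast_with(input, 0, counter))
```

The `ul` node receives count 0 before its `li` children receive 1 and 2, and the accumulator left over after the last child is what the following `p` node sees.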

Let us describe an implementation of a real world use case taken from [Elixir Forum](https://elixirforum.com/t/how-to-extend-earmark/47406)

Simplifying the exact parsing of the text node in this example we only want to replace a text node of the form `#elixir` with

a link to the Elixir home page _but_ only when inside a `{"p",....}` node

We can achieve this as follows

```elixir

      iex(3)> elixir_home = {"a", [{"href", "https://elixir-lang.org"}], ["Elixir"], %{}}

      ...(3)> transformer = fn {"p", atts, _, meta}, _ -> {{"p", atts, nil, meta}, true}

      ...(3)>                  "#elixir", true -> {elixir_home, false}

      ...(3)>                  text, _ when is_binary(text) -> {text, false}

      ...(3)>                  node, _ ->  {node, false} end

      ...(3)> ast = [

      ...(3)>  {"p", [],[ "#elixir"], %{}}, {"bold", [],[ "#elixir"], %{}},

      ...(3)>  {"ol", [], [{"li", [],[ "#elixir"], %{}}, {"p", [],[ "elixir"], %{}}, {"p", [], ["#elixir"], %{}}], %{}}

      ...(3)> ]

      ...(3)> map_ast_with(ast, false, transformer)

      {[

       {"p", [],[{"a", [{"href", "https://elixir-lang.org"}], ["Elixir"], %{}}], %{}}, {"bold", [],[ "#elixir"], %{}},

       {"ol", [], [{"li", [],[ "#elixir"], %{}}, {"p", [],[ "elixir"], %{}}, {"p", [], [{"a", [{"href", "https://elixir-lang.org"}], ["Elixir"], %{}}], %{}}], %{}}

      ], false}

```

An alternate, maybe more elegant solution would be to change the mapper function during AST traversal

as demonstrated [here](https://github.com/pragdave/earmark/blob/master/test/acceptance/transform/map_ast_with_fnchange_test.exs)

#### Postprocessors and Convenience Functions

These can be declared in the fields `postprocessor` and `registered_processors` in the `Options` struct.
`postprocessor` is prepended to `registered_processors` and they are all applied to non-string nodes (that
is, the quadruples of the AST, which are of the form `{tag, atts, content, meta}`).

All postprocessors can just be functions on nodes or a `TagSpecificProcessors` struct, which will group
function applications depending on tags. As a convenience, tuples of the form `{tag, function}` will be
transformed into a `TagSpecificProcessors` struct.

```elixir

    iex(4)> add_class1 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class1")

    ...(4)> m1 = Earmark.Options.make_options!(postprocessor: add_class1) |> make_postprocessor()

    ...(4)> m1.({"a", [], nil, nil})

    {"a", [{"class", "class1"}], nil, nil}

```

We can also use the `registered_processors` field:

```elixir

    iex(5)> add_class1 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class1")

    ...(5)> m2 = Earmark.Options.make_options!(registered_processors: add_class1) |> make_postprocessor()

    ...(5)> m2.({"a", [], nil, nil})

    {"a", [{"class", "class1"}], nil, nil}

```

Knowing that values on the same attributes are added onto the front, the following doctest demonstrates
the order in which the processors are executed

```elixir

    iex(6)> add_class1 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class1")

    ...(6)> add_class2 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class2")

    ...(6)> add_class3 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class3")

    ...(6)> m = Earmark.Options.make_options!(postprocessor: add_class1, registered_processors: [add_class2, {"a", add_class3}])

    ...(6)> |> make_postprocessor()

    ...(6)> [{"a", [{"class", "link"}], nil, nil}, {"b", [], nil, nil}]

    ...(6)> |> Enum.map(m)

    [{"a", [{"class", "class3 class2 class1 link"}], nil, nil}, {"b", [{"class", "class2 class1"}], nil, nil}]

```
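
The ordering seen in the class attribute can be reproduced with a plain function chain. In this sketch `ProcessorChainSketch`, `add_class/1` and `chain/2` are hypothetical names; `add_class/1` is only a simplified stand-in for `Earmark.AstTools.merge_atts_in_node/2`, and tag specific processors are left out.

```elixir
defmodule ProcessorChainSketch do
  # Returns a processor that puts `class` in front of the node's class attribute.
  def add_class(class) do
    fn {tag, atts, content, meta} ->
      new_class =
        case List.keyfind(atts, "class", 0) do
          {"class", old} -> class <> " " <> old
          nil -> class
        end

      {tag, List.keystore(atts, "class", 0, {"class", new_class}), content, meta}
    end
  end

  # Builds one function that runs `postprocessor` first, then each
  # registered processor in the order given.
  def chain(postprocessor, registered) do
    processors = [postprocessor | registered]
    fn node -> Enum.reduce(processors, node, fn processor, acc -> processor.(acc) end) end
  end
end

m =
  ProcessorChainSketch.chain(
    ProcessorChainSketch.add_class("class1"),
    [ProcessorChainSketch.add_class("class2")]
  )

IO.inspect(m.({"a", [{"class", "link"}], nil, nil}))
# => {"a", [{"class", "class2 class1 link"}], nil, nil}
```

The processor that runs last ends up leftmost in the attribute value, which is why `class1` (the `postprocessor`) sits closest to the original `link` class.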

We can see that the tuple form has been transformed into a tag specific transformation **only**; as a matter of fact, the explicit definition would be:

```elixir

    iex(7)> m = make_postprocessor(

    ...(7)>   %Earmark.Options{

    ...(7)>     registered_processors:

    ...(7)>       [Earmark.TagSpecificProcessors.new({"a", &Earmark.AstTools.merge_atts_in_node(&1, target: "_blank")})]})

    ...(7)> [{"a", [{"href", "url"}], nil, nil}, {"b", [], nil, nil}]

    ...(7)> |> Enum.map(m)

    [{"a", [{"href", "url"}, {"target", "_blank"}], nil, nil}, {"b", [], nil, nil}]

```

We can also define a tag specific transformer in one step, which might (or might not) solve potential performance issues

when running too many processors

```elixir

    iex(8)> add_class4 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class4")

    ...(8)> add_class5 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class5")

    ...(8)> add_class6 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class6")

    ...(8)> tsp = Earmark.TagSpecificProcessors.new([{"a", add_class5}, {"b", add_class5}])

    ...(8)> m = Earmark.Options.make_options!(

    ...(8)>       postprocessor: add_class4,

    ...(8)>       registered_processors: [tsp, add_class6])

    ...(8)> |> make_postprocessor()

    ...(8)> [{"a", [], nil, nil}, {"c", [], nil, nil}, {"b", [], nil, nil}]

    ...(8)> |> Enum.map(m)

    [{"a", [{"class", "class6 class5 class4"}], nil, nil}, {"c", [{"class", "class6 class4"}], nil, nil}, {"b", [{"class", "class6 class5 class4"}], nil, nil}]

```

Of course, the mechanics shown above are hidden if all we want is to trigger the postprocessor chain in `Earmark.as_html`. Here is a typical
example

```elixir

    iex(9)> add_target = fn node -> # This will only be applied to nodes as it will become a TagSpecificProcessors

    ...(9)>   if Regex.match?(~r{\.x\.com\z}, Earmark.AstTools.find_att_in_node(node, "href", "")), do:

    ...(9)>     Earmark.AstTools.merge_atts_in_node(node, target: "_blank"), else: node end

    ...(9)> options = [

    ...(9)> registered_processors: [{"a", add_target}, {"p", &Earmark.AstTools.merge_atts_in_node(&1, class: "example")}]]

    ...(9)> markdown = [

    ...(9)>   "http://hello.x.com",

    ...(9)>   "",

    ...(9)>   "[some](url)",

    ...(9)>  ]

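    ...(9)> Earmark.as_html!(markdown, options)

    "<p class=\"example\">\n<a href=\"http://hello.x.com\" target=\"_blank\">http://hello.x.com</a></p>\n<p class=\"example\">\n<a href=\"url\">some</a></p>\n"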

```

##### Use case: Modification of Link Attributes depending on the URL

This would be done as follows

```elixir

        Earmark.as_html!(markdown, registered_processors: {"a", my_function_that_is_invoked_only_with_a_nodes})

```

##### Use case: Modification of the AST according to Annotations

**N.B.** Annotations are an _experimental_ feature in 1.4.16-pre and are documented [here](https://github.com/RobertDober/earmark_parser/#annotations)

By annotating our markdown source we can then influence the rendering. In this example we will just

add some decoration

```elixir

    iex(10)> markdown = [ "A joke %% smile", "", "Charming %% in_love" ]

    ...(10)> add_smiley = fn {_, _, _, meta} = quad, _acc ->

    ...(10)>                case Map.get(meta, :annotation) do

    ...(10)>                  "%% smile"   -> {quad, "\u1F601"}

    ...(10)>                  "%% in_love" -> {quad, "\u1F60d"}

    ...(10)>                  _            -> {quad, nil}

    ...(10)>                end

    ...(10)>                text, nil -> {text, nil}

    ...(10)>                text, ann -> {"#{text} #{ann}", nil}

    ...(10)>              end

    ...(10)> Earmark.as_ast!(markdown, annotations: "%%") |> Earmark.Transform.map_ast_with(nil, add_smiley) |> Earmark.transform

    "<p>\nA joke  ὠ1</p>\n<p>\nCharming  ὠd</p>\n"

```

#### Structure Modifying Transformers

For structure modifications a tree traversal is needed and no clear pattern of how to assist this task with

tools has emerged yet.

#### Earmark.Restructure.walk_and_modify_ast/4

Walks an AST and allows you to process it (storing details in acc) and/or

modify it as it is walked.

items is the AST you got from Earmark.Parser.as_ast()

acc is the initial value of an accumulator that is passed to both

process_item_fn and process_list_fn and accumulated. If your functions

do not need to use or store any state, you can pass nil.

The process_item_fn function is required. It takes two parameters, the

single item to process (which will either be a string or a 4-tuple) and

the accumulator, and returns a tuple {processed_item, updated_acc}.

Returning the empty list for processed_item will remove the processed item from
the AST.

The process_list_fn function is optional and defaults to no modification of

items or accumulator. It takes two parameters, the list of items that

are the sub-items of a given element in the AST (or the top-level list of

items), and the accumulator, and returns a tuple

{processed_items_list, updated_acc}.

This function ends up returning {ast, acc}.

Here is an example using a custom format to make `<em>` nodes and allowing
commented text to be left out


```elixir

    iex(1)> is_comment? = fn item -> is_binary(item) && Regex.match?(~r/\A\s*--/, item) end

    ...(1)> comment_remover =

    ...(1)>   fn items, acc -> {Enum.reject(items, is_comment?), acc} end

    ...(1)> italics_maker = fn

    ...(1)>   item, acc when is_binary(item) ->

    ...(1)>     new_item = Restructure.split_by_regex(

    ...(1)>       item,

    ...(1)>       ~r/\/([[:graph:]].*?[[:graph:]]|[[:graph:]])\//,

    ...(1)>       fn [_, content] ->

    ...(1)>         {"em", [], [content], %{}}

    ...(1)>       end

    ...(1)>     )

    ...(1)>     {new_item, acc}

    ...(1)>   item, "a" -> {item, nil}

    ...(1)>   {name, _, _, _}=item, _ -> {item, name}

    ...(1)> end

    ...(1)> markdown = """

    ...(1)> [no italics in links](http://example.io/some/path)

    ...(1)> but /here/

    ...(1)>

    ...(1)> -- ignore me

    ...(1)>

    ...(1)> text

    ...(1)> """

    ...(1)> {:ok, ast, []} = Earmark.Parser.as_ast(markdown)

    ...(1)> Restructure.walk_and_modify_ast(ast, nil, italics_maker, comment_remover)

    {[

      {"p", [],

        [

          {"a", [{"href", "http://example.io/some/path"}], ["no italics in links"],

          %{}},

          "\nbut ",

          {"em", [], ["here"], %{}},

          ""

        ], %{}},

        {"p", [], [], %{}},

        {"p", [], ["text"], %{}}

      ], "p"}

```

#### Earmark.Restructure.split_by_regex/3

Utility for creating a restructuring that parses text by splitting it into

parts "of interest" vs. "other parts" using a regular expression.

Returns a list of parts where the parts matching regex have been processed

by invoking map_captures_fn on each part, and a list of remaining parts,

preserving the order of parts from what it was in the plain text item.

```elixir

      iex(2)> input = "This is ::all caps::, right?"

      ...(2)> split_by_regex(input, ~r/::(.*?)::/, fn [_, inner|_] -> String.upcase(inner) end)

      ["This is ", "ALL CAPS", ", right?"]

```
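
One way to get this kind of split with the standard library is `Regex.split/3` with `include_captures: true`. The sketch below is an approximation and not the code of `Earmark.Restructure`: `SplitSketch` is a made-up module, and re-running the regex on each part is a simplification that can misbehave for patterns with anchors or lookarounds.

```elixir
defmodule SplitSketch do
  def split_by_regex(text, regex, map_captures_fn) do
    regex
    # Keep the matched parts in the result instead of dropping them.
    |> Regex.split(text, include_captures: true)
    |> Enum.map(fn part ->
      case Regex.run(regex, part) do
        # Plain text between matches is passed through unchanged.
        nil -> part
        # A matching part is replaced by whatever the callback returns.
        captures -> map_captures_fn.(captures)
      end
    end)
  end
end

input = "This is ::all caps::, right?"

IO.inspect(
  SplitSketch.split_by_regex(input, ~r/::(.*?)::/, fn [_, inner | _] -> String.upcase(inner) end)
)
# => ["This is ", "ALL CAPS", ", right?"]
```
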

## Contributing

Pull Requests are happily accepted.

Please be aware of one _caveat_ when correcting/improving `README.md`.

The `README.md` is generated by `Extractly` as mentioned above, and therefore contributors should not modify it directly, but should edit
`README.md.eex` and the imported docs instead.

You need to run `mix xtra` after getting the dependencies to generate the `README.md` file.

Thank you all who have already helped with Earmark, your names are duly noted in [RELEASE.md](RELEASE.md).

## Author

Copyright © 2014,5,6,7,8,9, 2020,1,2 Dave Thomas, The Pragmatic Programmers & Robert Dober

@/+pragdave,  [email protected] & [email protected]

# LICENSE

Same as Elixir, which is Apache License v2.0. Please refer to [LICENSE](LICENSE) for details.