https://github.com/pgilbertschmitt/mtj-parser

Markdown parser that outputs to a simple (typed) object

Host: GitHub
URL: https://github.com/pgilbertschmitt/mtj-parser
Owner: PGilbertSchmitt
License: mit
Created: 2020-02-29T22:50:49.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2023-01-06T07:42:34.000Z (over 2 years ago)
Last Synced: 2025-03-11T08:18:51.329Z (2 months ago)
Language: TypeScript
Size: 1.16 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 12
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# MTJ-Parser

## The Markdown-to-JSON Parser

---

Parsers already exist aplenty for transforming Markdown into HTML. However, I don't want to use the HTML output of a parser I don't know and inject it unsafely into a ReactJS component. The best parser I could find that _doesn't_ convert my input directly into HTML is [markdown-it](https://github.com/markdown-it/markdown-it), but its output is not as easy to convert into nested components as I would like. It's more akin to an array of token-like objects, with things like "open paragraph" and a matching "close paragraph", with all the paragraph's tokens between them. The design is too linear for me; a tree would be better. So, rather than reinvent the wheel, I've wrapped `markdown-it` with my own pseudo-parser, converting the serial structure into a nested structure that is better for building components around. It's more like _unflattening_ (?) than _text parsing_, and since parsing text is a nightmare, I'll just piggyback off of [puzrin's](https://github.com/puzrin) hard work.

The output is something that I can more easily manipulate, which makes me happier. Also, I'm learning, and at the end of the day, isn't that all that matters?

The [rubric file](./rubric.md) shows all the currently supported Markdown formatting.

### ToDo

- [ ] 100% test coverage

### Install

`npm install mtj-parser`

### How-to

Simply use the `parseMarkdown` function, which should return a happy, nested markdown document:

JavaScript:

```JavaScript
import { parseMarkdown } from 'mtj-parser';

const mdString = `
# This is **Markdown**
`;

const mdObject = parseMarkdown(mdString);
// [
//   {
//     type: "heading",
//     parts: [
//       {
//         type: "text",
//         value: "This is "
//       },
//       {
//         type: "strong",
//         parts: [
//           {
//             type: "text",
//             value: "Markdown"
//           }
//         ]
//       },
//       {
//         type: "text",
//         value: ""
//       }
//     ],
//     size: 1
//   }
// ]
```

### Node types

Because the library is written in TypeScript, types are available for the different node types. They can be imported from the index of the library along with the `parseMarkdown` function.

The nodes are split into two categories: `BaseNodes` and `SubNodes`. The root of the object is a `MarkdownDoc`, which is just an array of `BaseNodes`. `BaseNodes` are the top-level Markdown elements (paragraph, lists, etc). Most `BaseNodes` contain `SubNodes`, which are the text styling and control elements (bold, images, breaks, etc).
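To make the two-level structure concrete, here is a minimal sketch that walks a document and flattens each `BaseNode`'s `SubNodes` into plain text. The `SubNodeLike` and `BaseNodeLike` types and both function names are hypothetical local stand-ins, not the library's exports; they assume only the `parts` and `value` fields shown in the How-to output.

```typescript
// Hypothetical local stand-ins for the library's types; only the fields used here.
type SubNodeLike = {
  type: string;
  value?: string; // present on leaf nodes such as text
  parts?: SubNodeLike[]; // present on container nodes such as strong
};

type BaseNodeLike = {
  type: string;
  parts?: SubNodeLike[];
  size?: number;
};

// Recursively concatenate the text carried by a list of SubNodes.
// Nodes with neither `parts` nor `value` contribute an empty string.
function partsToText(parts: SubNodeLike[]): string {
  return parts
    .map((node) => (node.parts ? partsToText(node.parts) : (node.value ?? "")))
    .join("");
}

// Produce one plain-text string per BaseNode that has a `parts` array.
function docToText(doc: BaseNodeLike[]): string[] {
  return doc
    .filter((node) => node.parts !== undefined)
    .map((node) => partsToText(node.parts ?? []));
}

// The document from the How-to example, written out by hand.
const sampleDoc: BaseNodeLike[] = [
  {
    type: "heading",
    parts: [
      { type: "text", value: "This is " },
      { type: "strong", parts: [{ type: "text", value: "Markdown" }] },
      { type: "text", value: "" },
    ],
    size: 1,
  },
];

const lines = docToText(sampleDoc);
console.log(lines);
```

With the real library, the same walk would presumably start from the result of `parseMarkdown` and use the exported node types instead of these stand-ins.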

#### MarkdownDoc

Returned from `parseMarkdown(str)`

```TypeScript
type MarkdownDoc = BaseNode[];
```

#### BaseNode

Union type for all types of `BaseNodes`

```TypeScript
type BaseNode
  = Paragraph
  | Heading
  | HorizontalRow
  | Fence
  | OrderedList
  | BulletList
  | Blockquote
  | Table;
```

#### SubNode

Union type for all types of `SubNodes`

```TypeScript
type SubNode
  = Text
  | Link
  | Emphasis
  | Strong
  | Strikethrough
  | CodeInline
  | Image
  | HardBreak
  | SoftBreak;
```

#### BaseTypes and SubTypes

These two exports are enums whose values are assigned to the `type` members of the following nodes in order to quickly identify their type. The values of the enum members are strings, which is helpful for debugging.
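As a rough illustration of why a `type` discriminant is convenient, the sketch below redeclares a three-member subset of `SubTypes` locally and narrows on it in a `switch`. The string values are assumptions (only `"text"` and `"strong"` appear in the sample output above), and `TextPart`, `StrongPart`, `EmphasisPart`, `VElement`, and `toElement` are hypothetical names, not part of the library.

```typescript
// Local redeclaration for illustration; the real enum is exported by the library.
// The string values are assumed to match the member names.
enum SubTypes {
  text = "text",
  strong = "strong",
  emphasis = "emphasis",
}

interface TextPart {
  type: SubTypes.text;
  value: string;
}
interface StrongPart {
  type: SubTypes.strong;
  parts: Part[];
}
interface EmphasisPart {
  type: SubTypes.emphasis;
  parts: Part[];
}
type Part = TextPart | StrongPart | EmphasisPart;

// Hypothetical element descriptor, standing in for whatever a UI layer would build.
interface VElement {
  tag: string;
  children: Array<VElement | string>;
}

// Checking `part.type` narrows `part` to the matching interface in each case.
function toElement(part: Part): VElement | string {
  switch (part.type) {
    case SubTypes.text:
      return part.value;
    case SubTypes.strong:
      return { tag: "strong", children: part.parts.map((p) => toElement(p)) };
    case SubTypes.emphasis:
      return { tag: "em", children: part.parts.map((p) => toElement(p)) };
  }
}

const tree = toElement({
  type: SubTypes.strong,
  parts: [{ type: SubTypes.text, value: "Markdown" }],
});
console.log(JSON.stringify(tree));
```

Because the enum members are strings at runtime, a logged node shows a readable `type` rather than a number.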

#### Paragraph

```TypeScript
interface Paragraph {
  type: BaseTypes.paragraph;
  parts: SubNode[];
}
```

#### Heading

```TypeScript
interface Heading {
  type: BaseTypes.heading;
  parts: SubNode[];
  size: 1 | 2 | 3 | 4 | 5 | 6;
}
```

#### HorizontalRow

```TypeScript
interface HorizontalRow {
  type: BaseTypes.horizontalRow;
}
```

#### Fence

```TypeScript
interface Fence {
  type: BaseTypes.fence;
  value: string;
  lang: string;
}
```

#### ListItem

Not a node, but used by both ordered and bullet lists to hold their elements. Each element can be a `Paragraph` or either list type, which is how nested lists work.

```TypeScript
type ListItem = Array<Paragraph | OrderedList | BulletList>;
```
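Since a list item can itself contain a list, nesting falls out of simple recursion. The following sketch computes how deeply a list is nested. `ParagraphLike`, `ListLike`, `ItemLike`, and `listDepth` are hypothetical local names, the string `type` values are assumed, and the paragraph's `parts` are left out for brevity.

```typescript
// Hypothetical local stand-ins; string `type` values are assumptions.
interface ParagraphLike {
  type: "paragraph";
}
interface ListLike {
  type: "orderedList" | "bulletList";
  list: ItemLike[];
}
// Mirrors the idea of ListItem: an array of paragraphs and/or nested lists.
type ItemLike = Array<ParagraphLike | ListLike>;

// Depth of 1 means a flat list; each level of nesting adds 1.
function listDepth(node: ListLike): number {
  let deepest = 0;
  for (const item of node.list) {
    for (const child of item) {
      if (child.type !== "paragraph") {
        // `child` is narrowed to ListLike here.
        deepest = Math.max(deepest, listDepth(child));
      }
    }
  }
  return 1 + deepest;
}

// A bullet list whose second item contains an ordered list.
const nested: ListLike = {
  type: "bulletList",
  list: [
    [{ type: "paragraph" }],
    [
      { type: "paragraph" },
      { type: "orderedList", list: [[{ type: "paragraph" }]] },
    ],
  ],
};

const depth = listDepth(nested);
console.log(depth);
```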

#### OrderedList

```TypeScript
interface OrderedList {
  type: BaseTypes.orderedList;
  list: ListItem[];
}
```

#### BulletList

```TypeScript
interface BulletList {
  type: BaseTypes.bulletList;
  list: ListItem[];
}
```

#### Blockquote

```TypeScript
interface Blockquote {
  type: BaseTypes.blockquote;
  parts: SubNode[];
}
```

#### Table

Tables make use of two non-node types, `Cell` and `Row`:

```TypeScript
interface Cell {
  parts: SubNode[];
  align: alignment;
}

interface Row {
  columns: Cell[];
}

interface Table {
  type: BaseTypes.table;
  head: Row;
  body: Row[];
}
```
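One possible way to consume a table is to pair each body cell with its header. The sketch below does that for cells containing only plain text parts. `CellLike`, `RowLike`, `TableLike`, `rowToText`, `tableToRecords`, and `textCell` are hypothetical local names; `align` is typed as `string` and given the value `"left"` purely as a placeholder, since the `alignment` type is not spelled out here.

```typescript
// Hypothetical local stand-ins for the Cell, Row, and Table shapes.
interface CellLike {
  parts: Array<{ type: string; value?: string }>;
  align: string; // placeholder for the library's `alignment` type
}
interface RowLike {
  columns: CellLike[];
}
interface TableLike {
  type: "table"; // assumed string value
  head: RowLike;
  body: RowLike[];
}

// Join the `value` of each part in a cell; parts without a value are skipped.
function rowToText(row: RowLike): string[] {
  return row.columns.map((cell) =>
    cell.parts.map((part) => part.value ?? "").join("")
  );
}

// Turn each body row into an object keyed by the header text.
function tableToRecords(table: TableLike): Array<Record<string, string>> {
  const headers = rowToText(table.head);
  return table.body.map((row) => {
    const cells = rowToText(row);
    const record: Record<string, string> = {};
    headers.forEach((header, i) => {
      record[header] = cells[i] ?? "";
    });
    return record;
  });
}

// Helper that builds a cell holding a single text part.
const textCell = (value: string): CellLike => ({
  parts: [{ type: "text", value }],
  align: "left", // assumed value
});

const sampleTable: TableLike = {
  type: "table",
  head: { columns: [textCell("Name"), textCell("Language")] },
  body: [{ columns: [textCell("mtj-parser"), textCell("TypeScript")] }],
};

const records = tableToRecords(sampleTable);
console.log(records);
```

Cells with nested styling (for example a `Strong` inside a cell) would need a recursive text walk rather than the flat `value` lookup used here.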

#### Text

```TypeScript
interface Text {
  type: SubTypes.text;
  value: string;
}
```

#### Link

```TypeScript
interface Link {
  type: SubTypes.link;
  parts: SubNode[];
  dest: string;   // url, file reference, or anchor reference
  title?: string; // Hover text
}
```

#### Emphasis

```TypeScript
interface Emphasis {
  type: SubTypes.emphasis;
  parts: SubNode[];
}
```

#### Strong

```TypeScript
interface Strong {
  type: SubTypes.strong;
  parts: SubNode[];
}
```

#### Strikethrough

```TypeScript
interface Strikethrough {
  type: SubTypes.strikethrough;
  parts: SubNode[];
}
```

#### CodeInline

```TypeScript
interface CodeInline {
  type: SubTypes.codeInline;
  value: string;
}
```

#### Image

```TypeScript
interface Image {
  type: SubTypes.image;
  src: string;
  title?: string;
  alt?: string;
}
```

#### HardBreak

A hard break is made by ending a line with 2 or more spaces followed by a newline.

```TypeScript
interface HardBreak {
  type: SubTypes.hardbreak;
}
```

#### SoftBreak

A soft break is made by ending a line with 0 or 1 space followed by a newline.

```TypeScript
interface SoftBreak {
  type: SubTypes.softbreak;
}
```
