https://github.com/pgilbertschmitt/mtj-parser
Markdown parser that outputs to a simple (typed) object
- Host: GitHub
- URL: https://github.com/pgilbertschmitt/mtj-parser
- Owner: PGilbertSchmitt
- License: mit
- Created: 2020-02-29T22:50:49.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2023-01-06T07:42:34.000Z (over 2 years ago)
- Last Synced: 2025-03-11T08:18:51.329Z (2 months ago)
- Language: TypeScript
- Size: 1.16 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 12
Metadata Files:
- Readme: README.md
- License: LICENSE
# MTJ-Parser
## The Markdown-to-JSON Parser
---
Parsers already exist aplenty for transforming Markdown into HTML. However, I don't want to use the HTML output of a parser I don't know and inject it unsafely into a ReactJS component. The problem is that the best parser I could find that _doesn't_ convert my input directly into HTML is [markdown-it](https://github.com/markdown-it/markdown-it), but its output is not as easy to convert into nested components as I would like. It's more akin to an array of token-like objects, with things like "open paragraph" and a matching "close paragraph", and all the paragraph's tokens between them. That design is too linear for me; a tree would be better. So, rather than reinvent the wheel, I've wrapped `markdown-it` with my own pseudo-parser, which converts the serial structure into a nested structure that is better for building components around. It's more like _unflattening_ (?) than _text parsing_, and since parsing text is a nightmare, I'll just piggyback off of [puzrin's](https://github.com/puzrin) hard work.
The output is something that I can more easily manipulate, which makes me happier. Also, I'm learning, and at the end of the day, isn't that all that matters?
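To make the "unflattening" idea concrete, here is a minimal sketch of turning a flat open/close token stream into a tree with a stack. The `FlatToken` and `TreeNode` shapes and the `unflatten` function are hypothetical and much simpler than real `markdown-it` tokens; this is an illustration of the general technique, not the code this library actually uses.

```typescript
// Hypothetical, simplified token shape (real markdown-it tokens carry more fields).
interface FlatToken {
  kind: string;         // e.g. "paragraph", "strong", "text"
  nesting: 1 | 0 | -1;  // 1 = opening token, -1 = closing token, 0 = standalone
  content?: string;
}

// Hypothetical tree node used only for this illustration.
interface TreeNode {
  kind: string;
  content?: string;
  children: TreeNode[];
}

function unflatten(tokens: FlatToken[]): TreeNode[] {
  const root: TreeNode[] = [];
  // The top of the stack is the child list that new nodes are appended to.
  const stack: TreeNode[][] = [root];

  for (const token of tokens) {
    const current = stack[stack.length - 1];
    if (token.nesting === 1) {
      // Opening token: create a node and descend into its children.
      const node: TreeNode = { kind: token.kind, children: [] };
      current.push(node);
      stack.push(node.children);
    } else if (token.nesting === -1) {
      // Closing token: climb back up to the parent's child list.
      if (stack.length === 1) {
        throw new Error(`Unmatched closing token: ${token.kind}`);
      }
      stack.pop();
    } else {
      // Standalone token: becomes a leaf node.
      current.push({ kind: token.kind, content: token.content, children: [] });
    }
  }

  if (stack.length !== 1) {
    throw new Error('Unclosed opening token');
  }
  return root;
}

const exampleTree = unflatten([
  { kind: 'paragraph', nesting: 1 },
  { kind: 'text', nesting: 0, content: 'Hello ' },
  { kind: 'strong', nesting: 1 },
  { kind: 'text', nesting: 0, content: 'world' },
  { kind: 'strong', nesting: -1 },
  { kind: 'paragraph', nesting: -1 },
]);
console.log(JSON.stringify(exampleTree, null, 2));
```

The stack keeps track of which child list is currently "open", so a closing token only has to pop one level instead of searching backwards for its match.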
The [rubric file](./rubric.md) shows all the currently supported Markdown formatting.
### ToDo
- [ ] 100% test coverage
### Install
`npm install mtj-parser`
### How-to
Simply use the `parseMarkdown` function, which should return a happy, nested Markdown document:
JavaScript:
```JavaScript
import { parseMarkdown } from 'mtj-parser';

const mdString = `
# This is **Markdown**
`;

const mdObject = parseMarkdown(mdString);
// [
//   {
//     type: "heading",
//     parts: [
//       {
//         type: "text",
//         value: "This is "
//       },
//       {
//         type: "strong",
//         parts: [
//           {
//             type: "text",
//             value: "Markdown"
//           }
//         ]
//       },
//       {
//         type: "text",
//         value: ""
//       }
//     ],
//     size: 1
//   }
// ]
```
### Node types
Because the library is written in TypeScript, types are available for each kind of node, and they can be imported from the index of the library along with the `parseMarkdown` function.
The nodes are split into two categories: `BaseNodes` and `SubNodes`. The root of the object is a `MarkdownDoc`, which is just an array of `BaseNodes`. `BaseNodes` are the top-level Markdown elements (paragraphs, lists, etc.). Most `BaseNodes` contain `SubNodes`, which are the text styling and control elements (bold, images, breaks, etc.).
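As a rough illustration of how the two categories nest, the sketch below walks a document and collects its plain text. The `InlineNode` and `BlockNode` types are local stand-ins covering only a small subset of the node shapes; the `"heading"`, `"text"`, and `"strong"` strings come from the sample output above, while `"paragraph"` is an assumed value. In real code you would presumably import the library's own types instead.

```typescript
// Local stand-ins for a subset of SubNodes (not the library's actual exports).
type InlineNode =
  | { type: 'text'; value: string }
  | { type: 'strong'; parts: InlineNode[] };

// Local stand-ins for a subset of BaseNodes; the "paragraph" value is an assumption.
type BlockNode =
  | { type: 'heading'; parts: InlineNode[]; size: 1 | 2 | 3 | 4 | 5 | 6 }
  | { type: 'paragraph'; parts: InlineNode[] };

// Recursively flatten inline nodes into a plain string.
function inlineText(parts: InlineNode[]): string {
  return parts
    .map((part) => (part.type === 'text' ? part.value : inlineText(part.parts)))
    .join('');
}

// One line of text per top-level node.
function docToPlainText(doc: BlockNode[]): string {
  return doc.map((block) => inlineText(block.parts)).join('\n');
}

// Same shape as the sample output in the How-to section.
const sampleDoc: BlockNode[] = [
  {
    type: 'heading',
    parts: [
      { type: 'text', value: 'This is ' },
      { type: 'strong', parts: [{ type: 'text', value: 'Markdown' }] },
      { type: 'text', value: '' },
    ],
    size: 1,
  },
];

console.log(docToPlainText(sampleDoc)); // This is Markdown
```

The outer function only ever sees top-level nodes, and the recursion happens entirely inside the inline layer, which mirrors the `BaseNode`/`SubNode` split.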
#### MarkdownDoc
Returned from `parseMarkdown(str)`
```TypeScript
type MarkdownDoc = BaseNode[];
```
#### BaseNode
Union type for all types of `BaseNodes`
```TypeScript
type BaseNode
  = Paragraph
  | Heading
  | HorizontalRow
  | Fence
  | OrderedList
  | BulletList
  | Blockquote
  | Table;
```
#### SubNode
Union type for all types of `SubNodes`
```TypeScript
type SubNode
  = Text
  | Link
  | Emphasis
  | Strong
  | Strikethrough
  | CodeInline
  | Image
  | HardBreak
  | SoftBreak;
```
#### BaseTypes and SubTypes
These two exports are enums whose values are assigned to the `type` members of the following nodes in order to quickly identify their type. The values of the enum members are strings, which is helpful for debugging.
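A small sketch of why string-valued enums on a `type` member are convenient: a `switch` on `node.type` narrows the node, and the raw value is readable when logged. The `DemoSubTypes` enum and `Demo*` interfaces below are hypothetical stand-ins; the member names follow this README, but the exact string values of the real enums are assumed.

```typescript
// Hypothetical stand-in for part of SubTypes; the string values are assumptions.
enum DemoSubTypes {
  text = 'text',
  strong = 'strong',
  hardbreak = 'hardbreak',
}

interface DemoText {
  type: DemoSubTypes.text;
  value: string;
}

interface DemoStrong {
  type: DemoSubTypes.strong;
  parts: DemoSubNode[];
}

interface DemoHardBreak {
  type: DemoSubTypes.hardbreak;
}

type DemoSubNode = DemoText | DemoStrong | DemoHardBreak;

function describeNode(node: DemoSubNode): string {
  switch (node.type) {
    case DemoSubTypes.text:
      // Narrowed to DemoText, so `value` is available.
      return `text(${JSON.stringify(node.value)})`;
    case DemoSubTypes.strong:
      // Narrowed to DemoStrong, so `parts` is available.
      return `strong[${node.parts.map(describeNode).join(', ')}]`;
    case DemoSubTypes.hardbreak:
      return 'hardbreak';
    default: {
      // Compile-time check that every member of the union is handled.
      const unreachable: never = node;
      return unreachable;
    }
  }
}

const demoNode: DemoSubNode = {
  type: DemoSubTypes.strong,
  parts: [{ type: DemoSubTypes.text, value: 'hi' }],
};

console.log(demoNode.type);          // a readable string rather than a number
console.log(describeNode(demoNode)); // strong[text("hi")]
```

If a new node type were added to the union without a matching `case`, the assignment to `never` would fail to compile, which is a cheap way to keep a renderer in sync with the node list.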
#### Paragraph
```TypeScript
interface Paragraph {
  type: BaseTypes.paragraph;
  parts: SubNode[];
}
```
#### Heading
```TypeScript
interface Heading {
  type: BaseTypes.heading;
  parts: SubNode[];
  size: 1 | 2 | 3 | 4 | 5 | 6;
}
```
#### HorizontalRow
```TypeScript
interface HorizontalRow {
  type: BaseTypes.horizontalRow;
}
```
#### Fence
```TypeScript
interface Fence {
  type: BaseTypes.fence;
  value: string;
  lang: string;
}
```
#### ListItem
Not a node itself, but used by both ordered and bullet lists to hold their elements. Each element can be a `Paragraph` or either list type, which is how nested lists work.
```TypeScript
type ListItem = Array<Paragraph | OrderedList | BulletList>;
```
#### OrderedList
```TypeScript
interface OrderedList {
  type: BaseTypes.orderedList;
  list: ListItem[];
}
```
#### BulletList
```TypeScript
interface BulletList {
  type: BaseTypes.bulletList;
  list: ListItem[];
}
```
#### Blockquote
```TypeScript
interface Blockquote {
  type: BaseTypes.blockquote;
  parts: SubNode[];
}
```
#### Table
Tables make use of two non-node types, `Cell` and `Row`:
```TypeScript
interface Cell {
  parts: SubNode[];
  align: alignment;
}

interface Row {
  columns: Cell[];
}

interface Table {
  type: BaseTypes.table;
  head: Row;
  body: Row[];
}
```
#### Text
```TypeScript
interface Text {
  type: SubTypes.text;
  value: string;
}
```
#### Link
```TypeScript
interface Link {
  type: SubTypes.link;
  parts: SubNode[];
  dest: string; // url, file reference, or anchor reference
  title?: string; // Hover text
}
```
#### Emphasis
```TypeScript
interface Emphasis {
  type: SubTypes.emphasis;
  parts: SubNode[];
}
```
#### Strong
```TypeScript
interface Strong {
  type: SubTypes.strong;
  parts: SubNode[];
}
```
#### Strikethrough
```TypeScript
interface Strikethrough {
  type: SubTypes.strikethrough;
  parts: SubNode[];
}
```
#### CodeInline
```TypeScript
interface CodeInline {
  type: SubTypes.codeInline;
  value: string;
}
```
#### Image
```TypeScript
interface Image {
  type: SubTypes.image;
  src: string;
  title?: string;
  alt?: string;
}
```
#### HardBreak
A hard break is made by ending a line with two or more spaces before the newline.
```TypeScript
interface HardBreak {
  type: SubTypes.hardbreak;
}
```
#### SoftBreak
A soft break is made by ending a line with zero or one space before the newline.
```TypeScript
interface SoftBreak {
  type: SubTypes.softbreak;
}
```
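When consuming these two break nodes, one plausible approach is to treat a hard break as a real line break and a soft break as a single space, which is roughly how browsers display them. The `InlinePiece` type and `joinWithBreaks` function below are hypothetical, and the `"hardbreak"` and `"softbreak"` strings are assumed to match the enum values.

```typescript
// Local stand-ins for Text, HardBreak, and SoftBreak; the string values are assumptions.
type InlinePiece =
  | { type: 'text'; value: string }
  | { type: 'hardbreak' }
  | { type: 'softbreak' };

// Join inline pieces into plain text, honoring the two kinds of break.
function joinWithBreaks(parts: InlinePiece[]): string {
  return parts
    .map((part) => {
      switch (part.type) {
        case 'text':
          return part.value;
        case 'hardbreak':
          return '\n'; // an explicit line break
        case 'softbreak':
          return ' ';  // a source newline that collapses to a space
      }
    })
    .join('');
}

const breakSample: InlinePiece[] = [
  { type: 'text', value: 'line one' },
  { type: 'softbreak' },
  { type: 'text', value: 'still line one' },
  { type: 'hardbreak' },
  { type: 'text', value: 'line two' },
];

console.log(joinWithBreaks(breakSample));
// line one still line one
// line two
```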