https://github.com/lotabout/mdbook-fix-cjk-spacing

mdbook preprocessor that removes extra space rendered for Chinese lines.

Host: GitHub
URL: https://github.com/lotabout/mdbook-fix-cjk-spacing
Owner: lotabout
License: mit
Created: 2020-07-20T04:37:58.000Z (almost 5 years ago)
Default Branch: master
Last Pushed: 2020-07-20T09:28:33.000Z (almost 5 years ago)
Last Synced: 2025-03-26T05:41:58.604Z (about 1 month ago)
Language: Rust
Homepage:
Size: 29.3 KB
Stars: 4
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE

[![Github Actions](https://github.com/lotabout/mdbook-fix-cjk-spacing/workflows/CI/badge.svg)](https://github.com/lotabout/mdbook-fix-cjk-spacing/actions?query=workflow%3ACI)
[![Crates.io](https://img.shields.io/crates/v/mdbook-fix-cjk-spacing.svg)](https://crates.io/crates/mdbook-fix-cjk-spacing)

[mdbook](https://github.com/rust-lang/mdBook) renders an extra space between
consecutive lines that end and start with CJK characters.

```
.....中文结尾
中文顶格...

will result in

.....中文结尾 中文顶格...
             `- note the space here
```

This preprocessor will fix that.

# Usage

1. Download the binary from the [release page](https://github.com/lotabout/mdbook-fix-cjk-spacing/releases) and put it in your `PATH`.
   - Alternatively, build from source: `cargo install mdbook-fix-cjk-spacing`
2. Add the following config to your `book.toml`:
   ```
   [preprocessor.fix-cjk-spacing]
   command = "mdbook-fix-cjk-spacing"
   ```
3. Done.

# How does it work?

This preprocessor works on the AST of the markdown file:

1. It uses [pulldown-cmark](https://github.com/raphlinus/pulldown-cmark) to parse the markdown file.
2. When it encounters a `SoftBreak` token, it searches before and after it for a `Text` token.
3. The `SoftBreak` is omitted when the previous text ends with a CJK character and the next text starts with one.
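
The steps above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the preprocessor's actual code: the `Event` enum is a hypothetical stand-in for pulldown-cmark's much larger event type, the Unicode ranges in `is_cjk` are an assumption, and the sketch only inspects the events immediately next to each `SoftBreak` rather than searching further.

```rust
/// Simplified stand-in for the parser's events (hypothetical; the real
/// pulldown-cmark event type has many more variants).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Text(String),
    SoftBreak,
}

/// Rough CJK test covering a few common Unicode blocks. The exact ranges
/// are an assumption; the preprocessor may use different ones.
pub fn is_cjk(c: char) -> bool {
    matches!(
        c,
        '\u{3000}'..='\u{303F}'     // CJK symbols and punctuation
        | '\u{3040}'..='\u{30FF}'   // Hiragana and Katakana
        | '\u{3400}'..='\u{4DBF}'   // CJK Unified Ideographs Extension A
        | '\u{4E00}'..='\u{9FFF}'   // CJK Unified Ideographs
        | '\u{F900}'..='\u{FAFF}'   // CJK Compatibility Ideographs
        | '\u{FF00}'..='\u{FFEF}'   // Halfwidth and Fullwidth Forms
    )
}

/// Returns a copy of `events` without the soft breaks that sit between
/// text ending in a CJK character and text starting with one.
pub fn drop_cjk_soft_breaks(events: &[Event]) -> Vec<Event> {
    let mut out = Vec::with_capacity(events.len());
    for (i, ev) in events.iter().enumerate() {
        if *ev == Event::SoftBreak {
            // Does the event just before the break end with a CJK character?
            let prev_cjk = match i.checked_sub(1).and_then(|j| events.get(j)) {
                Some(Event::Text(t)) => t.chars().next_back().map_or(false, is_cjk),
                _ => false,
            };
            // Does the event just after the break start with a CJK character?
            let next_cjk = match events.get(i + 1) {
                Some(Event::Text(t)) => t.chars().next().map_or(false, is_cjk),
                _ => false,
            };
            if prev_cjk && next_cjk {
                continue; // omit the soft break so no space is rendered
            }
        }
        out.push(ev.clone());
    }
    out
}
```

Soft breaks between non-CJK text are kept, so lines of English text still render with a space between them.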

The binary has a "raw" mode for showing the processed output:

```sh
cat markdown.md | mdbook-fix-cjk-spacing raw
```

# The problem

In markdown, several consecutive lines are parsed as a single
block:

```
line 1
line 2
line 3

// will be parsed as

<p>line 1
line 2
line 3</p>
```

That means the line breaks are kept and all three lines are treated as a single
paragraph.

However, the browser renders a line break inside a `<p>` as a single
space, so in a browser the content above looks like:

```
line 1 line 2 line 3
```

That is fine except for Chinese. Written Chinese does not put spaces between
words, so when we write:

```
中文第一行
中文接上行

// will show as

中文第一行 中文接上行
//        `- note the space here
```

This is really frustrating! There are two obvious solutions:

1. Fix the markdown parser so that it handles this case correctly.
2. Write each paragraph on a single long line.

The first option is not very practical: this 'bug' has existed for a long time
and is still not fixed. The second is tedious and unfriendly to write and edit.

So here is our solution for `mdbook`: a preprocessor that merges
Chinese lines automatically before parsing!

# The use case

Only the following situation is handled:

```
...[should contain no spaces]
[zero or more spaces|tabs]

.....中文结尾
中文顶格...

// are modified to
.....中文结尾中文顶格...
// `- note no space here
```

Note that content inside code blocks is *not* changed.
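
As a rough illustration of this rule at the text level, the sketch below joins such lines directly on the raw markdown and skips fenced code blocks. It is a different technique from the AST-based approach the preprocessor uses, and the names `is_han` and `merge_cjk_lines` are hypothetical. It makes several simplifying assumptions: only CJK Unified Ideographs count as CJK, only triple-backtick fences are recognized, indented code blocks are ignored, and a trailing newline in the input is not preserved.

```rust
/// Narrow check for CJK Unified Ideographs only (an assumption made to keep
/// the sketch short; kana, hangul, and full-width punctuation are ignored).
fn is_han(c: char) -> bool {
    ('\u{4E00}'..='\u{9FFF}').contains(&c)
}

/// Joins a line that ends in a Han character (with no trailing whitespace)
/// with a following line that starts with one after optional leading spaces
/// or tabs. Lines inside fenced code blocks are left untouched.
pub fn merge_cjk_lines(input: &str) -> String {
    let mut out = String::new();
    let mut in_fence = false;
    // True when the previous line is ordinary text ending in a Han character.
    let mut prev_ends_han = false;

    for (i, line) in input.lines().enumerate() {
        let is_fence_marker = line.trim_start().starts_with("```");
        let plain = !in_fence && !is_fence_marker;
        // Strip the "zero or more spaces|tabs" allowed before the CJK text.
        let body = line.trim_start_matches(|c: char| c == ' ' || c == '\t');
        let starts_han = body.chars().next().map_or(false, is_han);

        if plain && prev_ends_han && starts_han {
            // Append directly to the previous line: no newline, no space.
            out.push_str(body);
        } else {
            if i > 0 {
                out.push('\n');
            }
            out.push_str(line);
        }

        if is_fence_marker {
            in_fence = !in_fence;
        }
        prev_ends_han = plain && line.chars().next_back().map_or(false, is_han);
    }
    out
}
```

A line that ends with trailing whitespace is not joined, which matches the "should contain no spaces" condition in the pattern above.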
