https://github.com/bashup/mdsh

Multi-lingual, Markdown-based Literate Programming... in run-anywhere bash

## Multi-Lingual Literate Programming with `mdsh`

`mdsh` is a bash script compiler and interpreter for markdown files. It can be used in a `#!` line to make markdown files executable, or it can be used as a standalone tool to generate dependency-free, distributable bash scripts from markdown files.

By default, `mdsh` only considers `shell` code blocks to be bash code, but you can also use `@mdsh` blocks to define handlers for other languages. For example, this script will run `python`-tagged code blocks by piping them to the `python` command:

~~~markdown
#!/usr/bin/env mdsh

# Hello World in Python

The following code block is executed at compile time (due to the `@mdsh`).
(The first word on the opening line could be `shell` or `sh` or anything
else, as long as the second word is `@mdsh`.)

```bash @mdsh
mdsh-lang-python() { python; }
```

Now that we've defined a language handler for `python`, this next code
block is translated to shell code that runs python with the block's
contents on stdin:

```python
print("hello world!")
```
~~~

Running the above markdown file produces the same results as this equivalent bash script:

~~~bash
#!/usr/bin/env bash
{ python; } <<'```'
print("hello world!")
```
~~~

`mdsh` supports processing blocks of any language that you can write a bash code snippet for, and even lets you write "compile-time" code to transform blocks containing metadata or DSL snippets into bash code. The results can be either executed on the fly for development, or deployed/distributed via `mdsh --compile`. Compiled scripts do not include any `@mdsh` code, nor do they have any hidden runtime dependencies: everything `mdsh --compile` outputs is code or data you gave it, or generated by bash code you gave it!

**Contents**

- [Installation](#installation)
- [Usage](#usage)
  * [Data Blocks](#data-blocks)
  * [Processing Non-`shell` Languages](#processing-non-shell-languages)
  * [Advanced Block Compilation Techniques](#advanced-block-compilation-techniques)
    + [Compile-Time Variables](#compile-time-variables)
    + [Programmatic Block Generation](#programmatic-block-generation)
  * [Command Blocks and Arguments](#command-blocks-and-arguments)
- [Tips and Techniques](#tips-and-techniques)
  * [Literate Testing](#literate-testing)
  * [Excluding Blocks From The Generated Script](#excluding-blocks-from-the-generated-script)
  * [Making Executable (and Editable) Markdown Files](#making-executable-and-editable-markdown-files)
  * [Making Sourceable Scripts (and handling $0)](#making-sourceable-scripts-and-handling-0)
  * [Syntax Highlighting and Language Aliasing](#syntax-highlighting-and-language-aliasing)
- [Metaprogramming and Code Generation](#metaprogramming-and-code-generation)
  * ["Static Linking" for Distribution](#static-linking-for-distribution)
- [Extending `mdsh` or Reusing its Functions](#extending-mdsh-or-reusing-its-functions)
  * [Adding File Headers or Footers](#adding-file-headers-or-footers)
  * [Altering Existing Functions](#altering-existing-functions)
  * [Available Functions](#available-functions)

## Installation

`mdsh` can be installed in any of the following ways:

* Using [basher](https://github.com/basherpm/basher), via `basher install bashup/mdsh`
* Using [composer](https://getcomposer.org/), via `composer require bashup/mdsh:dev-master` (to add it to your project) or `composer global require bashup/mdsh:dev-master` (to install it globally)
* Using git, by cloning this repo and copying or linking the `bin/mdsh` file to a directory on your `PATH`, or
* Just [downloading the script directly](https://github.com/bashup/mdsh/raw/master/bin/mdsh) to a directory on your `PATH`, then running `chmod +x` on it

## Usage

Running `mdsh` *markdownfile args...* will read and translate unindented, triple-backquote fenced code blocks from *markdownfile* into bash code, based on the language listed on the block and any translation rules you've defined. The resulting translated script is then run, passing in *args* as positional arguments to the script.

Blocks tagged as `shell` are interpreted as bash code, and directly copied to the translated script. So arguments passed to `mdsh` after the path to the markdown file are available as `$1`, `$2`, etc. within the top-level code of `shell` blocks, just like in a regular bash script.

(Typically, you won't run `mdsh` directly, but will put `#!/usr/bin/env mdsh` on the first line of your markdown file instead, and make it executable with `chmod +x`. That way, users of your script won't need to do anything special to run it.)

You can also use `mdsh --compile` *file1 file2...* to translate one or more markdown files to bash code, sending the result to stdout. (A filename of `-` means "read from standard input".) This can be useful for debugging, or to make a distributable version of your script that does not require its users to have `mdsh`.

(There is also an `mdsh --eval` *filename* option, which is similar to `--compile`, but only takes one, non-stdin file, and emits special code at the end to support markdown files being sourced; see the section below on [Making Sourceable Scripts](#making-sourceable-scripts-and-handling-0) for more details.)

Both `--eval` and `--compile` can be preceded with `--out` *filename*, in which case *filename*'s contents will be replaced with `mdsh`'s output, if and only if the compilation or run succeeds without any errors. (The output is buffered in-memory, then output all at once upon successful completion. If the file already existed, its permissions will remain unchanged.)
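
The "replace the file only on success" behavior of `--out` can be approximated in plain bash. The following is a minimal sketch of that idea, not `mdsh`'s actual implementation; `write_if_ok` is a hypothetical helper name introduced only for illustration.

```shell
#!/usr/bin/env bash
# write_if_ok FILE COMMAND [ARGS...]   (hypothetical helper, not part of mdsh)
# Runs COMMAND with its stdout buffered in memory; FILE is overwritten only
# if COMMAND exits successfully. (Command substitution drops trailing
# newlines, so exactly one is added back when writing.)
write_if_ok() {
    local target=$1 buffer
    shift
    buffer=$("$@") || return             # failure: FILE is left untouched
    printf '%s\n' "$buffer" >"$target"   # truncates in place, so an existing file keeps its mode
}

demo=$(mktemp)
echo 'old contents' >"$demo"

write_if_ok "$demo" false || echo "failed; file still holds: $(cat "$demo")"
write_if_ok "$demo" echo 'new contents' && echo "succeeded; file now holds: $(cat "$demo")"
```

With the real tool, the corresponding invocation would put `--out` *filename* before `--compile` or `--eval`, as described above.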

### Data Blocks

The contents of unindented, triple-backquoted blocks that are *not* tagged `shell` or `shell @mdsh` are treated as *data* by default: their contents are added to bash arrays named according to the language on the block, e.g.:

~~~markdown
# Data Arrays Example
Blocks without a defined language processor get translated to a variable
assignment like `mdsh_raw_json+=(\#\ block\ 0)` at that point in the
generated script:

```json
{ "hello": "world" }
```
```shell
echo "${mdsh_raw_json[0]}" # prints '{ "hello": "world" }'
```
```json
{ "this is": "great" }
```
```shell
echo "${mdsh_raw_json[0]}" # prints '{ "hello": "world" }'
echo "${mdsh_raw_json[1]}" # prints '{ "this is": "great" }'
```

## Naming Rules
Language names are *case sensitive*, and non-identifier
characters in language names become `_` in variable names:

```C++
// hey
```
```shell
echo "${mdsh_raw_C__[0]}" # prints '// hey'
```
~~~
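
To make the naming rule concrete, here is a small sketch that builds the same kind of assignment by hand. It assumes only what is described above (non-identifier characters become `_`, and the block text is appended to an `mdsh_raw_*` array); `emit_data_block` is a hypothetical stand-in, not a function provided by `mdsh`.

```shell
#!/usr/bin/env bash
# emit_data_block LANG TEXT   (hypothetical stand-in for the default data handling)
# Prints a bash statement that appends TEXT to the array named after LANG.
emit_data_block() {
    local var="mdsh_raw_${1//[^a-zA-Z0-9_]/_}"   # non-identifier characters become _
    printf '%s+=(%q)\n' "$var" "$2"              # %q keeps the text a single, safely quoted word
}

# Show the generated statements...
emit_data_block json '{ "hello": "world" }'
emit_data_block 'C++' '// hey'

# ...then run them, the way the translated script would.
eval "$(emit_data_block json '{ "hello": "world" }')"
eval "$(emit_data_block 'C++' '// hey')"
echo "${mdsh_raw_json[0]}"   # prints '{ "hello": "world" }'
echo "${mdsh_raw_C__[0]}"    # prints '// hey'
```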

Of course, it would be even better if you could automate the processing of these blocks, so you don't have to follow every block with another `shell` block to process it! Which is why the next section is on...

### Processing Non-`shell` Languages

To automate the handling of non-`shell` language blocks, you can define one or more `@mdsh` blocks, containing "hook functions". `@mdsh` blocks are a bit like a Makefile, in that they define *rules* for how to *build* parts of your script, based on the language used.

These build rules are specified by defining specially-named bash functions. Unlike functions in `shell` blocks, these functions are *not* part of your script and therefore can't be called directly. Instead, `mdsh` itself invokes them (or copies out their source code), whenever a subsequent block's language matches one of the functions' names.

The language of a markdown code block is normally just one word after its opening backquotes. But if more than one word appears, then mdsh considers the language to be either the second word (if it begins with `@`), or the entire line flattened into a single variable name. Some example translations:

| Block Opening | Effective Language | Function Names |
| ------------------------- | ------------------ | -----------------------------------------------------------: |
| `` ```C++ `` | `C++` | `mdsh-lang-C++`<br>`mdsh-compile-C++` |
| `` ```C++ example`` | `C___example` | `mdsh-lang-C___example`<br>`mdsh-compile-C___example` |
| `` ```foo bar.baz spam`` | `foo_bar_baz_spam` | `mdsh-lang-foo_bar_baz_spam`<br>`mdsh-compile-foo_bar_baz_spam` |
| `` ```foo @bar.baz spam`` | `bar.baz` | `mdsh-lang-bar.baz`<br>`mdsh-compile-bar.baz` |
| `` ```shell script `` | `shell_script` | `mdsh-lang-shell_script`<br>`mdsh-compile-shell_script` |
| `` ```shell @mdsh`` | `mdsh` | `mdsh-lang-mdsh`<br>`mdsh-compile-mdsh` |
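
The translations in the table can be reproduced with a few lines of bash. This is only a sketch of the rule as stated here (one word: use it as-is; second word starting with `@`: use that word; otherwise flatten the whole line). The helper name `effective_lang` is hypothetical, and command blocks (described later) are deliberately ignored.

```shell
#!/usr/bin/env bash
# effective_lang TAG   (hypothetical helper; not how mdsh itself is written)
# Prints the effective language for a block's opening line.
effective_lang() {
    local tag=$1
    local -a words
    read -ra words <<<"$tag"                      # split the tag on whitespace
    if (( ${#words[@]} <= 1 )); then
        printf '%s\n' "${words[0]}"               # a single word is used unchanged
    elif [[ ${words[1]} == @* ]]; then
        printf '%s\n' "${words[1]#@}"             # second word, minus its leading @
    else
        printf '%s\n' "${tag//[^a-zA-Z0-9_]/_}"   # whole line, flattened to an identifier
    fi
}

for tag in 'C++' 'C++ example' 'foo bar.baz spam' 'foo @bar.baz spam' 'shell @mdsh'; do
    printf '%-20s -> %s\n' "$tag" "$(effective_lang "$tag")"
done
```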

Function names are interpreted as follows:

* An `mdsh-lang-X` function is a template for code to be run when a block of language `X` is encountered. Its function body is copied to the translated script as a bash compound statement (i.e. in curly braces, `{...}`) that will execute with the block contents on its standard input. (Its standard output is the same as the overall script's.)
* An `mdsh-compile-X` function is invoked *at compile time* with the block contents as `$1`, and must output a bash source code translation of the block on its stdout. The block's full original language tag is in `$2`, and the code block's starting line number is in `$3`.
* If neither an `mdsh-lang-X` nor `mdsh-compile-X` function exists, `mdsh-misc` is invoked *at compile time* with the raw language tag as `$1` and the block contents as `$2`. The output of `mdsh-misc` will be added to the compiled script. (The default implementation of `mdsh-misc` outputs code to save the block contents in a variable, as described in the [Data Blocks](#data-blocks) section above.)
* An `mdsh-after-X` function is a template for code to be run *after* a block of language `X` is encountered. Its function body is copied to the translated script as a block just after the `mdsh-lang-X` body, `mdsh-compile-X` output, or `mdsh_raw_X+=(contents)` statement. It does *not* receive the block source, so its standard input and output are those of the script itself.

If both an `mdsh-lang-X` and `mdsh-compile-X` function exist, `mdsh-lang-X` takes precedence. Defining either one also disables the `$mdsh_raw_X` functionality: only untranslatable "data" blocks are added to the arrays.
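
The precedence among these hooks can be pictured as a simple lookup. The sketch below assumes the lookup is a check for whether a function of the right name is defined (done here with `declare -F`); `pick_handler` is a hypothetical helper, and the three example hooks exist only to exercise it.

```shell
#!/usr/bin/env bash
# Example hooks, as they might appear in an `@mdsh` block.
mdsh-lang-yaml()    { cat; }
mdsh-compile-yaml() { printf 'echo %q\n' "$1"; }
mdsh-compile-json() { printf 'echo %q\n' "$1"; }

# pick_handler LANG   (hypothetical helper, not part of mdsh)
# Prints the name of the hook that would handle a block of language LANG.
pick_handler() {
    if declare -F "mdsh-lang-$1" >/dev/null; then
        echo "mdsh-lang-$1"        # a lang template wins when both exist
    elif declare -F "mdsh-compile-$1" >/dev/null; then
        echo "mdsh-compile-$1"
    else
        echo "mdsh-misc"           # fallback: the block is treated as data
    fi
}

pick_handler yaml   # prints mdsh-lang-yaml
pick_handler json   # prints mdsh-compile-json
pick_handler toml   # prints mdsh-misc
```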

If there is no `mdsh-lang-X` or `mdsh-compile-X` however, the `mdsh-after-X` function can read the most recent block's contents from `${mdsh_raw_VARNAME[-1]}` (unless you've replaced the default `mdsh-misc` implementation). If you don't unset the array, it will keep growing as more blocks of that language are encountered.

Note: these function names are **case sensitive**, so a block tagged with an uppercase `C` will not trigger the same functions as a block tagged with a lowercase `c`, or vice-versa. Also, note that because `@mdsh` blocks are executed at compile time, they do **not** have access to the script's arguments or I/O: all you can do in them is define hook functions.

Finally, please remember that you usually shouldn't put any code in an `@mdsh` block aside from hook functions, unless you're intentionally doing [metaprogramming or code generation](#metaprogramming-and-code-generation). That's because `@mdsh` blocks are *not* part of the translated script; they are part of the *translation process*. So any functions you define in them won't be around when the script actually runs, and any changes you make to variables won't persist into the actual script execution.

### Advanced Block Compilation Techniques

Once you've gotten used to doing some `mdsh-lang-X` functions, why not try your hand at some `mdsh-compile` ones?

For example, in the [`jqmd` project](https://github.com/bashup/jqmd), I originally had some code that looked like this:

```bash
YAML() { JSON "$(echo "$1" | yaml2json -)"; }

mdsh-lang-yaml() { YAML "$(cat)"; }
```

Which works pretty well, except, since the YAML is a constant value, why not convert it to JSON during compilation? That way, we could eliminate the runtime overhead (if we save and rerun the compiled script):

```bash
mdsh-compile-yaml() { printf 'JSON %q\n' "$(echo "$1" | yaml2json)"; }
```

Notice the difference between the two functions: the `lang` function is a code *template*. `mdsh` copies its body into your script source, resulting in code that looks like:

```bash
{
YAML "$(cat)"
} <<'```'
... yaml data here ...
​```
```

But the `compile` function simply runs `yaml2json` immediately, and then writes out the translated data, like so:

```bash
JSON ...shell-quoted json here...
```

Notice the use of `printf` with `%q` -- this results in the data being properly escaped to work as a command line argument. (Take care when you do direct code generation to escape such values properly. When you need to insert variable data into generated code, always use `printf` with a constant string format, with `%q` placeholders for any standalone arguments.)
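
To see why `%q` matters, here is a self-contained sketch of the same pattern that doesn't need `yaml2json`. The `shout` language and its hook are hypothetical; the hook is called by hand here, whereas `mdsh` would call it during compilation with the block text in `$1`.

```shell
#!/usr/bin/env bash
# Hypothetical compile-time hook for blocks tagged `shout`. As with any
# mdsh-compile-X function, $1 is the block text and stdout is bash code.
mdsh-compile-shout() {
    local upper
    upper=$(tr '[:lower:]' '[:upper:]' <<<"$1")   # the work done at compile time
    printf 'echo %q\n' "$upper"                   # constant format, %q for the data
}

generated=$(mdsh-compile-shout "it's \$HOME sweet home")
printf '%s\n' "$generated"   # the generated line, with the quote, $ and spaces escaped
eval "$generated"            # prints: IT'S $HOME SWEET HOME
```

Because the data goes through `%q`, the apostrophe and the `$HOME` in the block survive as literal text instead of being interpreted when the generated code runs.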

Notice too, by the way, that `compile` functions get access to the actual block text, which means that you can do any sort of code generation you like. For example, I could have taken the output of `yaml2json`, and run `jq` over it, then looped over the output and written bash code to set variables based on the result, or generated code for subcommands based on the specification, or maybe even generated an argument parser from it. There are all sorts of interesting possibilities for these kinds of code generation techniques!

#### Compile-Time Variables

In addition to their positional arguments, compile-time hooks such as `mdsh-misc` and `mdsh-compile-X` also receive a few variables that can be helpful for parsing special block headers or generating error messages:

* `${tag_words[@]}` is an array of the whitespace-separated words from the original block opening line. For example, if a block opened with `` ```foo @bar.baz spam ``, then `tag_words=([0]="foo" [1]="@bar.baz" [2]="spam")`. (`${#tag_words[@]}` is the number of words.)
* `$mdsh_lang` is the language of the block as viewed by mdsh -- i.e., the `X` in `mdsh-lang-X`. (So it's either `${tag_words[0]}`, `${tag_words[1]#@}`, or the entire line contents with non-identifier characters replaced by `_`.)
* If the source being compiled is a file, `$MDSH_SOURCE` is the source filename.
* `$block_start` is the starting line number of the block in the original source.
* `$mdsh_block` contains the text of the block.
* `$mdsh_tag` contains the original block opening line (i.e., the unsplit form of `tag_words`).

(These variables are also usable by compile-time command blocks, as described in the next section.)
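
As an illustration of using these variables for error messages, the sketch below defines a hypothetical `mdsh-compile-ini` hook and then calls it by hand. The variable values are simulated here; during a real compilation `mdsh` would set them. The `ini` language, its `key=value` format, and the message layout are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical hook for blocks tagged `ini`: turns "key=value" lines into
# shell assignments, and reports a bad line along with where the block is.
mdsh-compile-ini() {
    local line
    while IFS= read -r line; do
        [[ $line ]] || continue                   # skip blank lines
        if [[ $line =~ ^([A-Za-z_][A-Za-z0-9_]*)=(.*)$ ]]; then
            printf '%s=%q\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
        else
            echo "${MDSH_SOURCE:-<stdin>}:$block_start: bad $mdsh_lang line: $line" >&2
            return 1
        fi
    done <<<"$1"
}

# Simulated compile-time variables (normally provided by mdsh).
MDSH_SOURCE=README.md block_start=42 mdsh_lang=ini tag_words=(ini)

mdsh-compile-ini $'name=demo app\nmode=dev'            # prints two assignments
mdsh-compile-ini 'not valid' || echo "hook reported an error"
```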

#### Programmatic Block Generation

The `mdsh-block` function allows you to programmatically generate a code block of a designated language. This can be useful for e.g. conditional blocks. For example, this `if-env` function can be used in a command block to generate code that will check the value of `$WP_ENV` at runtime and conditionally execute the block's contents:

~~~markdown
```shell @mdsh
if-env() {
    printf -v REPLY '|%q' "$@"
    echo "case \$WP_ENV in ${REPLY#|})"
    mdsh-block "$mdsh_lang" "$mdsh_block" "$block_start"
    echo
    echo "esac"
}
```

```css !if-env dev staging
/* This CSS is only used in dev and staging */
```
~~~

The `mdsh-block` function takes up to four arguments: a language, a block body, a starting line number, and a "raw" language tag (which defaults to the language if not given). The first three arguments are also optional, defaulting to `$mdsh_lang`, `$mdsh_block`, and `$block_start` if omitted. (Which means the above code could have just called `mdsh-block` with no arguments!)

`mdsh-block` follows the standard language lookup logic, looking first for `mdsh-lang-X`, then `mdsh-compile-X`, and then falling back to `mdsh-misc`, cloning `mdsh-after-X` as well if applicable. It does not support command blocks or language aliases, so no `@`, `+`, `!`, or `|` expressions can be used. It's intended for use in compile-time code only, i.e. `!` command blocks, `@mdsh` blocks, and handlers like `mdsh-misc` and `mdsh-compile-X` functions.
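
To see what the `if-env` example generates, it can be run outside of `mdsh` with a stand-in for `mdsh-block`. In this sketch the stand-in simply emits an `echo` of the block body (the real function dispatches to the language handlers instead), and the compile-time variables are set by hand.

```shell
#!/usr/bin/env bash
# Stand-in for mdsh-block (an assumption for this demo): emit code that
# echoes the block body, which is passed as the second argument.
mdsh-block() { printf 'echo %q' "$2"; }

if-env() {
    printf -v REPLY '|%q' "$@"              # "dev staging" becomes "|dev|staging"
    echo "case \$WP_ENV in ${REPLY#|})"     # drop the leading | to form the pattern
    mdsh-block "$mdsh_lang" "$mdsh_block" "$block_start"
    echo
    echo "esac"
}

# Simulated compile-time variables for a block opened with: css !if-env dev staging
mdsh_lang=css mdsh_block='/* dev and staging only */' block_start=1

generated=$(if-env dev staging)
printf '%s\n' "$generated"                  # first line: case $WP_ENV in dev|staging)

WP_ENV=staging;    eval "$generated"        # prints the CSS comment
WP_ENV=production; eval "$generated"        # prints nothing
```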

### Command Blocks and Arguments

Sometimes you have only one block that needs to be processed in a particular way, or each block of a particular language needs unique arguments to compile or execute. For these scenarios, you can define "command blocks".

A command block is a code block whose language tag's *second word* begins with a `|`, `+`, or `!`:

* If it's a `|`, the remainder of the language tag is executed at runtime with the block's contents on standard input (just like an `mdsh-lang-X` function body), and the shell variable `mdsh_lang` set to the first word of the language tag.
* If it's a `+`, the remainder of the language tag is executed at runtime with the block's contents as an extra command line argument, and the shell variable `mdsh_lang` set to the first word of the language tag.
* If it's a `!`, the remainder of the language tag is executed at **compile** time with the block's contents in `$1`, and must output compiled code to standard output (just like an `mdsh-compile-X` function). The full language tag is in `$2`, and the code block's starting line number is in `$3`. All of the standard [compile-time variables](#compile-time-variables) are available, including `mdsh_lang`, `tag_words`, `block_start`, and possibly `MDSH_SOURCE`.

In all of the above cases, `$mdsh_lang` is set to the *first* word of the language tag, but is not otherwise included in the command line executed. (It's assumed to be a syntax highlighting hint, but can also be used as a parameter if your code references `$mdsh_lang`.)
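
The three prefixes can be summarized with a small classifier. This is a sketch of the rules as described above, not code from `mdsh`; `parse_command_block` and the labels it prints are hypothetical.

```shell
#!/usr/bin/env bash
# parse_command_block TAG   (hypothetical helper, not an mdsh function)
# Prints: kind <TAB> first word <TAB> command taken from the rest of the tag
parse_command_block() {
    local tag=$1 lang rest kind
    lang=${tag%%[[:space:]]*}                 # first word: becomes $mdsh_lang
    rest=${tag#"$lang"}
    rest=${rest#"${rest%%[![:space:]]*}"}     # trim leading whitespace
    case $rest in
        '|'*) kind=runtime-stdin ;;           # block contents arrive on stdin
        '+'*) kind=runtime-argument ;;        # block contents are an extra argument
        '!'*) kind=compile-time ;;            # block contents are $1 at compile time
        *)    kind=not-a-command-block; rest= ;;
    esac
    printf '%s\t%s\t%s\n' "$kind" "$lang" "${rest:1}"
}

parse_command_block 'yaml |cat'
parse_command_block 'html +echo "The $mdsh_lang is:"'
parse_command_block 'css !if-env dev staging'
parse_command_block 'python'
```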

Command blocks override normal language function lookups, so no `mdsh-after-X`, `mdsh-lang-X`, or `mdsh-compile-X` functions are looked up or executed for command blocks. Thus, this code as input to mdsh:

~~~markdown
```json !printf "echo %q\n" "# line $3, $mdsh_lang block:" "def example: $1;"
{"foo": "bar"}
```

```html +echo "The $mdsh_lang is:"