Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/jxxcarlson/cayatex

Experimental markup language
https://github.com/jxxcarlson/cayatex

Last synced: about 1 month ago
JSON representation

Experimental markup language

Host: GitHub
URL: https://github.com/jxxcarlson/cayatex
Owner: jxxcarlson
Created: 2021-04-04T04:44:49.000Z (about 3 years ago)
Default Branch: element
Last Pushed: 2021-07-01T20:44:43.000Z (about 3 years ago)
Last Synced: 2024-04-14T03:13:27.777Z (3 months ago)
Language: Elm
Size: 13.3 MB
Stars: 4
Watchers: 4
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Lists

awesome-elm-pltd - CaYaTeX - CaYaTeX is an experimental markup language that compiles to LaTeX, Html, and (indirectly) PDF. (Languages / Markup Languages)

README

        # CaYaTeX

CaYaTeX is an experimental markup language that compiles 

to LaTeX, Html, and (indirectly) PDF. 

Here is a [demo](https://cayatex.lamdera.app), and here

is a sample document:

There are three constructs in the language: _text_, _elements_,

and _headings_. 

Elements have the form `[name |args| body]` 

where (a) `|args|` may be absent, (b) `|args|` is present, 

it has the form `|arg1, arg2, ...|` where the 

individual args are strings without commas, 

(c) the body is an Element. 

Headings consist of one or non-blank lines of text where the first

line begins with a sequence of one or more hash marks '#' followed

by a space and at least one alphabetic character. These function

as in Markdown.

## Example

[title Introduction to CaYaTeX]

1. [strong whiskey and other liquor]

2. [italic [strong stuff]]  

3. [math a^2 + b^2 = c^2]  

4. [image |width:500, height:300, align:center| https://pythagoras.io/proof.png]

5. [theorem |Pythagoras| For any right triangle, [math a^2 + b^2 = c^2], where

   [math a], [math b] are the legs, and [math c] is the hypotenuse.]

   

**Note.** See branch *master* for a variant of the language which contains *Blocks.*   

*Comments:*

1. The function `strong` takes a single argument, the body, which

is the string "whiskey and other liquor".

2. Elements can be nested.

3. Example three uses TeX/LaTeX-style mathematical notation.

4. In example four, the element takes to arguments in addition to its body.

5. This element maps to a LaTeX theorem environment.

One of the design goals is to have a clean, simple, and uniform syntax with as few

constructs as possible.  We aim to show that simple can be powerful, easy to learn,

and easy to use.

### Another examples

Here is a piece of text that parses to an inline expression:

	Pythagoras, a [italic [strong really, really] good] mathematician, showed

	that the sides of a right triangle satisfy the relation

	[math a^2 + b^2 = c^2].  For more information, see

	[link |Wikipedia| https://en.wikipedia.org/wiki/Pythagorean_theorem].

	For the idea of the proof, see

	[image |width:500, height:300, align:center| https://pythagoras.io/proof.png]

### Grammar

The grammar for elements is as follows.

type Element

= Text String (Maybe SourceMap)

| Element String (List String) Element (Maybe SourceMap)

| LX (List Element) (Maybe SourceMap)

	Element ->   

         Text String

         | "[" Identifier  Element "]"

         | "[" Identifier "|" Args "|"  Element "]"

	Args ->

         NonemptyString ("," NonemptyString)* 

### More Examples

Here is how one writes a list:

	[numbered-list 

    [item Raspberry jam]

    

    [item Sourdough bread]

    

    ]

A table is written like this:

   [table

   

   [row Hydrogen, 1, 1]

   

   [row Helium, 2, 4]

   

   [row Lithium, 3, 6]

   

   ]

One can have a header row and a row that

defines the format.

One also has CSV tables:

   [csv

   Hydrogen, 1, 1

   Helium, 2, 4

   Lithium, 3, 6

   ]

as well as verbatim text:

   [verbatim

   This is a test:

   indented 2

   

       indented 4

   ]

## Shorthand

For a few common constructs, there are also shorthand features

a la markdown. The idea is to make the composition of text more ergonomic.

Thus, one can also say

	[numbered-list

	- eggs

	- milk

	- bread

	]

For italic text, one can say _italic text_, and for bold text, one can say *bold text*.

For sections, etc., one says

   [section Introduction]

   [subsection Examples]

NOTE: What about saying

   [section |1| Introduction]

   [section |2| Examples]

where the first argument gives the _level_ of the section.

Pro: this makes it easier to shift levels programmatically.

Con: noiser syntax.

In shorthand, this becomes

	# Introduction

	## Exmaples

We intend the use of shorthand to be quite limited, perhaps just to the above examples.

**Question.** Should the shorthand be handled by preprocessor that maps the source text

to standard, no-shorthand CaYaTeX?  What impact would this have on performance in an

interactive editing scenario?

## Parser Architecture

### Types

The type of the AST is as below.

```elm

type Element

    = Text String (Maybe SourceMap)

    | Element String (List String) Element (Maybe SourceMap)

    | LX (List Element) (Maybe SourceMap)

```

The SourceMap locates the parsed expression in the source text.

We assume that the text is a list of strings that defines

a list of text blocks.  A text block is either

an ordinary paragraph (non-blank lines bordered by blank lines)

or an outer block.  Here is an example:

	0 This is a test.

	1 Ho ho ho!

	2

	3 {theorem|There are [strong many] primes!}

	4

	5 One, two, [italic three!]

In this example there are three blocks, with blockOffsets

0, 3, and 7.  The source map for [strong many] is

	blockOffset = 3

	offset = 19

	length = 13

To arrive at this conclusion, convert the list of

lines 3, 4, 5 to the string

	blockText = "{theorem|There are [strong many] primes!}"

and find the offset of the substring "[strong many]".

Consequently we have

	String.slice 19 (19 + 13) blockText == "[strong many]"

The generation is supplied to parseLoop as an argument

with the assumption that it is incremented for each

edit.

```elm

type alias SourceMap =

	{ blockOffset : Int

	, offset : Int

	, length : Int

	, generation : Int

	}

```

### Functions

((To be expanded))

Top level function: Parser.Driver.parseLoop.  This function repeatedly

calls Parser.Expression.parser until input is exhausted and a final

TextCursor is computed.  The role of the TextCursor will be explained

later.

## Questions and Issues

## On the question of one construct or two

Re the idea of using only one construct — e.g. inline elements — it

works.  All of the examples below are parsed into the current AST type.

If we adopt this idea, we should change the name of InLine to something

else.

Re the type of arguments, in the current setup, an Inline element

has a (function) name, a list of arguments, and a body.  The list

of arguments is a possibly empty list of strings.  The body

is a possibly empty list of InlineExpressions.  An InlineExpression

is either Text or an Inline.  Thus the body is parsed but the args

are not.

Note that this setup (see example below) can handle verbatim text

without having some special treatment by the parser.

There is a theoretical reason why we don't need two constructs: if we

can define functions, we can do anything.  The only downside that I see

is that LaTeX has text, macros, and environments, so the mapping 

from CaYaTeX to LaTeX will not be quite so straightforward.  We will

probably need a dictionary which tells us which function names map

to macros and which map to environments.  For rendering to Html, we 

won't need this.

## Prior Art

[Pollen](https://docs.racket-lang.org/pollen/)

[LaTeX](https://www.latex-project.org/) (of course)

[SwiftLaTeX](https://github.com/SwiftLaTeX/SwiftLaTeX)

[Tectonic](https://tectonic-typesetting.github.io/en-US/)