Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/tokay-lang/tokay

Tokay is a programming language designed for ad-hoc parsing, inspired by awk.
https://github.com/tokay-lang/tokay

ad-hoc awk compiler language packrat parsing programming-language

Last synced: about 2 months ago
JSON representation

Tokay is a programming language designed for ad-hoc parsing, inspired by awk.

Awesome Lists containing this project

README

        


The Tokay Logo





Version badge for crates.io


Badge of docs.rs


Indicator for tokay.dev Website availability


MIT License badge

# Tokay

Tokay is a programming language designed for ad-hoc parsing.

> [!IMPORTANT]
> Tokay is under development and not considered for production use yet.
> Be part of Tokay's ongoing development, and [contribute](CONTRIBUTING.md)!
>

## About

Tokay is intended to become a programming language that can be used to quickly implement solutions for text processing problems. This can involve either simple data extractions, but also the parsing of syntactical structures or parts thereof and the conversion of information into structured parse trees or abstract syntax trees for further processing.

Therefore, Tokay is becoming not only a practical tool for simple one-liners like matchers or recognizers, but also a language that can be used to implement code analyzers, refactoring tools, interpreters, compilers or transpilers. Actually [Tokay's own language parser](src/compiler/tokay.tok) is implemented in Tokay itself.

Tokay is inspired by [awk](https://en.wikipedia.org/wiki/AWK), has syntactic and semantic flavours from [Python](https://www.python.org/) and [Rust](https://www.rust-lang.org/), but also follows its own philosophy, ideas and design principles. Thus, Tokay isn't directly compareable to other languages or projects, and stands on its own. It's an entirely new programming language.

Tokay is still a very young project and gains much potential. [Volunteers are welcome!](CONTRIBUTING.md)

## Highlights

- Interpreted, procedural and imperative scripting language
- Concise and easy to learn syntax and object system
- Stream-based input processing
- Automatic parse tree construction and synthesis
- Left-recursive parsing structures ("parselets") supported
- Implements a memoizing packrat parsing algorithm internally
- Robust and fast, as it is written entirely in safe [Rust](https://rust-lang.org)
- Enabling awk-style one-liners in combination with other tools
- Generic parselets and functions
- Import system to create modularized programs (*coming soon)
- Embedded interoperability with other programs (*coming soon)

## Installation

By using Rusts dependency manager and build-tool `cargo`, simply install Tokay with

```bash
$ cargo install tokay
```

For Arch Linux-based distros, there is also a [`tokay`](https://aur.archlinux.org/packages/tokay) and [`tokay-git`](https://aur.archlinux.org/packages/tokay-git) package in the [Arch Linux AUR](https://aur.archlinux.org/).

## Examples

Tokay's version of "Hello World" is quite obvious.

```tokay
print("Hello World")
```

> ```
> $ tokay 'print("Hello World")'
> Hello World
> ```

Tokay can also greet any wor(l)ds that are being fed to it. The next program prints "Hello Venus", "Hello Earth" or "Hello" followed by any other name previously parsed by the builtin `Word`-token. Any other input than a word is automatically omitted.

```tokay
print("Hello", Word)
```

> ```
> $ tokay 'print("Hello", Word)' -- "World 1337 Venus Mars 42 Max"
> Hello World
> Hello Venus
> Hello Mars
> Hello Max
> ```

A simple program for counting words and numbers and printing a total afterwards can be implemented like this:

```tokay
Word words += 1
Number numbers += 1
end print(words || 0, "words,", numbers || 0, "numbers")
```

> ```
> $ tokay 'Word words += 1; Number numbers += 1; end print(words || 0, "words,", numbers || 0, "numbers")' -- "this is just the 1st stage of 42.5 or .1 others"
> 9 words, 3 numbers
> ```

By design, Tokay constructs syntax trees from consumed information automatically.

The next program implements a parser and interpreter for simple mathematical expressions, like `1 + 2 + 3` or `7 * (8 + 2) / 5`. The result of each expression is printed afterwards.

Processing direct and indirect left-recursions without ending in infinite loops is one of Tokay's core features.

```tokay
_ : Char< \t>+ # redefine whitespace to just tab and space

Factor : @{
Int _ # built-in 64-bit signed integer token
'(' _ Expr ')' _
}

Term : @{
Term '*' _ Factor $1 * $4
Term '/' _ Factor $1 / $4
Factor
}

Expr : @{
Expr '+' _ Term $1 + $4
Expr '-' _ Term $1 - $4
Term
}

Expr _ print("= " + $1) # gives some neat result output
```

> ```
> $ tokay examples/expr_from_readme.tok
> 1 + 2 + 3
> = 6
> 7 * (8 + 2) / 5
> = 14
> 7*(3-9)
> = -42
> ...
> ```

Calculate the fibonacci numbers from parsed integers:

```tokay
fibonacci : @n {
if n <= 1 n else fibonacci(n - 1) + fibonacci(n - 2)
}

Int print($1, "=>", fibonacci($1))
```

> ```
> $ tokay examples/fibonacci2.tok
> 0
> 0 => 0
> 1
> 1 => 1
> 2
> 2 => 1
> 3
> 3 => 2
> 4
> 4 => 3
> 5
> 5 => 5
> 6
> 6 => 8
> 7
> 7 => 13
> 8
> 8 => 21
> 9
> 9 => 34
> 10
> 10 => 55
> ```

## Documentation

The Tokay homepage [tokay.dev](https://tokay.dev) provides links to a quick start and documentation. The documentation source code is maintained in a [separate repository](https://github.com/tokay-lang/tokay-docs).

## Debugging

For debugging, there are two methods to use.

### Tracing using the `log`-crate

For Rust standard trace, use the [`env_logger` facilities](https://docs.rs/env_logger/latest/env_logger/). Full trace is only compiled into debug executables, the release version only provides warning level and upwards.

```
$ RUST_LOG=tokay=debug tokay
```

Alternatively, tracing can be activated for the `__main__`-program by setting `TOKAY_LOG`. This is used to start tracing when the internal parser has been compiled and executed already, and parsed the actual program. `TOKAY_LOG` can be set to any `RUST_LOG`-compliant format, as it becomes `RUST_LOG` right after.

```
$ TOKAY_LOG=debug tokay
```

### Built-in AST and VM debugger using `TOKAY_DEBUG` and `TOKAY_PARSER_DEBUG`

Set `TOKAY_DEBUG` to a debug level between 1-6. This can also be achieved using `tokay -dddd` where every `d` increments the debug level. Additionally, `TOKAY_INSPECT` can be set to one or a list of parselet name (-prefixes) which should be inspected in VM step-by-step trace (`TOKAY_DEBUG=6`).

| Level | Mode |
| ----- | --------------------------------- |
| 0 | No debug |
| 1 | Print constructed AST |
| 2 | Print final intermediate program |
| 3 | Print compiled VM program |
| 4 | Print VM execution trace |
| 5 | Print VM stack contents |
| 6 | VM opcode debugger |

View the parsed AST of a program in debug-level 1:

> ```
> $ cargo run -q -- -d 'x = 42 print("Hello World " + x)'
> main [start 1:1, end 1:33]
> sequence [start 1:1, end 1:33]
> assign_drop [start 1:1, end 1:8]
> lvalue [start 1:1, end 1:3]
> identifier [start 1:1, end 1:2] => "x"
> value_integer [start 1:5, end 1:7] => 42
> call [start 1:8, end 1:33]
> identifier [start 1:8, end 1:13] => "print"
> callarg [start 1:14, end 1:32]
> op_binary_add [start 1:14, end 1:32]
> value_string [start 1:14, end 1:28] => "Hello World "
> identifier [start 1:31, end 1:32] => "x"
> ```

`TOKAY_PARSER_DEBUG` sets the specific debug level for the parser, which is implemented in Tokay itself and is part of the compiler. Only levels > 2 can be recognized here, as the AST of the parser is built into the code.

Here's the VM debugger in action running the simple "Hello World"-program:

> ```
> `$ TOKAY_INSPECT="__main__" cargo run -q -- 'print("Hello World")'`
> __main__ --- Code ---
> __main__ 000 Offset(Offset { offset: 6, row: 1, col: 7 })
> __main__ >001 LoadStatic(1)
> __main__ 002 Offset(Offset { offset: 0, row: 1, col: 1 })
> __main__ 003 CallStaticArg((2, 1))
> __main__ --- Reader ---
> __main__ offset=Offset { offset: 0, row: 1, col: 1 }
> __main__ eof=false
> __main__ --- Globals ---
> __main__ --- Stack ---
> __main__ --- Frames ---
> __main__ 000 capture: 0, reader: 0, fuse: None
>
> __main__ ip = 1 state = Ok(Push([59d29e639f88] "Hello World" (10)))
> __main__ --- Code ---
> __main__ 000 Offset(Offset { offset: 6, row: 1, col: 7 })
> __main__ 001 LoadStatic(1)
> __main__ 002 Offset(Offset { offset: 0, row: 1, col: 1 })
> __main__ >003 CallStaticArg((2, 1))
> __main__ --- Reader ---
> __main__ offset=Offset { offset: 0, row: 1, col: 1 }
> __main__ eof=false
> __main__ --- Globals ---
> __main__ --- Stack ---
> __main__ 000 [59d29e639f88] "Hello World" (10)
> __main__ --- Frames ---
> __main__ 000 capture: 0, reader: 0, fuse: None
>
> Hello World
> __main__ ip = 3 state = Ok(Push([59d29e498fd8] void (10)))
> __main__ exit state = Ok(Push([59d29e498fd8] void (10)))
> ```

## Logo

The Tokay programming language is named after the [Tokay gecko (Gekko gecko)](https://en.wikipedia.org/wiki/Tokay_gecko) from Asia, shouting out "token" in the night.

The Tokay logo and icon was thankfully designed by [Timmytiefkuehl](https://github.com/timmytiefkuehl).

Check out the [tokay-artwork](https://github.com/tokay-lang/tokay-artwork) repository for different versions of the logo as well.

## License

Copyright © 2024 by Jan Max Meyer, Phorward Software Technologies.

Tokay is free software under the MIT license.

Please see the LICENSE file for details.