
https://github.com/jdonnerstag/vlang-rosie


# Native [V-lang](https://vlang.io) implementation of [Rosie-RPL](https://rosie-lang.org/)

[Rosie](https://rosie-lang.org/) is a pattern language (RPL for short). A little bit like
regex, but aiming to solve several of the [regex issues](https://jamiejennings.com/posts/2021-09-23-dont-look-back-2/).
All credit goes to Jamie A. Jennings and her friends for this work.

This project (a native V-lang implementation of RPL) is a work in progress (beta), but ready to be tested
in the field. APIs may still change; a CLI is available, and a REPL is on the todo list.
The current version is fully functional: it parses and compiles all files in Jamie's RPL library (./rpl), including
the rpl files that parse RPL code, and it successfully executes all (inline) unit tests in this folder.

Much like a compiler, the project consists of the following modules:
- A core-0 parser, written in V, which parses rpl input into an AST
- An RPL parser, which uses the core-0 parser to create the byte code for the RPL parser
- An expander (and optimizer) that expands macros and aliases
- A compiler backend, which converts the AST into virtual machine byte code
- A virtual machine runtime, which executes the byte code instructions and matches input against the pattern
- A CLI module (command line interface)
- A unittest module, which supports RPL inline tests to validate patterns in '*.rpl' files
- A disassembler that prints the byte code instructions generated for a specific pattern
- A tracer utility (via the CLI) that greatly helps with debugging input against a pattern

Not yet available:
- Possibly an additional compiler backend that generates native V code
- A shared library and language integration, e.g. Python
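
To make the parse, compile and execute stages listed above more tangible, here is a deliberately tiny sketch in Python. It is an illustration only: the AST classes (`Lit`, `Seq`, `Alt`), the opcodes (`str`, `choice`, `commit`, `end`) and the function names are invented for this example and are not the data structures or instruction set used by this project or by Rosie. It only shows the general idea of turning an AST into flat byte code and running that code on a small backtracking virtual machine with ordered choice.

```python
from dataclasses import dataclass


# Hypothetical AST node types, invented for this sketch.
@dataclass
class Lit:
    text: str          # match a literal string


@dataclass
class Seq:
    left: object       # match `left`, then `right`
    right: object


@dataclass
class Alt:
    left: object       # ordered choice: try `left`, fall back to `right`
    right: object


def compile_ast(node):
    """Compile an AST node into a flat list of (opcode, argument) tuples."""
    if isinstance(node, Lit):
        return [("str", node.text)]
    if isinstance(node, Seq):
        return compile_ast(node.left) + compile_ast(node.right)
    if isinstance(node, Alt):
        left = compile_ast(node.left)
        right = compile_ast(node.right)
        # choice <jump to alternative>; left; commit <jump past alternative>; right
        return ([("choice", len(left) + 2)] + left
                + [("commit", len(right) + 1)] + right)
    raise TypeError(f"unknown AST node: {node!r}")


def run(code, text):
    """Execute the byte code; return the end position of the match, or None."""
    program = code + [("end", None)]
    pc, pos, backtrack = 0, 0, []
    while True:
        op, arg = program[pc]
        if op == "end":
            return pos
        if op == "str" and text.startswith(arg, pos):
            pos += len(arg)
            pc += 1
        elif op == "str":
            # Literal did not match: resume at the most recent alternative, if any.
            if not backtrack:
                return None
            pc, pos = backtrack.pop()
        elif op == "choice":
            # Remember where to continue (and at which input position) on failure.
            backtrack.append((pc + arg, pos))
            pc += 1
        elif op == "commit":
            # The first alternative succeeded: discard the fallback and skip it.
            backtrack.pop()
            pc += arg
        else:
            raise ValueError(f"unknown opcode: {op}")


# Roughly ("ab" / "a") followed by "c".
pattern = Seq(Alt(Lit("ab"), Lit("a")), Lit("c"))
code = compile_ast(pattern)

# A poor man's disassembler: print one instruction per line.
for index, (op, arg) in enumerate(code):
    print(index, op, arg)

print(run(code, "abc"))  # 3
print(run(code, "ac"))   # 2
print(run(code, "ab"))   # None
```

Keeping the byte code as a flat instruction list is what makes a disassembler and a tracer cheap to add: both only need to walk or observe the same list that the virtual machine executes.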

## Project objectives

- Be compliant with the RPL Language Reference
- Be easy and intuitive to use in V-lang projects
- A REPL to test and debug rpl patterns easily
- Jamie's implementation has nice support for grep-like search, colored output, and also tree-like output
to review details of the AST. I'd like to reach at least a similar level of user support.
- Integration with other languages such as Python, Julia, C/C++, Rust, Java, JavaScript, etc.
The more popular languages are supported, the better. (I wish V-lang had a Python integration module.)
- A Visual Studio Code plugin would be nice: syntax highlighting for rpl files, a read-only view of
disassembled rplx files, compiling rpl files upon save or when manually triggered, automatically running
unit tests, etc.

## A bit of history

The project started with a tiny virtual machine (v1), able to load and execute '\*.rplx' files
(compiled RPL code) generated by Rosie's original compiler. It works, but is not battle-tested.
By now, the virtual machine has evolved (v2) and is no longer backwards compatible. We are
still able to read and execute '\*.rplx' files, but we will not put more effort into it.

Please note that neither the '\*.rplx' file structure nor the byte codes of the virtual
machine are part of Rosie's specification and thus are subject to change without
formal notice from the Rosie team.

Originally the project was a proof of concept, aiming at getting practical experience with V
and validating its promises. I decided to use Rosie because I like many of its ideas, and thought
it would be a good contribution to V as well.

Obviously I had to start somewhere, and I decided to start with the RPL runtime. The original
RPL runtime is written in C, whereas the compiler and frontend are a mixture of C and Lua.
The V implementation started as a copy of the C code, gradually introducing more and more V constructs
and replacing 'unsafe' pointer arithmetic. V's C-to-V translator was not yet available,
hence I translated and re-engineered the code manually.

Next I added an RPL parser written in V, able to read and parse RPL source code into an
AST (intermediate representation). It successfully reads all '\*.rpl' files provided in Rosie's library,
including the rpl files implementing the RPL language specification itself.

Then came a compiler that generates RPL-VM byte code instructions (v2). Now I had all required core
components available and fully implemented in V-lang.

Performance tests and optimisations, a proper CLI, and the ability to easily plug in additional parsers,
optimizers and compilers were now at the top of my priority list.

First I added a benchmark module for the runtime (matching input against a pattern), which led to greatly
improved runtime performance. It was a good learning exercise for me on how certain V features affect
performance, but also on which ones are left to the C compiler and how the CPU architecture affects the results.
That said, I'm certainly neither an x86 or SIMD assembler expert nor a CPU profiling expert.

Slowly the project is moving into a more stable mode, as evidenced by the enhancements that followed:
- A CLI (with colored output)
- A tracer (debugger) to more easily analyse what is happening when matching input against a pattern
- An Engine that makes it easier to plug in different versions of parsers, optimizers and compilers
- Tons of entries in todo.md and TODO comments in the source code

As mentioned, this project started as a PoC to practically test and gain some experience with V-lang.
Despite some rough edges here and there, so far I'm mostly pleased. See
[here](https://github.com/jdonnerstag/vlang-lessons-learnt/wiki) for my very own FAQ and
"things to remember" list. I find the V code much easier to read and maintain
than comparable C code. Compiler speed is definitely a plus as well, allowing for quick code-test cycles.
Occasionally I wish a V interpreter or debugger were already available to help me find and fix
issues. For now, adding and removing debug messages is what I do (and why V build time is so important).

## CMD, PS, bash, etc. and the problem with quotes

This project can be embedded in other V projects, but it also comes with a CLI. The CLI has subcommands
such as 'grep' and 'match', which expect a pattern argument such as `"a" ~ "b"`. The pattern contains
double quotes and spaces, and both are treated differently depending on your shell (bash, CMD, PS, ...).
Because I stumbled upon this more than once, I've collected links to blogs that helped me understand it
[here](https://github.com/jdonnerstag/vlang-lessons-learnt/wiki/Command-lines-and-how-they-handle-single-and-double-quotes).
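
As a rough illustration of the problem, the following Python sketch uses the standard `shlex` module, which approximates POSIX shell word splitting (bash-like shells only; CMD and PowerShell follow different rules). The file name `input.txt` and the bare `match` command line are placeholders for this example, not the project's documented invocation.

```python
import shlex

# The RPL pattern that should reach the program as ONE argument.
pattern = '"a" ~ "b"'

# Typed without protection, a POSIX shell removes the double quotes and
# splits on spaces, so the pattern arrives as three separate words.
# (A real shell may additionally expand the bare ~ to the home directory;
# shlex does not model that step.)
unprotected = shlex.split('match "a" ~ "b" input.txt')

# Wrapped in single quotes, the pattern survives as a single argument.
protected = shlex.split("match '\"a\" ~ \"b\"' input.txt")

print(unprotected)  # ['match', 'a', '~', 'b', 'input.txt']
print(protected)    # ['match', '"a" ~ "b"', 'input.txt']

# shlex.quote builds a POSIX-safe quoted form of an arbitrary string.
quoted = shlex.quote(pattern)
print(quoted)       # '"a" ~ "b"'
```

When building command lines from a script, passing the arguments as a list (for example to `subprocess.run`) avoids the shell's quoting rules altogether.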

## Differences with Jamie's implementation

I've tried to limit differences as much as possible, but occasionally, and very consciously, I've decided to differ.

- built-in overrides: I've added a 'builtin' binding attribute in addition to 'alias' and 'local', so that e.g.
`builtin alias ~ = [:space:]+` will override the builtin implementation, and it'll be applied to all
patterns, including the imported packages, and their imports.
- I did not implement '#'. It's not used in any of Jamie's rpl files, which is a good indication that it is not needed.
- The CLI commands and outputs are slightly different
- Added support to print the byte code (disassembler)
- The tracing output looks completely different, IMHO more concise and more readable.
- The supported rcfile variables are different, also 'add_xxx' allows to add a libpath or color.
- Performance analysis revealed that captures occasionally take a significant share (%) of the end-to-end
processing time. The user function that executes a match therefore accepts a list of the bindings that are
really needed, superseding what is defined in the rpl files.
- As alluded to above, my byte codes have evolved quite a bit, significantly contributing to the runtime performance.
- In RPL 1.x the &-operator is equivalent to {>p q}, which IMHO is misleading: everybody believes it is concatenation,
and I've not seen it used anywhere in the lib files. Hence, we do not support it. I think Jamie plans
to remove it in RPL 2.x as well.
- Several improvements on Jamie's todo lists have been implemented in my project, such as
  - multiple byte-code entrypoints (per file)
  - re-usable code blocks (aka functions, but w/o parameters), reducing the file size many times
- Some commonly used (and complex) builtin patterns, e.g. "." and "~", have their own byte-code instruction
with performance-optimized V-code implementations.
- Based on the experience gained throughout the project, I made a couple of suggestions to Jamie on how to evolve
the Rosie Pattern Language in a future version. I've started with my own one in "./rpl/rosie/rpl_3_0_jdo.rpl". So far,
these are only thoughts and not yet implemented anywhere.
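
To see why the RPL 1.x `&` operator mentioned above is easy to misread, here is a small analogy using Python's `re` module. It is not RPL and does not use this project's code. The lookahead form `(?=p)q` plays the role of `{>p q}` (check `p` without consuming input, then match `q` from the same position), and it is compared with plain concatenation. The patterns `p` and `q` are arbitrary examples chosen for this sketch.

```python
import re

p = "ab"
q = "abc"

# Analogue of {>p q}: look ahead for p, then match q at the SAME position.
and_style = re.compile(f"(?={p}){q}")

# What many readers expect from "p & q": p followed by q.
concat = re.compile(f"(?:{p})(?:{q})")

for text in ("abc", "ababc"):
    a = and_style.match(text)
    c = concat.match(text)
    print(text, a.group() if a else None, c.group() if c else None)
# abc abc None
# ababc None ababc
```

The two readings accept different inputs, which is the kind of surprise that motivates dropping the operator.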