https://github.com/pgilbertschmitt/ysetl
My toy language based on Gary Levin's ISETL language
- Host: GitHub
- URL: https://github.com/pgilbertschmitt/ysetl
- Owner: PGilbertSchmitt
- Created: 2024-08-03T22:17:19.000Z (10 months ago)
- Default Branch: master
- Last Pushed: 2024-09-15T17:09:36.000Z (8 months ago)
- Last Synced: 2025-01-17T14:36:10.608Z (4 months ago)
- Language: Rust
- Size: 359 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# YSETL Language
A small, set-focused programming language based off of ISETL.
## History
In the beginning, there was [SETL](https://en.wikipedia.org/wiki/SETL). Showing up in 1969, it provided 2 composite data types, sets and tuples, along with many built-in operations for working with sets. 2 decades later, Gary Levin, an associate professor of compsci at Clarkson University, developed ISETL (Interactive SETL), primarily for use in 2 textbooks:
- **Learning Discrete Mathematics with ISETL** (1988, *ISBN 0-387-96898-9*)
- **Learning Abstract Algebra with ISETL** (1994, *ISBN 0-387-94152-5*)

3 decades after that, I went to Boston on my birthday and stopped at Brattle Book Shop, a very old used book store. My favorite section in used book stores is always the STEM section. Something about old math and programming books just hits different. On this particular day, I found the Abstract Algebra book listed above, in like-new condition with a floppy disk of the ISETL language still in an unopened envelope on the inside back cover. What a find. On my way home, I also found that someone actually put the ISETL source code on GitHub back in 2021, with a Make recipe to build on Ubuntu (and I would later realize that that _someone_ is Gary Levin himself). It definitely works, but I wanted something a little more modern: a smoother REPL experience, safer scoping rules for variables, better flow control, sleeker syntax, and some features that I would just like to have personally (like atom literals). The OG language has some interesting features that you don't normally see in modern languages (for better or worse), and while I want something that plays similarly to ISETL, I have omitted several features which I did not think were safe.
One of the major changes I've implemented is that all values are immutable. Operations that act on collections will generate new instances rather than modify them in-place. Is this a good idea? Probably not. Will I code it in such a way that it's highly performant? Not a chance. But is it worth it? Eh...
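
To make the immutability idea concrete, here is a minimal Rust sketch of what "generate new instances rather than modify them in-place" could look like. The `SetValue` type and its `with` and `union` methods are hypothetical names invented for this illustration; they are not taken from the YSETL source, and the real interpreter may represent sets quite differently.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a YSETL set value (not the interpreter's actual type).
#[derive(Debug, Clone, PartialEq, Eq)]
struct SetValue(BTreeSet<i64>);

impl SetValue {
    fn from_items(items: &[i64]) -> Self {
        SetValue(items.iter().copied().collect())
    }

    /// Returns a new set with `item` added; `self` is left untouched.
    fn with(&self, item: i64) -> SetValue {
        let mut next = self.0.clone();
        next.insert(item);
        SetValue(next)
    }

    /// Returns the union as a fresh value; neither operand is modified.
    fn union(&self, other: &SetValue) -> SetValue {
        SetValue(self.0.union(&other.0).copied().collect())
    }
}

fn main() {
    let a = SetValue::from_items(&[1, 2, 3]);
    let b = a.with(4);
    let c = a.union(&SetValue::from_items(&[3, 5]));
    // `a` still holds its original elements after both operations.
    println!("{:?} {:?} {:?}", a, b, c);
}
```

Every operation clones the underlying collection, which is exactly the kind of cost the paragraph above is shrugging at.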
While my personal implementation isn't designed to be a hammer for every nail, I can see this being a simple alternative to ISETL for use in the above textbooks (which is the whole reason I started this adventure). If this is in any usable state by the end of the year, I may try to tackle Advent of Code at the end of [CURRENT YEAR] in YSETL.
## Name
There's nothing special about the name **YSETL**, and I'm not breaking any new ground here. I just wanted something with **-SETL** in the name, and "Y" is funny because truly, I have to ask myself: _"y r u doin this??"_. The answer, unsurprisingly, is `¯\_( ͡° ͜ʖ ͡°)_/¯`
---
## Features
### Data Types
- [x] Booleans
- [x] Integers
- [x] Floats
- [x] Strings
- [x] Atoms
- [x] Tuples (Lists)
- [x] Sets
- [x] Maps
- [x] Functions
- [ ] Function Maps (specialized Maps)

### Operations
- [x] Arithmetic
- [x] Control flow
- [x] Global variables
- [x] Local variables
- [x] Boolean operations
- [x] Tuple operations
- [x] Set operations
- [x] Map operations
- [x] Iteration
- [ ] Destructuring
- [ ] Assignment shorthand (+=, *=, etc)

### Other
- [ ] REPL
- [ ] IO
- [ ] Separate Compilation and Execute steps (executing pre-compiled bytecode bundles)
- [ ] Line-Column info in error messages (compile-time)
- [ ] Line-Column info in error messages (run-time)
- [ ] Tail recursion

---
## Potential optimizations
Overall, this is pretty **heckin** slow compared to other dynamically typed interpreted languages like JS and Ruby, which I didn't expect with the design being so simple. However, after a light bit of profiling, I realized that there are several sections which could be slimmed down.
- Switching frames by grabbing copies of the Closure/Iterator bytecode could be slower than stitching all instructions into a single lazy-static `Vec` array and jumping inside it, though more testing is necessary to confirm that. Either way, I feel pretty confident that the single-array approach would be a requirement if I wanted to create precompiled blobs that could be passed to the VM as a separate step.
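
As a rough illustration of the single-array idea, the sketch below keeps every instruction in one contiguous slice and implements calls as jumps to an absolute offset, so entering a function pushes only a return address and copies no bytecode. The `Op` enum and `run` function are invented for this example and do not mirror YSETL's actual instruction set or VM.

```rust
/// Hypothetical instruction set, invented for this sketch.
#[derive(Debug, Clone, Copy)]
enum Op {
    Push(i64),
    Add,
    Call(usize), // jump to an absolute offset in the shared program
    Return,
    Halt,
}

/// Runs a program stored in one contiguous slice. A call pushes only a
/// return address, so switching frames never copies any bytecode.
fn run(program: &[Op]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    let mut return_addrs: Vec<usize> = Vec::new();
    let mut ip = 0;
    while ip < program.len() {
        match program[ip] {
            Op::Push(n) => stack.push(n),
            Op::Add => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a + b);
            }
            Op::Call(target) => {
                return_addrs.push(ip + 1);
                ip = target;
                continue;
            }
            Op::Return => {
                ip = return_addrs.pop()?;
                continue;
            }
            Op::Halt => break,
        }
        ip += 1;
    }
    stack.pop()
}

fn main() {
    let program = [
        Op::Push(1),  // 0
        Op::Call(4),  // 1: call the function body at offset 4
        Op::Add,      // 2
        Op::Halt,     // 3
        Op::Push(41), // 4: function body starts here
        Op::Return,   // 5
    ];
    println!("{:?}", run(&program)); // prints Some(42)
}
```

Because every call target is just an offset into one array, the same layout could be written to disk and loaded back, which is what makes it attractive for precompiled blobs. Whether it is actually faster than the per-frame copies is still something only profiling can answer.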
## Design
I want to start off by saying that I'm not some casual fool. I'm a _professional_ fool. That's all to say that this language was not written by someone who necessarily knows what they're doing. I did not go to college (not a real one, anyways). I've never taken a class on compilers. I would not consider a webdev bootcamp to be "formal training", and it definitely didn't teach me about parsers. This repository is a patchwork of best guesses and on-the-fly problem solving.
1. I start out by parsing the text using a parser generated by Pest from a PEG grammar. This creates a recursive structure of tagged structs called `Pair`s. `Pair` inners are a `Pairs` list, which is just a plain list of `Pair`s. Fancy that. It's basically an AST where every node is generic.
2. The parser in the file _called_ `parser.rs` calls the Pest parser and recursively evaluates the `Pair` structure, turning the generic AST into my own custom AST, which contextualizes everything, as well as catching certain incorrect patterns that the Pest parser allowed so that better error messaging can be given.
3. The `Compiler` struct is where my AST is recursively evaluated, generating bytecode. I could have built the compile step as the same pass as the 2nd parser step, but a custom AST makes it easier if I pursue a linter, formatter, or language server (if I'm feeling _EXTRA_ spicy).
4. Lastly, the `VM` struct accepts the output of the `Compiler` and evaluates it byte-by-byte.

The current parser uses a tool and pattern that I've only used in isolation. With little to go on besides the Pest documentation and examples (and there weren't many), I scraped together something workable, though I could probably do better by hand-rolling a recursive descent Pratt parser.
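
The compile and execute stages (steps 3 and 4 in the list above) can be sketched in a few lines of Rust. Everything here is a simplified assumption: the `Expr` AST, the `Op` instructions, and the method names are made up for illustration, the Pest parsing stages are skipped entirely, and instructions are modeled as an enum rather than the raw bytes the real VM walks through.

```rust
/// Hypothetical custom AST, standing in for the output of step 2.
enum Expr {
    Int(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Hypothetical instructions; the real VM reads raw bytes instead.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Const(i64),
    Add,
    Mul,
}

struct Compiler {
    code: Vec<Op>,
}

impl Compiler {
    fn new() -> Self {
        Compiler { code: Vec::new() }
    }

    /// Step 3: walk the AST recursively, emitting operands before operators.
    fn compile(&mut self, expr: &Expr) {
        match expr {
            Expr::Int(n) => self.code.push(Op::Const(*n)),
            Expr::Add(l, r) => {
                self.compile(l);
                self.compile(r);
                self.code.push(Op::Add);
            }
            Expr::Mul(l, r) => {
                self.compile(l);
                self.compile(r);
                self.code.push(Op::Mul);
            }
        }
    }
}

struct VM {
    stack: Vec<i64>,
}

impl VM {
    fn new() -> Self {
        VM { stack: Vec::new() }
    }

    /// Step 4: evaluate the compiler's output one instruction at a time.
    fn run(&mut self, code: &[Op]) -> Option<i64> {
        for op in code {
            match *op {
                Op::Const(n) => self.stack.push(n),
                Op::Add => {
                    let b = self.stack.pop()?;
                    let a = self.stack.pop()?;
                    self.stack.push(a + b);
                }
                Op::Mul => {
                    let b = self.stack.pop()?;
                    let a = self.stack.pop()?;
                    self.stack.push(a * b);
                }
            }
        }
        self.stack.pop()
    }
}

fn main() {
    // AST for 2 + 3 * 4
    let ast = Expr::Add(
        Box::new(Expr::Int(2)),
        Box::new(Expr::Mul(Box::new(Expr::Int(3)), Box::new(Expr::Int(4)))),
    );
    let mut compiler = Compiler::new();
    compiler.compile(&ast);
    let mut vm = VM::new();
    println!("{:?}", vm.run(&compiler.code)); // prints Some(14)
}
```

Keeping the AST as its own data structure, rather than emitting bytecode straight from the parse tree, is what leaves the door open for the linter, formatter, or language server mentioned in step 3.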
The design of the bytecode/compiler/VM started out based on the Monkey language from the book [**Writing a Compiler in Go** by Thorsten Ball](https://compilerbook.com/). However, right away there were some differences between YSETL and Monkey (as well as between Go and Rust) that affected the design. On top of that, I've implemented many things not found in Monkey, so when it came to choosing what bytecode to generate and how to execute it, I just got creative. For example, there are almost definitely better patterns out there for implementing iterators in bytecode, but it made sense for me to think about them like functions with a very specific structure for their instructions.