https://github.com/rmosolgo/lingo
parser generator
- Host: GitHub
- URL: https://github.com/rmosolgo/lingo
- Owner: rmosolgo
- License: mit
- Created: 2015-11-15T21:54:18.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2021-12-29T21:34:13.000Z (about 4 years ago)
- Last Synced: 2025-05-01T09:56:26.468Z (9 months ago)
- Topics: crystal, parser, parser-generator
- Language: Crystal
- Homepage:
- Size: 56.6 KB
- Stars: 28
- Watchers: 3
- Forks: 7
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Lingo
A parser generator for Crystal, inspired by [Parslet](https://github.com/kschiess/parslet).
Lingo provides text processing by:
- parsing the string into a tree of nodes
- providing a visitor that lets you build application objects from that tree
## Installation
Add this to your application's `shard.yml`:
```yaml
dependencies:
  lingo:
    github: rmosolgo/lingo
```
## Usage
Let's write a parser for highway names. The result will be a method for turning strings into useful objects:
```ruby
def parse_road(input_str)
  ast = RoadParser.new.parse(input_str)
  visitor = RoadVisitor.new
  visitor.visit(ast)
  visitor.road
end

road = parse_road("I-5N")
#
```
(See more examples in [`/examples`](https://github.com/rmosolgo/lingo/tree/master/examples).)
In the USA, we write highway names like this:
```
50 # Route 50
I-64 # Interstate 64
I-95N # Interstate 95, Northbound
29B # Business Route 29
```
### Parser
The general structure is `{interstate?}{number}{direction?}{business?}`. Let's express that with Lingo rules:
```ruby
class RoadParser < Lingo::Parser
  # Match a string:
  rule(:interstate) { str("I-") }
  rule(:business) { str("B") }

  # Match a regex:
  rule(:digit) { match(/\d/) }

  # Express repetition with `.repeat`
  rule(:number) { digit.repeat }

  rule(:north) { str("N") }
  rule(:south) { str("S") }
  rule(:east) { str("E") }
  rule(:west) { str("W") }

  # Compose rules by name
  # Express alternation with |
  rule(:direction) { north | south | east | west }

  # Express sequence with >>
  # Express optionality with `.maybe`
  # Name matched strings with `.named`
  rule(:road_name) {
    interstate.named(:interstate).maybe >>
      number.named(:number) >>
      direction.named(:direction).maybe >>
      business.named(:business).maybe
  }

  # You MUST name a starting rule:
  root(:road_name)
end
```
#### Applying the Parser
An instance of a `Lingo::Parser` subclass has a `.parse` method which returns a tree of `Lingo::Node`s.
```ruby
RoadParser.new.parse("250B") # =>
```
It uses the rule named by `root`.
#### Making Rules
These methods help you create rules:
- `str("string")` matches string exactly
- `match(/[abc]/)` matches the regex exactly
- `a | b` matches `a` _or_ `b`
- `a >> b` matches `a` _followed by_ `b`
- `a.maybe` matches `a` or nothing
- `a.repeat` matches _one-or-more_ `a`s
- `a.repeat(0)` matches _zero-or-more_ `a`s
- `a.absent` matches _not-`a`_
- `a.named(:a)` names the result `:a` for handling by a visitor
### Visitor
After parsing, you get a tree of `Lingo::Node`s. To turn that into an application object, write a visitor.
The visitor may define `enter` and `exit` hooks for nodes named with `.named` in the Parser. It may set up some state in `#initialize`; inside a hook, the visitor instance (and therefore that state) is available through the `visitor` variable.
```ruby
class RoadVisitor < Lingo::Visitor
  # Set up an accumulator
  getter :road

  def initialize
    @road = Road.new
  end

  # When you find a named node, you can do something with it.
  # You can access the current visitor as `visitor`
  enter(:interstate) {
    # since we found this node, this is an interstate route
    visitor.road.interstate = true
  }

  # You can access the named Lingo::Node as `node`.
  # Get the matched string with `.full_value`
  enter(:number) {
    visitor.road.number = node.full_value.to_i
  }

  enter(:direction) {
    visitor.road.direction = node.full_value
  }

  enter(:business) {
    # since we found this node, this is a business route
    visitor.road.business = true
  }
end
```
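The visitor above assigns to a `Road` object that this README never defines. Here is a minimal sketch of what such a class might look like, assuming it only needs the four attributes the hooks set. The class body, property types, and defaults are assumptions for illustration and are not part of Lingo:

```crystal
# Hypothetical value object that the visitor fills in; not provided by Lingo.
class Road
  # Flags default to false so that a missing optional node leaves them unset.
  property interstate : Bool = false
  property business : Bool = false

  # `node.full_value.to_i` yields an Int32.
  property number : Int32 = 0

  # The direction is optional in the grammar, so it may stay nil.
  property direction : String? = nil
end

# Setting attributes by hand, the same way the visitor hooks do:
road = Road.new
road.interstate = true
road.number = "5".to_i
road.direction = "N"
puts road.number
```

Giving every property a default means `Road.new` takes no arguments, which matches how the visitor constructs it in `#initialize`.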
#### Visitor Hooks
During the depth-first visitation of the resulting tree of `Lingo::Node`s, you can handle visits to nodes named with `.named`:
- `enter(:match)` is called when entering a node named `:match`
- `exit(:match)` is called when exiting a node named `:match`
Within the hooks, you can access two magic variables:
- `visitor` is the Visitor itself
- `node` is the matched `Lingo::Node` which exposes:
  - `#full_value`: the full matched string
  - `#line`, `#column`: position information for this match
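To make the hook ordering concrete, here is a small self-contained sketch of a depth-first walk that records when `enter` and `exit` would fire. It does not use Lingo: `ToyNode`, `walk`, and the node names `:outer`, `:inner_a`, and `:inner_b` are hypothetical stand-ins, and the sketch assumes the usual depth-first convention that a node's `exit` runs after all of its children have been visited:

```crystal
# Toy stand-in for a named parse-tree node; NOT Lingo::Node.
class ToyNode
  getter name : Symbol
  getter children : Array(ToyNode)

  def initialize(@name : Symbol, @children = [] of ToyNode)
  end
end

# Walk the tree depth-first, logging where each hook would be called.
def walk(node : ToyNode, log : Array(String)) : Array(String)
  log << "enter(:#{node.name})"
  node.children.each { |child| walk(child, log) }
  log << "exit(:#{node.name})"
  log
end

tree = ToyNode.new(:outer, [ToyNode.new(:inner_a), ToyNode.new(:inner_b)])
puts walk(tree, [] of String).join(", ")
# prints: enter(:outer), enter(:inner_a), exit(:inner_a), enter(:inner_b), exit(:inner_b), exit(:outer)
```

Under that assumption, an `enter` hook is a natural place to set up state for a node, and the matching `exit` hook is a natural place to finalize it once the nested nodes have been handled.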
## About this Project
### Goals
- Low barrier to entry: easy-to-learn API, short zero-to-working time
- Easy-to-read code, therefore easy-to-modify
- Useful errors (not accomplished)
### Non-goals
- Blazing-fast performance
- Theoretical correctness
### TODO
- [ ] Add some kind of debug output
### How slow is it?
Let's compare the built-in JSON parser to a Lingo JSON parser:
```
./lingo/benchmark $ crystal run --release slow_json.cr
Stdlib JSON 126.45k (± 1.55%) fastest
Lingo::JSON 660.18 (± 1.28%) 191.54× slower
```
Ouch, that's __a lot slower__.
But the relative slowdown is in line with what Ruby's `parslet`, the inspiration for this project, shows against Ruby's built-in JSON parser:
```
$ ruby parslet_json_benchmark.rb
Calculating -------------------------------------
Parslet JSON 4.000 i/100ms
Built-in JSON 3.657k i/100ms
-------------------------------------------------
Parslet JSON 45.788 (± 4.4%) i/s - 232.000
Built-in JSON 38.285k (± 5.3%) i/s - 193.821k
Comparison:
Built-in JSON: 38285.2 i/s
Parslet JSON : 45.8 i/s - 836.13x slower
```
Both Parslet and Lingo are slower than handwritten parsers, but parsers built with them are easier to write!
## Development
- Run the __tests__ with `crystal spec`
- Install Ruby & `guard`, then start a __watcher__ with `guard`