- Host: GitHub
- URL: https://github.com/rmosolgo/lingo
- Owner: rmosolgo
- License: mit
- Created: 2015-11-15T21:54:18.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2021-12-29T21:34:13.000Z (about 3 years ago)
- Last Synced: 2024-12-24T20:39:21.992Z (17 days ago)
- Topics: crystal, parser, parser-generator
- Language: Crystal
- Homepage:
- Size: 56.6 KB
- Stars: 28
- Watchers: 4
- Forks: 7
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Lingo [![Build Status](https://travis-ci.org/rmosolgo/lingo.svg)](https://travis-ci.org/rmosolgo/lingo)
A parser generator for Crystal, inspired by [Parslet](https://github.com/kschiess/parslet).
Lingo provides text processing by:
- parsing the string into a tree of nodes
- providing a visitor to allow you to work from the tree

## Installation
Add this to your application's `shard.yml`:
```yaml
dependencies:
  lingo:
    github: rmosolgo/lingo
```

## Usage
Let's write a parser for highway names. The result will be a method for turning strings into useful objects:
```ruby
def parse_road(input_str)
  ast = RoadParser.new.parse(input_str)
  visitor = RoadVisitor.new
  visitor.visit(ast)
  visitor.road
end

road = parse_road("I-5N")
#
```

(See more examples in [`/examples`](https://github.com/rmosolgo/lingo/tree/master/examples).)
In the USA, we write highway names like this:
```
50 # Route 50
I-64 # Interstate 64
I-95N # Interstate 95, Northbound
29B # Business Route 29
```

### Parser
The general structure is `{interstate?}{number}{direction?}{business?}`. Let's express that with Lingo rules:
```ruby
class RoadParser < Lingo::Parser
  # Match a string:
  rule(:interstate) { str("I-") }
  rule(:business) { str("B") }

  # Match a regex:
  rule(:digit) { match(/\d/) }
  # Express repetition with `.repeat`
  rule(:number) { digit.repeat }

  rule(:north) { str("N") }
  rule(:south) { str("S") }
  rule(:east) { str("E") }
  rule(:west) { str("W") }
  # Compose rules by name
  # Express alternation with |
  rule(:direction) { north | south | east | west }

  # Express sequence with >>
  # Express optionality with `.maybe`
  # Name matched strings with `.named`
  rule(:road_name) {
    interstate.named(:interstate).maybe >>
      number.named(:number) >>
      direction.named(:direction).maybe >>
      business.named(:business).maybe
  }
  # You MUST name a starting rule:
  root(:road_name)
end
```

#### Applying the Parser
An instance of a `Lingo::Parser` subclass has a `#parse` method, which returns a tree of `Lingo::Node`s.
```ruby
RoadParser.new.parse("250B") # =>
```

It uses the rule named by `root`.
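
To see which inputs the `road_name` rule is meant to accept, the same `{interstate?}{number}{direction?}{business?}` structure can be approximated with a single regular expression. This is a plain-Ruby sketch for comparison only: it does not use Lingo, and `ROAD_PATTERN` and `road_parts` are hypothetical names introduced here.

```ruby
# Plain-Ruby approximation of the road_name rule (illustration only, not Lingo).
# \d+ mirrors `digit.repeat`, which matches one or more digits.
ROAD_PATTERN = /\A(?<interstate>I-)?(?<number>\d+)(?<direction>[NSEW])?(?<business>B)?\z/

# Returns a hash of the named parts, or nil when the input does not match.
def road_parts(input)
  match = ROAD_PATTERN.match(input)
  return nil unless match

  {
    interstate: !match[:interstate].nil?,
    number: match[:number].to_i,
    direction: match[:direction],
    business: !match[:business].nil?,
  }
end

p road_parts("I-95N")
p road_parts("250B")
p road_parts("Route 50")
```

A regex is enough for a format this small. The parser-plus-visitor split is aimed at grammars that outgrow a single pattern.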
#### Making Rules
These methods help you create rules:
- `str("string")` matches the given string exactly
- `match(/[abc]/)` matches the regex exactly
- `a | b` matches `a` _or_ `b`
- `a >> b` matches `a` _followed by_ `b`
- `a.maybe` matches `a` or nothing
- `a.repeat` matches _one-or-more_ `a`s
- `a.repeat(0)` matches _zero-or-more_ `a`s
- `a.absent` matches _not-`a`_
- `a.named(:a)` names the result `:a` for handling by a visitor

### Visitor
After parsing, you get a tree of `Lingo::Node`s. To turn that into an application object, write a visitor.
The visitor may define `enter` and `exit` hooks for nodes named with `.named` in the Parser. It can set up state in `#initialize`; inside a hook, the visitor instance (and that state) is available through the `visitor` variable.
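
The visitor below accumulates its result in a `Road` object, which this README never defines. A minimal plain-Ruby sketch of such a class follows. The attribute names are taken from the hooks in this section; the defaults and the `to_s` format are assumptions. In Crystal, the accessors would be declared with `property` and type annotations rather than `attr_accessor`.

```ruby
# Hypothetical value object for the visitor to fill in (not part of Lingo).
class Road
  attr_accessor :interstate, :number, :direction, :business

  def initialize
    @interstate = false
    @number = nil
    @direction = nil
    @business = false
  end

  # Human-readable name, handy when debugging a visitor.
  def to_s
    name = interstate ? "Interstate #{number}" : "Route #{number}"
    name = "Business #{name}" if business
    name += ", #{direction}" if direction
    name
  end
end

road = Road.new
road.interstate = true
road.number = 5
road.direction = "N"
puts road
```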
```ruby
class RoadVisitor < Lingo::Visitor
  # Set up an accumulator
  getter :road
  def initialize
    @road = Road.new
  end

  # When you find a named node, you can do something with it.
  # You can access the current visitor as `visitor`
  enter(:interstate) {
    # since we found this node, this is an interstate
    visitor.road.interstate = true
  }

  # You can access the named Lingo::Node as `node`.
  # Get the matched string with `.full_value`
  enter(:number) {
    visitor.road.number = node.full_value.to_i
  }

  enter(:direction) {
    visitor.road.direction = node.full_value
  }

  enter(:business) {
    visitor.road.business = true
  }
end
```

#### Visitor Hooks
During the depth-first visitation of the resulting tree of `Lingo::Node`s, you can handle visits to nodes named with `.named`:
- `enter(:match)` is called when entering a node named `:match`
- `exit(:match)` is called when exiting a node named `:match`

Within the hooks, you can access two magic variables:
- `visitor` is the Visitor itself
- `node` is the matched `Lingo::Node` which exposes:
  - `#full_value`: the full matched string
  - `#line`, `#column`: position information for this match

## About this Project
### Goals
- Low barrier to entry: easy-to-learn API, short zero-to-working time
- Easy-to-read code, therefore easy-to-modify
- Useful errors (not accomplished)

### Non-goals
- Blazing-fast performance
- Theoretical correctness

### TODO
- [ ] Add some kind of debug output
### How slow is it?
Let's compare the built-in JSON parser to a Lingo JSON parser:
```
./lingo/benchmark $ crystal run --release slow_json.cr
Stdlib JSON 126.45k (± 1.55%) fastest
Lingo::JSON 660.18 (± 1.28%) 191.54× slower
```

Ouch, that's __a lot slower__.
But, it's on par with Ruby and `parslet`, the inspiration for this project:
```
$ ruby parslet_json_benchmark.rb
Calculating -------------------------------------
Parslet JSON 4.000 i/100ms
Built-in JSON 3.657k i/100ms
-------------------------------------------------
Parslet JSON 45.788 (± 4.4%) i/s - 232.000
Built-in JSON 38.285k (± 5.3%) i/s - 193.821k

Comparison:
Built-in JSON: 38285.2 i/s
Parslet JSON : 45.8 i/s - 836.13x slower
```

Both Parslet and Lingo are slower than handwritten parsers. But, they're easier to write!
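
The slowdown factors in both outputs are the ratio of iterations per second. The plain-Ruby sketch below recomputes them from the rounded figures printed above, so the last digit can differ slightly from the benchmark's own output. The two benchmarks ran in different languages, so the cross-language ratio is only a rough indication; `slowdown` is a hypothetical helper.

```ruby
# Iterations per second, copied from the benchmark output above.
stdlib_crystal = 126_450.0 # Stdlib JSON (Crystal)
lingo          = 660.18    # Lingo::JSON
builtin_ruby   = 38_285.2  # Built-in JSON (Ruby)
parslet        = 45.788    # Parslet JSON

# How many times faster `fast` is than `slow`, rounded to two decimals.
def slowdown(fast, slow)
  (fast / slow).round(2)
end

puts "Lingo vs. Crystal stdlib:  #{slowdown(stdlib_crystal, lingo)}x slower"
puts "Parslet vs. Ruby built-in: #{slowdown(builtin_ruby, parslet)}x slower"
puts "Lingo vs. Parslet (rough): #{slowdown(lingo, parslet)}x the throughput"
```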
## Development
- Run the __tests__ with `crystal spec`
- Install Ruby & `guard`, then start a __watcher__ with `guard`