https://github.com/blckngm/lexi-matic
- Host: GitHub
- URL: https://github.com/blckngm/lexi-matic
- Owner: blckngm
- Created: 2024-01-25T01:24:24.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-01-29T12:10:03.000Z (10 months ago)
- Last Synced: 2024-10-06T12:45:56.096Z (about 1 month ago)
- Language: Rust
- Size: 14.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Lexi-Matic
A proc-macro for lexers similar to logos. Uses regex-automata DFA under the
hood.

```rust
# use lexi_matic::Lexer;
#[derive(Debug, Lexer, PartialEq, Eq)]
#[lexer(skip = "//[^\n]*\n", skip = r"[ \t\r\n\f]+")]
enum Token<'a> {
    #[token("import")]
    Import,
    #[token(";")]
    Semi,
    #[regex("[a-zA-Z_][a-zA-Z0-9_]*")]
    Ident(&'a str),
}

// An iterator of Result<(usize, Token, usize), lexi_matic::Error>.
let tokens = Token::lex("import foo_bar;import import1;// ...\nimport buz;");
for t in tokens {
    let (start, t, end) = t.unwrap();
    println!("{start}..{end} {t:?}");
}
```

## Token Disambiguation
There are only two simple rules:
* Longer matches always win.
* If multiple patterns tie for the longest match, the *first* pattern wins.

So if you have keywords and identifiers, specify the keywords *first*:

```rust
# use lexi_matic::Lexer;
#[derive(Lexer)]
enum Token<'a> {
    #[token("import")]
    Import,
    #[regex("[a-zA-Z_][a-zA-Z0-9_]*")]
    Ident(&'a str),
}
```

So `import` would be `Import` but `import1` would be `Ident`.
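
The two rules can be sketched in plain Rust without the macro. This only illustrates the tie-breaking logic and is not how lexi-matic is implemented (it uses a regex-automata DFA); `keyword_len`, `ident_len`, and `pick_winner` are hypothetical helper names introduced for this example.

```rust
// Hypothetical helpers for illustration only; none of these are lexi-matic APIs.

/// Length of the keyword `kw` at the start of `input`, if it is present.
fn keyword_len(input: &str, kw: &str) -> Option<usize> {
    input.starts_with(kw).then(|| kw.len())
}

/// Length of the longest `[a-zA-Z_][a-zA-Z0-9_]*` prefix of `input`, if any.
fn ident_len(input: &str) -> Option<usize> {
    let mut chars = input.chars();
    let first = chars.next()?;
    if !(first.is_ascii_alphabetic() || first == '_') {
        return None;
    }
    // Every accepted character is ASCII, so the char count equals the byte count.
    let rest = chars
        .take_while(|c| c.is_ascii_alphanumeric() || *c == '_')
        .count();
    Some(1 + rest)
}

/// Given the match length of each pattern in declaration order (`None` if the
/// pattern did not match), return `(pattern index, match length)` of the winner.
fn pick_winner(match_lens: &[Option<usize>]) -> Option<(usize, usize)> {
    let mut best: Option<(usize, usize)> = None;
    for (index, len) in match_lens.iter().enumerate() {
        if let Some(len) = *len {
            // Rule 1: a longer match replaces the current best.
            // Rule 2: the comparison is strict, so on a tie the earlier pattern stays.
            if best.map_or(true, |(_, best_len)| len > best_len) {
                best = Some((index, len));
            }
        }
    }
    best
}
```

With the patterns ordered as keyword then identifier, the input `import;` ties at length 6, so index 0 (the keyword) wins. The input `import1;` matches with lengths 6 and 7, so index 1 (the identifier) wins.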
## Custom Lexing
Sometimes the lexing grammar isn't regular or even
[context-free](https://github.com/rust-lang/rust/blob/HEAD@%7B2019-05-26T21:45:17Z%7D/src/grammar/raw-string-literal-ambiguity.md). You can use a callback for these:

```rust
# use lexi_matic::Lexer;
#[derive(Debug, Lexer)]
enum Token<'a> {
    #[token(";")]
    Semi,
    #[regex(r##"r#*""##)]
    #[lexer(more = end_raw_str)]
    RawStr(&'a str),
}

// A `more` function should return how many more bytes to include in this token.
// We are trying to finish a raw string literal, so we search for the matching
// `"###`.
//
// If a `more` function returns `None`, it is considered a lexical error.
fn end_raw_str(matched: &str, remaining: &str) -> Option<usize> {
    // Drop the leading `r` and reverse the rest: `##"` becomes the closing `"##`.
    let start: String = matched[1..].chars().rev().collect();
    remaining.find(&start).map(|l| l + start.len())
}
```
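
To see what the callback returns, it can be exercised on its own, outside the derive macro. The sketch below repeats `end_raw_str` from the example above, assuming its return type is `Option<usize>`, and adds a hypothetical helper `full_token` that mimics, under that assumption, how the extra byte count would extend the initial regex match.

```rust
// `end_raw_str` as in the README example, assuming it returns `Option<usize>`.
fn end_raw_str(matched: &str, remaining: &str) -> Option<usize> {
    let start: String = matched[1..].chars().rev().collect();
    remaining.find(&start).map(|l| l + start.len())
}

// Hypothetical helper, not a lexi-matic API: split `input` at the end of the
// initial regex match, ask the callback how many more bytes belong to the
// token, and return the complete token text.
fn full_token(input: &str, matched_len: usize) -> Option<&str> {
    let (matched, remaining) = input.split_at(matched_len);
    let more = end_raw_str(matched, remaining)?;
    Some(&input[..matched_len + more])
}
```

For the input `r##"a "# b"## ;`, the regex matches the opener `r##"` (4 bytes). The callback skips the shorter `"#` inside the literal, finds the closing `"##` at offset 6 of the remaining text, and returns `Some(9)`, so the whole token is `r##"a "# b"##`. If no matching closer exists, it returns `None`.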