https://github.com/elucidation/scannergenerator

Full monty of Lexical Parsing with miniRE parsing, AST table generation and walking, etc.

Master branch
Project Phase 2
---
ScannerGenerator now incorporates a MiniRe parser and AST generator, and theoretically an AST walker.

Usage
---
Run Main.java or the ScannerGenerator.jar package like so:

ScannerGenerator []

If no output file is provided, `_Output.txt` is appended to the input file name and the result is used as the output filename.
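
The default output name can be sketched as below. This is a minimal illustration, not the project's code: `OutputNameSketch` and `resolveOutputPath` are hypothetical names, and it assumes the suffix is appended to the full input file name, extension included.

```java
final class OutputNameSketch {
    /**
     * Returns the explicit output path when one is given; otherwise
     * appends "_Output.txt" to the input path (assumed behaviour).
     */
    static String resolveOutputPath(String inputPath, String outputPathOrNull) {
        if (outputPathOrNull != null && !outputPathOrNull.isEmpty()) {
            return outputPathOrNull;
        }
        return inputPath + "_Output.txt";
    }

    public static void main(String[] args) {
        // No output file supplied, so the name is derived from the input file.
        System.out.println(resolveOutputPath("input.txt", null));
    }
}
```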

File Structure
---

* Main.java - Main driver for system
* DFATable.java - The specialized HashMap that acts as our DFA Table
* ScannerGenerator.java - Provides a static function that generates a DFA Table from a specification file
* TableWalker.java - Emits tokens by walking the DFA Table over a stream of input characters

Helpers:

* State.java
* StateCharacter.java
* Token.java
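
To make the division of labour between the DFA table and the table walker concrete, here is a minimal sketch. It is not the project's `DFATable` or `TableWalker`: the class `DfaWalkerSketch`, its methods, the token names `$INT` and `$PLUS`, and the convention that state 0 is the start state are all assumptions made for illustration. It stores transitions in nested HashMaps and emits the longest match at each position.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DfaWalkerSketch {
    // transitions.get(state).get(ch) is the next state; a missing entry means no transition.
    private final Map<Integer, Map<Character, Integer>> transitions = new HashMap<>();
    // Accepting states mapped to the token name they produce.
    private final Map<Integer, String> accepting = new HashMap<>();

    void addTransition(int from, char ch, int to) {
        transitions.computeIfAbsent(from, k -> new HashMap<>()).put(ch, to);
    }

    void addAccepting(int state, String tokenName) {
        accepting.put(state, tokenName);
    }

    /** Splits input into "NAME lexeme" strings using longest match; whitespace separates tokens. */
    List<String> walk(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++;
                continue;
            }
            int state = 0;              // assumed start state
            int lastAcceptEnd = -1;     // end index of the longest accepted prefix so far
            String lastAcceptName = null;
            for (int i = pos; i < input.length(); i++) {
                Map<Character, Integer> row = transitions.get(state);
                Integer next = (row == null) ? null : row.get(input.charAt(i));
                if (next == null) {
                    break;              // dead end: fall back to the last accepting state
                }
                state = next;
                if (accepting.containsKey(state)) {
                    lastAcceptEnd = i + 1;
                    lastAcceptName = accepting.get(state);
                }
            }
            if (lastAcceptName == null) {
                throw new IllegalArgumentException("No token matches at position " + pos);
            }
            tokens.add(lastAcceptName + " " + input.substring(pos, lastAcceptEnd));
            pos = lastAcceptEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        DfaWalkerSketch dfa = new DfaWalkerSketch();
        for (char c = '0'; c <= '9'; c++) {
            dfa.addTransition(0, c, 1);
            dfa.addTransition(1, c, 1);
        }
        dfa.addTransition(0, '+', 2);
        dfa.addAccepting(1, "$INT");
        dfa.addAccepting(2, "$PLUS");
        System.out.println(dfa.walk("12 + 345"));
    }
}
```

Remembering the last accepting state lets the walker back off to the longest valid token when a later character has no transition.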

Character Class Parsing
---
The input spec file contains:

$DIGIT [0-9]
$NON-ZERO [^0] IN $DIGIT
$CHAR [a-zA-Z]
$UPPER [^a-z] IN $CHAR
$LOWER [^A-Z] IN $CHAR

This results in a `HashMap<String, HashSet<Character>>` as shown below:

$UPPER(26) : [D, E, F, G, A, B, C, L, M, N, O, H, I, J, K, U, T, W, V, Q, P, S, R, Y, X, Z]
$DIGIT(10) : [3, 2, 1, 0, 7, 6, 5, 4, 9, 8]
$CHAR(52) : [D, E, F, G, A, B, C, L, M, N, O, H, I, J, K, U, T, W, V, Q, P, S, R, Y, X, Z, f, g, d, e, b, c, a, n, o, l, m, j, k, h, i, w, v, u, t, s, r, q, p, z, y, x]
$LOWER(26) : [f, g, d, e, b, c, a, n, o, l, m, j, k, h, i, w, v, u, t, s, r, q, p, z, y, x]
$NON-ZERO(9) : [3, 2, 1, 7, 6, 5, 4, 9, 8]

Because the data is stored in a HashSet, the characters have no particular order. In exchange we get O(1) insertion and lookup, which is all we ever need when checking characters against the DFA Table.
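
The class definitions above can be evaluated roughly as follows. This is a sketch under assumptions rather than the project's parser: `CharClassSketch`, `expand` and `define` are hypothetical names, each line is assumed to have the form `$NAME [body]` or `$NAME [^body] IN $BASE`, and a base class is assumed to be defined before it is referenced.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class CharClassSketch {
    /** Expands the inside of a bracket expression such as "a-zA-Z" into its member characters. */
    static Set<Character> expand(String body) {
        Set<Character> out = new HashSet<>();
        for (int i = 0; i < body.length(); i++) {
            char lo = body.charAt(i);
            if (i + 2 < body.length() && body.charAt(i + 1) == '-') {
                char hi = body.charAt(i + 2);
                for (int c = lo; c <= hi; c++) {
                    out.add((char) c);      // every character in the range lo..hi
                }
                i += 2;                     // skip the '-' and the upper bound
            } else {
                out.add(lo);                // a single literal character
            }
        }
        return out;
    }

    /** Parses one spec line and stores the resulting set under its class name. */
    static void define(String line, Map<String, Set<Character>> classes) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Malformed line: " + line);
        }
        String bracket = parts[1];          // e.g. "[0-9]" or "[^0]"
        String body = bracket.substring(1, bracket.length() - 1);
        Set<Character> result;
        if (body.startsWith("^")) {
            // "[^x] IN $BASE": copy the base class, then remove the excluded characters.
            if (parts.length < 4 || !parts[2].equals("IN")) {
                throw new IllegalArgumentException("Exclusion needs 'IN $BASE': " + line);
            }
            Set<Character> base = classes.get(parts[3]);
            if (base == null) {
                throw new IllegalArgumentException("Unknown class: " + parts[3]);
            }
            result = new HashSet<>(base);
            result.removeAll(expand(body.substring(1)));
        } else {
            result = expand(body);
        }
        classes.put(parts[0], result);
    }

    public static void main(String[] args) {
        Map<String, Set<Character>> classes = new HashMap<>();
        String[] spec = {
            "$DIGIT [0-9]", "$NON-ZERO [^0] IN $DIGIT", "$CHAR [a-zA-Z]",
            "$UPPER [^a-z] IN $CHAR", "$LOWER [^A-Z] IN $CHAR"
        };
        for (String line : spec) {
            define(line, classes);
        }
        for (Map.Entry<String, Set<Character>> e : classes.entrySet()) {
            System.out.println(e.getKey() + "(" + e.getValue().size() + ") : " + e.getValue());
        }
    }
}
```

The set sizes come out as 10, 9, 52, 26 and 26, matching the listing above; the print order of both the classes and their members depends on hashing.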

Identifier Parsing
---

When processing ```Parsing Identifier: $FLOAT ($DIGIT)+ \. ($DIGIT)+```, the value ```($DIGIT)+\.($DIGIT)+``` is tokenized as follows:

L_PAREN
CHARCLASS
R_PAREN
ONE_OR_MORE
SPECIAL_CHAR
L_PAREN
CHARCLASS
R_PAREN
ONE_OR_MORE
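
A tokenizer producing this sequence might look like the sketch below. Only the token names listed above come from the project; `RegexTokenizerSketch`, the extra kinds `ZERO_OR_MORE`, `UNION` and `CHAR`, and the rule that a class name is `$` followed by letters, digits or hyphens are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class RegexTokenizerSketch {
    // ZERO_OR_MORE, UNION and CHAR are assumed names; the others appear in the listing above.
    enum Kind { L_PAREN, R_PAREN, CHARCLASS, ONE_OR_MORE, ZERO_OR_MORE, UNION, SPECIAL_CHAR, CHAR }

    static List<Kind> tokenize(String regex) {
        List<Kind> out = new ArrayList<>();
        int i = 0;
        while (i < regex.length()) {
            char c = regex.charAt(i);
            if (c == ' ' || c == '\t') {    // spaces and tabs are ignored
                i++;
                continue;
            }
            switch (c) {
                case '(': out.add(Kind.L_PAREN); i++; break;
                case ')': out.add(Kind.R_PAREN); i++; break;
                case '+': out.add(Kind.ONE_OR_MORE); i++; break;
                case '*': out.add(Kind.ZERO_OR_MORE); i++; break;
                case '|': out.add(Kind.UNION); i++; break;
                case '\\':
                    // A backslash escapes exactly one following character.
                    if (i + 1 >= regex.length()) {
                        throw new IllegalArgumentException("Dangling backslash");
                    }
                    out.add(Kind.SPECIAL_CHAR);
                    i += 2;
                    break;
                case '$': {
                    // Consume the class name: letters, digits and hyphens (assumed rule).
                    int j = i + 1;
                    while (j < regex.length()
                            && (Character.isLetterOrDigit(regex.charAt(j)) || regex.charAt(j) == '-')) {
                        j++;
                    }
                    out.add(Kind.CHARCLASS);
                    i = j;
                    break;
                }
                default:
                    out.add(Kind.CHAR);     // any other character is a literal
                    i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("($DIGIT)+ \\. ($DIGIT)+"));
    }
}
```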

The logic is as follows (spaces/tabs ignored, stands for nothing):

= '|' | |
=
= '*' | '+' |
= | '\' | '(' ')'

Example runs:

Trying to Recursively Parse '$LOWER($LOWER|$DIGIT)*'...
EXPR
TERM
FACTOR
BASE
MATCH: $LOWER
TERM
FACTOR
BASE
MATCH: (
EXPR
TERM
FACTOR
BASE
MATCH: $LOWER
MATCH: |
EXPR
TERM
FACTOR
BASE
MATCH: $DIGIT
MATCH: )
ZERO OR MORE
MATCH: *
Finished Recursive Parse.

Trying to Recursively Parse '($DIGIT)+'...
EXPR
TERM
FACTOR
BASE
MATCH: (
EXPR
TERM
FACTOR
BASE
MATCH: $DIGIT
MATCH: )
ONE OR MORE
MATCH: +
Finished Recursive Parse.

Trying to Recursively Parse '($DIGIT)+\.($DIGIT)+'...
EXPR
TERM
FACTOR
BASE
MATCH: (
EXPR
TERM
FACTOR
BASE
MATCH: $DIGIT
MATCH: )
ONE OR MORE
MATCH: +
TERM
FACTOR
BASE
MATCH: \.
Finished Recursive Parse.

Trying to Recursively Parse '='...
EXPR
TERM
FACTOR
BASE
MATCH: =
Finished Recursive Parse.

Trying to Recursively Parse '\+'...
EXPR
TERM
FACTOR
BASE
MATCH: \+
Finished Recursive Parse.

Trying to Recursively Parse '-'...
EXPR
TERM
FACTOR
BASE
MATCH: -
Finished Recursive Parse.

Trying to Recursively Parse '\*'...
EXPR
TERM
FACTOR
BASE
MATCH: \*
Finished Recursive Parse.

Trying to Recursively Parse 'PRINT'...
EXPR
TERM
FACTOR
BASE
MATCH: P
EXPR
TERM
FACTOR
BASE
MATCH: R
EXPR
TERM
FACTOR
BASE
MATCH: I
EXPR
TERM
FACTOR
BASE
MATCH: N
EXPR
TERM
FACTOR
BASE
MATCH: T
Finished Recursive Parse.
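
For readers who want to experiment with traces like these, the following is a minimal recursive-descent sketch. It is not the project's parser. The class `MiniReParserSketch` is hypothetical, and the grammar it follows is an assumption inferred from the EXPR/TERM/FACTOR/BASE labels: `expr = term '|' expr | term`, `term = factor term | factor`, `factor = base '*' | base '+' | base`, `base = char | '\' char | charclass | '(' expr ')'`. Under this grammar, concatenation recurses through TERM, so the output for inputs such as `PRINT` will not match the log above line for line.

```java
import java.util.ArrayList;
import java.util.List;

final class MiniReParserSketch {
    private final String src;
    private int pos = 0;
    private final List<String> trace = new ArrayList<>();

    private MiniReParserSketch(String src) {
        this.src = src.replaceAll("[ \\t]", "");   // spaces and tabs are ignored
    }

    /** Parses the pattern and returns the trace lines; throws on a syntax error. */
    static List<String> parse(String regex) {
        MiniReParserSketch p = new MiniReParserSketch(regex);
        p.expr();
        if (!p.atEnd()) {
            throw new IllegalArgumentException("Unexpected '" + p.peek() + "' at " + p.pos);
        }
        return p.trace;
    }

    private boolean atEnd() { return pos >= src.length(); }

    private char peek() { return atEnd() ? '\0' : src.charAt(pos); }

    private void match(String lexeme) {
        trace.add("MATCH: " + lexeme);
        pos += lexeme.length();
    }

    private void expr() {                           // expr = term '|' expr | term
        trace.add("EXPR");
        term();
        if (peek() == '|') {
            match("|");
            expr();
        }
    }

    private void term() {                           // term = factor term | factor
        trace.add("TERM");
        factor();
        if (!atEnd() && peek() != '|' && peek() != ')') {
            term();
        }
    }

    private void factor() {                         // factor = base '*' | base '+' | base
        trace.add("FACTOR");
        base();
        if (peek() == '*') {
            trace.add("ZERO OR MORE");
            match("*");
        } else if (peek() == '+') {
            trace.add("ONE OR MORE");
            match("+");
        }
    }

    private void base() {                           // base = char | '\' char | charclass | '(' expr ')'
        trace.add("BASE");
        if (atEnd()) {
            throw new IllegalArgumentException("Unexpected end of pattern");
        }
        char c = peek();
        if (c == '(') {
            match("(");
            expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("Expected ')' at " + pos);
            }
            match(")");
        } else if (c == '\\') {
            if (pos + 1 >= src.length()) {
                throw new IllegalArgumentException("Dangling backslash");
            }
            match(src.substring(pos, pos + 2));     // backslash plus the escaped character
        } else if (c == '$') {
            int end = pos + 1;                      // class name: letters, digits, hyphens (assumed)
            while (end < src.length()
                    && (Character.isLetterOrDigit(src.charAt(end)) || src.charAt(end) == '-')) {
                end++;
            }
            match(src.substring(pos, end));
        } else if (c == '|' || c == ')' || c == '*' || c == '+') {
            throw new IllegalArgumentException("Unexpected '" + c + "' at " + pos);
        } else {
            match(String.valueOf(c));               // plain literal character
        }
    }

    public static void main(String[] args) {
        for (String line : parse("$LOWER($LOWER|$DIGIT)*")) {
            System.out.println(line);
        }
    }
}
```

Each grammar rule maps to one method, and the trace is simply the order in which those methods are entered and tokens are consumed.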