https://github.com/lucasb-eyer/flex-bison-indentation

flex-bison-indentation
======================

An example of how to correctly parse python-like indentation-scoped files using flex (and bison).

Besides that, this project also serves as a template CMake-based project for a flex&bison parser
and includes rules to track the current line and column of the scanner.

Quick overview
==============

All the magic happens in the scanner, which emits `TOK_INDENT` and `TOK_OUTDENT` tokens whenever
the level of indentation increases or decreases. The parser in this project just echoes the tokens.

The scanner has two modes (flex start conditions). It starts in the regular
mode; that's where you put your regular rules. Whenever a newline is
encountered in that mode, the scanner enters the indentation mode, in which it
keeps counting spaces and tabs (and ignoring blank lines) until it sees
anything else. At that point it outputs either one `TOK_INDENT`, one or more
`TOK_OUTDENT` as necessary, or neither, and goes back to the regular mode.
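
The decision made at the end of the indentation-counting step can be sketched
in plain C. This is only a minimal model, not the code from this project: the
names `indent_stack`, `indent_stack_init`, `indent_tokens` and `MAX_DEPTH` are
made up for illustration, and it assumes the scanner keeps a stack of the
indentation widths that are currently open.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 64   /* hypothetical limit on nesting depth */

enum tok { TOK_INDENT, TOK_OUTDENT };

/* Stack of currently open indentation widths; the bottom entry is always 0. */
struct indent_stack {
    int widths[MAX_DEPTH];
    int depth;             /* number of entries in use */
};

void indent_stack_init(struct indent_stack *s)
{
    s->widths[0] = 0;
    s->depth = 1;
}

/*
 * Call this when the first non-blank character of a line is seen, passing the
 * width of the line's leading whitespace. The tokens to emit are written to
 * out[] and their count is returned:
 *   deeper than the top of the stack -> one TOK_INDENT
 *   shallower                        -> one TOK_OUTDENT per closed level
 *   equal                            -> nothing
 * A width that falls between two open levels is silently accepted here; a
 * real scanner would probably want to report it as an error.
 */
size_t indent_tokens(struct indent_stack *s, int width,
                     enum tok *out, size_t out_cap)
{
    size_t n = 0;

    if (width > s->widths[s->depth - 1]) {
        assert(s->depth < MAX_DEPTH && n < out_cap);
        s->widths[s->depth++] = width;
        out[n++] = TOK_INDENT;
    } else {
        while (s->depth > 1 && width < s->widths[s->depth - 1]) {
            assert(n < out_cap);
            s->depth--;
            out[n++] = TOK_OUTDENT;
        }
    }
    return n;
}
```

Feeding it the widths 0, 4, 8, 8, 0 in that order yields no token, one
`TOK_INDENT`, one `TOK_INDENT`, no token, and finally two `TOK_OUTDENT`.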

The scanner also does its best to keep track of the column where the current
match starts, which can be accessed (and changed) through `yycolumn`. The line
number is kept track of by flex internally.
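
To make the column bookkeeping concrete, here is a rough C model of what has
to happen on every match. It is a sketch under assumptions, not the scanner's
actual code: `struct location` and `track_match` are hypothetical names, the
four fields simply mirror bison's default `YYLTYPE`, and the match is assumed
to stay on a single line.

```c
#include <assert.h>

/* Source range of one token; same four fields as bison's default YYLTYPE. */
struct location {
    int first_line, first_column;
    int last_line, last_column;
};

/*
 * Record the range of a match that starts at *column on `line` and is
 * `length` characters long, then advance *column past the match so the next
 * token starts right after it.
 */
struct location track_match(int *column, int line, int length)
{
    struct location loc;

    assert(length > 0);
    loc.first_line = loc.last_line = line;
    loc.first_column = *column;
    loc.last_column = *column + length - 1;
    *column += length;
    return loc;
}
```

With the column counter at 1, a five-character match covers columns 1 to 5 and
leaves the counter at 6; a following two-character match covers columns 6 to 7.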

All of this means that you can write the parser as usual, use the `TOK_INDENT`
and `TOK_OUTDENT` tokens to handle indentation, and access the line of a token
through `@1.first_line` (and `@1.last_line` if the token spans multiple lines,
which I don't recommend) and its column range through `@1.first_column` and
`@1.last_column`.

One caveat is that if one of your rules includes a newline character and
matches text longer than one character, you will need to reset `yycolumn` by hand.
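
One possible way to do that reset is to walk over the matched text and restart
the count after every newline. The helper below is a hypothetical sketch
(`column_after_match` is not part of this project), and it assumes columns are
1-based; adjust the reset value if yours start at 0.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the column at which the next token starts, given a match `text` of
 * `length` characters that began at `column`. Every newline inside the match
 * resets the column to 1 (assumed 1-based); any other character advances it.
 */
int column_after_match(const char *text, size_t length, int column)
{
    size_t i;

    assert(text != NULL);
    for (i = 0; i < length; i++) {
        if (text[i] == '\n')
            column = 1;
        else
            column++;
    }
    return column;
}
```

In a flex action, `yytext` and `yyleng` would be the natural first two
arguments, together with the column where the match started.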

Another one is that, for technical reasons, the column-range of the
`TOK_INDENT` and `TOK_OUTDENT` tokens is the first character of the line or,
for outdents happening through reaching the end of the file, `0-0`.

Until I write a full tutorial, I recommend you look at the code; it is short and fully commented.