https://github.com/lucasb-eyer/flex-bison-indentation

An example of how to correctly parse python-like indentation-scoped files using flex (and bison).

Host: GitHub
URL: https://github.com/lucasb-eyer/flex-bison-indentation
Owner: lucasb-eyer
License: mit
Created: 2013-03-28T13:29:30.000Z
Default Branch: master
Last Pushed: 2019-03-23T15:14:54.000Z
Last Synced: 2024-04-15T02:44:25.285Z
Topics: bison, flex, indentation, parse, python, scanner
Language: Lex
Size: 11.7 KB
Stars: 39
Watchers: 4
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

flex-bison-indentation
======================

An example of how to correctly parse python-like indentation-scoped files using flex (and bison).

Besides that, this project also serves as a template CMake-based project for a flex and bison parser
and includes rules to track the scanner's current line and column.

Quick overview
==============

All the magic happens in the scanner, which emits `TOK_INDENT` and `TOK_OUTDENT` tokens whenever
the level of indentation increases or decreases. The parser in this project just echoes the tokens.
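The decision the scanner makes at the start of each non-blank line can be sketched with a stack of currently open indentation widths. The C below is a minimal standalone sketch, not code from this repository: the token values, `MAX_DEPTH`, and the names `IndentStack`, `indent_init`, `indent_update`, and `indent_finish` are hypothetical, and treating a dedent to a width that was never opened as an error mirrors Python rather than anything this README specifies.

```c
#include <assert.h>

/* Hypothetical token codes; a real flex/bison project would take
   TOK_INDENT and TOK_OUTDENT from the bison-generated header. */
enum { TOK_INDENT = 1, TOK_OUTDENT = 2 };

#define MAX_DEPTH 64

/* Open indentation widths; widths[0] == 0 is always open. */
typedef struct {
    int widths[MAX_DEPTH];
    int depth; /* number of open levels */
} IndentStack;

void indent_init(IndentStack *s) {
    s->widths[0] = 0;
    s->depth = 1;
}

/* Compare the indentation width of a new non-blank line with the
   innermost open level and write the resulting tokens into out[].
   Returns 1 for an indent, k >= 1 for k outdents, 0 if the level is
   unchanged, or -1 if the line dedents to a width never opened. */
int indent_update(IndentStack *s, int width, int *out, int max_out) {
    int n = 0;
    if (width > s->widths[s->depth - 1]) {
        assert(s->depth < MAX_DEPTH && max_out >= 1);
        s->widths[s->depth++] = width;
        out[n++] = TOK_INDENT;
        return n;
    }
    /* Pop every level deeper than this line; widths[0] == 0 stops the loop. */
    while (width < s->widths[s->depth - 1]) {
        assert(n < max_out);
        s->depth--;
        out[n++] = TOK_OUTDENT;
    }
    if (width != s->widths[s->depth - 1]) return -1; /* inconsistent dedent */
    return n;
}

/* At end of input, emit one outdent per level still open above 0. */
int indent_finish(IndentStack *s, int *out, int max_out) {
    int n = 0;
    while (s->depth > 1) {
        assert(n < max_out);
        s->depth--;
        out[n++] = TOK_OUTDENT;
    }
    return n;
}
```

A single line can therefore close several blocks at once, which is why one line may yield more than one `TOK_OUTDENT`, while an increase always opens exactly one level.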

The scanner starts in its normal mode. That's where you put your regular
rules. Whenever a newline is encountered in that mode, the scanner switches
to a separate indentation mode, in which it keeps counting the spaces and
tabs (and ignoring blank lines) until it sees anything else. At that point it
outputs either one `TOK_INDENT`, as many `TOK_OUTDENT` tokens as necessary, or
none of these tokens, and goes back to the normal mode.
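The mode switching described above can be simulated in plain C over an in-memory string. This is an illustrative sketch, not the repository's flex rules: it writes `{` for each `TOK_INDENT` and `}` for each `TOK_OUTDENT` into a trace string, the `Mode` enum and `scan_trace` are hypothetical names, and expanding a tab to the next multiple of 8 is an assumed convention (Python's), not one this README states.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 64

/* Hypothetical names for the scanner's two modes. */
typedef enum { MODE_NORMAL, MODE_INDENT } Mode;

/* Scans src and writes a token trace into out: '{' for TOK_INDENT,
   '}' for TOK_OUTDENT, other characters copied as-is. Newlines and
   ordinary spaces/tabs are not copied. Returns the trace length, or -1
   when a line dedents to a width that was never opened. */
int scan_trace(const char *src, char *out, size_t cap) {
    int stack[MAX_DEPTH] = {0}; /* open indentation widths, stack[0] == 0 */
    int depth = 1;
    int width = 0;              /* indentation counted so far on this line */
    Mode mode = MODE_NORMAL;    /* start in normal mode, as described above */
    size_t n = 0;

    assert(cap > 0);
    for (const char *p = src; ; p++) {
        char c = *p;
        if (mode == MODE_INDENT) {
            if (c == ' ') { width++; continue; }
            if (c == '\t') { width = (width / 8 + 1) * 8; continue; } /* assumed tab stop of 8 */
            if (c == '\n') { width = 0; continue; }  /* blank line: ignored */
            if (c == '\0') break;                    /* end of input while counting */
            /* First significant character of the line: compare widths. */
            if (width > stack[depth - 1]) {
                assert(depth < MAX_DEPTH && n + 1 < cap);
                stack[depth++] = width;
                out[n++] = '{';
            } else {
                while (width < stack[depth - 1]) {
                    assert(n + 1 < cap);
                    depth--;
                    out[n++] = '}';
                }
                if (width != stack[depth - 1]) return -1;
            }
            mode = MODE_NORMAL; /* c itself is then handled as a normal character */
        }
        if (c == '\0') break;
        if (c == '\n') { width = 0; mode = MODE_INDENT; continue; }
        if (c == ' ' || c == '\t') continue; /* whitespace between tokens */
        assert(n + 1 < cap);
        out[n++] = c;
    }
    /* End of input: every level still open gets a closing outdent. */
    while (depth > 1) {
        assert(n + 1 < cap);
        depth--;
        out[n++] = '}';
    }
    out[n] = '\0';
    return (int)n;
}
```

For example, the input `"a\n  b\n  c\nd\n"` produces the trace `a{bc}d`, and `"a\n b\n  c\nd"` produces `a{b{c}}d`: the line `d` closes two levels at once.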

The scanner also does its best to keep track of the column where the current
match starts, which can be accessed (and changed) through `yycolumn`. Flex
keeps track of the line number internally.
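Per-match column tracking boils down to a few assignments. The sketch below is hypothetical: `Cursor` stands in for flex's line counter and the scanner's `yycolumn`, `Loc` has the same four fields as Bison's default `YYLTYPE`, and columns are assumed to be 1-based, which this README does not specify.

```c
#include <assert.h>

/* Same four fields as Bison's default YYLTYPE. */
typedef struct {
    int first_line, first_column;
    int last_line, last_column;
} Loc;

/* Hypothetical stand-in for the line counter and yycolumn. */
typedef struct {
    int line;
    int column; /* column where the next match starts (1-based here) */
} Cursor;

/* Called once per match of yyleng characters that contains no newline:
   record the range the match covers, then move the column past it. */
void track_match(Cursor *cur, int yyleng, Loc *loc) {
    assert(yyleng > 0);
    loc->first_line = loc->last_line = cur->line;
    loc->first_column = cur->column;
    loc->last_column = cur->column + yyleng - 1;
    cur->column += yyleng;
}

/* A match consisting of a single newline starts the next line. */
void track_newline(Cursor *cur) {
    cur->line++;
    cur->column = 1;
}
```

In a flex scanner, the body of `track_match` would typically live in the `YY_USER_ACTION` macro, which flex expands before each rule's action, with the result written into `yylloc`.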

All of this means that you can write the parser as usual, use the
`TOK_INDENT` and `TOK_OUTDENT` tokens to handle indentation, and access
a token's line through `@1.first_line` (and `@1.last_line` if the
token spans multiple lines, which I don't recommend) and its column range
through `@1.first_column` and `@1.last_column`.
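To show how the two tokens act like an opening and a closing brace, here is a small parser in C over a pre-scanned token array. It deliberately swaps in hand-written recursive descent for a Bison grammar; the token kinds, the grammar `stmt := NAME [INDENT block OUTDENT]`, and all function names are hypothetical. `Loc` copies the field names of Bison's default `YYLTYPE`, so the error line is read the same way `@1.first_line` would be.

```c
#include <assert.h>

/* Hypothetical token kinds; TOK_NAME stands for any simple statement. */
typedef enum { TOK_NAME, TOK_INDENT, TOK_OUTDENT, TOK_EOF } Kind;

typedef struct { int first_line, first_column, last_line, last_column; } Loc;
typedef struct { Kind kind; Loc loc; } Token;

typedef struct {
    const Token *toks;
    int pos;
    int max_depth;  /* deepest block nesting seen so far */
    int error_line; /* first_line of the offending token, 0 if none */
} Parser;

static int parse_block(Parser *p, int depth);

/* stmt := NAME [INDENT block OUTDENT] */
static int parse_stmt(Parser *p, int depth) {
    if (p->toks[p->pos].kind != TOK_NAME) {
        p->error_line = p->toks[p->pos].loc.first_line;
        return 0;
    }
    p->pos++;
    if (p->toks[p->pos].kind == TOK_INDENT) {
        p->pos++;
        if (!parse_block(p, depth + 1)) return 0;
        if (p->toks[p->pos].kind != TOK_OUTDENT) {
            p->error_line = p->toks[p->pos].loc.first_line;
            return 0;
        }
        p->pos++;
    }
    return 1;
}

/* block := stmt+ (it ends at the first token that is not a NAME) */
static int parse_block(Parser *p, int depth) {
    if (depth > p->max_depth) p->max_depth = depth;
    do {
        if (!parse_stmt(p, depth)) return 0;
    } while (p->toks[p->pos].kind == TOK_NAME);
    return 1;
}

/* Returns the maximum nesting depth, or -1 with *error_line set. */
int parse_program(const Token *toks, int *error_line) {
    Parser p = { toks, 0, 0, 0 };
    int ok = parse_block(&p, 0) && toks[p.pos].kind == TOK_EOF;
    if (!ok && p.error_line == 0) p.error_line = toks[p.pos].loc.first_line;
    *error_line = p.error_line;
    return ok ? p.max_depth : -1;
}
```

In a Bison grammar the same structure would roughly be a rule such as `stmt: NAME | NAME TOK_INDENT block TOK_OUTDENT;`, with the parser generator taking care of the recursion.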

One caveat is that if one of your rules includes a newline character and
matches text longer than a single character, you will need to reset `yycolumn` by hand.
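For the caveat above, the manual reset amounts to counting the newlines in the matched text and restarting the column just after the last one. A hypothetical sketch (`Cursor` and `advance_over` are illustrative names, and columns are assumed to be 1-based):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the line counter and yycolumn. */
typedef struct { int line; int column; /* 1-based here */ } Cursor;

/* After a rule matched text[0..len) (yytext/yyleng in flex), advance the
   cursor: each embedded newline bumps the line, and the column restarts
   at 1 right after the last newline. */
void advance_over(Cursor *cur, const char *text, size_t len) {
    const char *last_nl = NULL;
    for (size_t i = 0; i < len; i++) {
        if (text[i] == '\n') {
            cur->line++;
            last_nl = text + i;
        }
    }
    if (last_nl == NULL) {
        cur->column += (int)len;                          /* single-line match */
    } else {
        cur->column = 1 + (int)(text + len - last_nl - 1); /* chars after last newline */
    }
}
```

For instance, a string literal spanning two lines such as `"ab` / `cd"` starting at column 5 leaves the cursor on the next line at column 4. With flex's `%option yylineno`, flex already counts newlines inside matches, so in a real scanner only the column half would need to be written by hand.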

Another caveat is that, for technical reasons, the column range of the
`TOK_INDENT` and `TOK_OUTDENT` tokens is the first character of the line or,
for outdents triggered by reaching the end of the file, `0-0`.

Until I write a full tutorial, I recommend you look at the code; it is short and fully commented.
