Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/begin/parsers-compilers
Lexers, tokenizers, parsers, compilers, renderers, stringifiers... What's the difference, and how do they work?
- Host: GitHub
- URL: https://github.com/begin/parsers-compilers
- Owner: begin
- License: cc-by-4.0
- Created: 2016-09-12T20:49:33.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2017-04-26T20:11:12.000Z (about 7 years ago)
- Last Synced: 2024-02-20T16:35:43.969Z (5 months ago)
- Topics: ast, compiler, guide, lexer, node, parse, parsers-compilers, syntax-tree, token, token-stream, tokenize
- Size: 11.7 KB
- Stars: 22
- Watchers: 5
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Lists
- awesome-stars - parsers-compilers
README
# parsers-compilers
> Lexers, tokenizers, parsers, compilers, renderers, stringifiers... What's the difference, and how do they work?
- [lexer](#lexer)
- [parser](#parser)
- [compiler](#compiler)
- [renderer](#renderer)
- [related](#related)

## Lexer
> Scans, or "tokenizes", the characters in a string to create an array of objects, referred to as a "token stream"
**Returns**: Token stream.
**Concepts**
- token stream
- token
- lexical scoping
- lexical context

**Description**
A token stream is an array of "tokens", where each "token" is an object that contains details about a specific substring that was "captured", such as its column and line number, or character position.
**Example token**
```js
{
type: 'text',
value: 'abc',
position: {
start: {
column: 1,
line: 1
},
end: {
column: 3,
line: 1
}
}
}
```

A token should attempt to describe only _basic lexical context_, such as a character "type", which might be something like `text`, `number`, `escaped`, or `delimiter`.
Lexers should not attempt to describe dynamic scope, such as where a bracketed section begins or ends; that is left to the parser and is better represented by an Abstract Syntax Tree (AST).
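To make this concrete, here is a minimal lexer sketch (hypothetical code, not this repo's implementation) that produces tokens in the shape used throughout this guide, hard-coding `{` and `}` as the only delimiters:

```javascript
// Minimal single-line lexer sketch: splits a string into `text`,
// `left-brace`, and `right-brace` tokens, tracking column positions.
function lex(str) {
  const tokens = [];
  let column = 1;
  let i = 0;
  while (i < str.length) {
    const start = { column, line: 1 };
    let type, value;
    if (str[i] === '{') {
      type = 'left-brace';
      value = '{';
      i++;
    } else if (str[i] === '}') {
      type = 'right-brace';
      value = '}';
      i++;
    } else {
      // consume a run of plain characters as a single `text` token
      type = 'text';
      value = '';
      while (i < str.length && str[i] !== '{' && str[i] !== '}') {
        value += str[i++];
      }
    }
    column += value.length;
    tokens.push({ type, value, position: { start, end: { column: column - 1, line: 1 } } });
  }
  return tokens;
}

console.log(lex('abc{foo}xyz').map(t => t.type));
// ['text', 'left-brace', 'text', 'right-brace', 'text']
```

Each token records only where its substring starts and ends, with no knowledge of whether a brace is balanced.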
**Example token stream**
A JavaScript token stream for the string `abc{foo}xyz` might look something like this:
```js
[
  {
    type: 'text',
    value: 'abc',
    position: {start: {column: 1, line: 1}, end: {column: 3, line: 1}}
  },
  {
    type: 'left-brace',
    value: '{',
    position: {start: {column: 4, line: 1}, end: {column: 4, line: 1}}
  },
  {
    type: 'text',
    value: 'foo',
    position: {start: {column: 5, line: 1}, end: {column: 7, line: 1}}
  },
  {
    type: 'right-brace',
    value: '}',
    position: {start: {column: 8, line: 1}, end: {column: 8, line: 1}}
  },
  {
    type: 'text',
    value: 'xyz',
    position: {start: {column: 9, line: 1}, end: {column: 11, line: 1}}
  }
]
```

## Parser
> Parses a stream of tokens into an Abstract Syntax Tree (AST)
**Returns**: AST object
**Concepts**
- Abstract Syntax Tree (AST)
- nodes
- node
- dynamic scoping
- dynamic context

**Description**
Whereas a token stream is a "flat" array, the _Abstract Syntax Tree_ generated by a parser gives the tokens a dynamic, or global, context.
Thus, an AST is represented as an object, versus an array.
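As a sketch (hypothetical code, hard-coding the brace grammar from the lexer example), a parser can walk the token stream and use a stack to group brace-delimited runs into nested nodes:

```javascript
// Minimal parser sketch: turns a flat token stream into an AST where
// `left-brace ... right-brace` runs become a nested `brace` node.
function parse(tokens) {
  const ast = { type: 'root', nodes: [] };
  const stack = [ast];
  for (const token of tokens) {
    const parent = stack[stack.length - 1];
    if (token.type === 'left-brace') {
      // open a new scope: following tokens belong to this `brace` node
      const brace = { type: 'brace', nodes: [token] };
      parent.nodes.push(brace);
      stack.push(brace);
    } else if (token.type === 'right-brace') {
      // close the current scope
      parent.nodes.push(token);
      stack.pop();
    } else {
      parent.nodes.push(token);
    }
  }
  return ast;
}

const ast = parse([
  { type: 'text', value: 'abc' },
  { type: 'left-brace', value: '{' },
  { type: 'text', value: 'foo' },
  { type: 'right-brace', value: '}' },
  { type: 'text', value: 'xyz' }
]);
console.log(ast.nodes.map(n => n.type)); // ['text', 'brace', 'text']
```

The stack is what gives the parser the dynamic context a lexer lacks: it always knows which open scope a token belongs to.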
**Example**
A JavaScript AST for the string `abc{foo}xyz` might look something like this:
```js
{
  type: 'root',
  nodes: [
    {
      type: 'text',
      value: 'abc',
      position: {start: {column: 1, line: 1}, end: {column: 3, line: 1}}
    },
    {
      type: 'brace',
      nodes: [
        {
          type: 'left-brace',
          value: '{',
          position: {start: {column: 4, line: 1}, end: {column: 4, line: 1}}
        },
        {
          type: 'text',
          value: 'foo',
          position: {start: {column: 5, line: 1}, end: {column: 7, line: 1}}
        },
        {
          type: 'right-brace',
          value: '}',
          position: {start: {column: 8, line: 1}, end: {column: 8, line: 1}}
        }
      ]
    },
    {
      type: 'text',
      value: 'xyz',
      position: {start: {column: 9, line: 1}, end: {column: 11, line: 1}}
    }
  ]
}
```

## Compiler
> Creates a function by converting an AST into a string of function statements and wrapping it with a boilerplate function body that defines the arguments the function can take. This generated function is then cached for re-use before being returned.
**Returns**: Function
**Concepts**
- function body
- function statements
- caching

**Notes**
The goal of a compiler is to create a cached function that can be invoked one or more times, on-demand, with the same or different arguments. The arguments passed to a compiled function are referred to as "context".
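A compiler along those lines might be sketched as follows (hypothetical code; it assumes the `root`/`text`/`brace` AST shape from the parser example and treats `{foo}` as a variable lookup in the context):

```javascript
// Minimal compiler sketch: walks the AST, emits a string of function
// statements, wraps them in a boilerplate body, and caches the result.
const cache = new Map();

function compile(ast, source) {
  // re-use the cached function if this source was already compiled
  if (cache.has(source)) return cache.get(source);

  // convert each AST node into a function statement
  let body = "var out = '';\n";
  for (const node of ast.nodes) {
    if (node.type === 'text') {
      body += 'out += ' + JSON.stringify(node.value) + ';\n';
    } else if (node.type === 'brace') {
      // the inner text node holds the variable name, e.g. `foo` in `{foo}`
      const name = node.nodes[1].value;
      body += 'out += context[' + JSON.stringify(name) + '];\n';
    }
  }
  body += 'return out;';

  // wrap the statements in a function that takes a single `context` argument
  const fn = new Function('context', body);
  cache.set(source, fn);
  return fn;
}

const fn = compile({
  type: 'root',
  nodes: [
    { type: 'text', value: 'abc' },
    { type: 'brace', nodes: [
      { type: 'left-brace', value: '{' },
      { type: 'text', value: 'foo' },
      { type: 'right-brace', value: '}' }
    ] },
    { type: 'text', value: 'xyz' }
  ]
}, 'abc{foo}xyz');
console.log(fn({ foo: '###' })); // 'abc###xyz'
```

Note that compiling returns a function without invoking it; calling that function with a context is the renderer's job, described next.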
## Renderer
> Invokes the function returned from a compiler with a given "context", producing a string where any placeholders or variables that may have been defined are replaced with actual values.
**Returns**: String
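For example, given a compiled function for `abc{foo}xyz` (hand-written here for illustration rather than produced by a real compiler), rendering is just invoking it with a context object:

```javascript
// a compiled function for `abc{foo}xyz`, written by hand for this sketch
const fn = (context) => 'abc' + context.foo + 'xyz';

// the renderer step: invoke the compiled function with a "context"
function render(fn, context) {
  return fn(context);
}

console.log(render(fn, { foo: 'middle' })); // 'abcmiddlexyz'
```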
**Concepts**
- context
- variables

## Related
- [the-super-tiny-compiler](https://github.com/thejameskyle/the-super-tiny-compiler)
- **stringify**: typically refers to converting an object to a string representation of the object. Example: `{foo: 'bar'}` would convert to the string `'{"foo": "bar"}'`.
- **assembler**: todo
- **interpreter**: todo
- **translator**: todo
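For the **stringify** case above, JavaScript's built-in `JSON.stringify` does exactly this (note that by default it emits no space after the colon):

```javascript
console.log(JSON.stringify({ foo: 'bar' })); // '{"foo":"bar"}'
```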