Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/begin/parsers-compilers
Lexers, tokenizers, parsers, compilers, renderers, stringifiers... What's the difference, and how do they work?
- Host: GitHub
- URL: https://github.com/begin/parsers-compilers
- Owner: begin
- License: cc-by-4.0
- Created: 2016-09-12T20:49:33.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2017-04-26T20:11:12.000Z (about 7 years ago)
- Last Synced: 2024-02-20T16:35:43.969Z (5 months ago)
- Topics: ast, compiler, guide, lexer, node, parse, parsers-compilers, syntax-tree, token, token-stream, tokenize
- Size: 11.7 KB
- Stars: 22
- Watchers: 5
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Lists
- awesome-stars - parsers-compilers
README
# parsers-compilers
> Lexers, tokenizers, parsers, compilers, renderers, stringifiers... What's the difference, and how do they work?
- [lexer](#lexer)
- [parser](#parser)
- [compiler](#compiler)
- [renderer](#renderer)
- [related](#related)

## Lexer
> Scans, or "tokenizes", the characters in a string to create an array of objects, referred to as a "token stream"
**Returns**: Token stream.
**Concepts**
- token stream
- token
- lexical scoping
- lexical context

**Description**
A token stream is an array of "tokens", where each "token" is an object that contains details about a specific substring that was "captured", such as its column and line number, or character position.
**Example token**
```js
{
type: 'text',
value: 'abc',
position: {
start: {
column: 1,
line: 1
},
end: {
column: 3,
line: 1
}
}
}
```

A token should attempt to describe only _basic lexical context_, such as a character "type", which might be something like `text`, `number`, `escaped`, or `delimiter`.
Lexers should not attempt to describe dynamic scope, such as where a bracketed section begins or ends; that is left to the parser and is better represented by an Abstract Syntax Tree (AST).
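To make this concrete, here is a minimal lexer sketch (hypothetical code, not this repo's implementation) that produces tokens in the shape used throughout this guide, hard-coding `{` and `}` as the only delimiters:

```javascript
// Minimal single-line lexer sketch: splits a string into `text`,
// `left-brace`, and `right-brace` tokens, tracking column positions.
function lex(str) {
  const tokens = [];
  let column = 1;
  let i = 0;
  while (i < str.length) {
    const start = { column, line: 1 };
    let type, value;
    if (str[i] === '{') {
      type = 'left-brace';
      value = '{';
      i++;
    } else if (str[i] === '}') {
      type = 'right-brace';
      value = '}';
      i++;
    } else {
      // consume a run of plain characters as a single `text` token
      type = 'text';
      value = '';
      while (i < str.length && str[i] !== '{' && str[i] !== '}') {
        value += str[i++];
      }
    }
    column += value.length;
    tokens.push({ type, value, position: { start, end: { column: column - 1, line: 1 } } });
  }
  return tokens;
}

console.log(lex('abc{foo}xyz').map(t => t.type));
// ['text', 'left-brace', 'text', 'right-brace', 'text']
```

Each token records only where its substring starts and ends, with no knowledge of whether a brace is balanced.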
**Example token stream**
A JavaScript token stream for the string `abc{foo}xyz` might look something like this:
```js
[
  {
    type: 'text',
    value: 'abc',
    position: {start: {column: 1, line: 1}, end: {column: 3, line: 1}}
  },
  {
    type: 'left-brace',
    value: '{',
    position: {start: {column: 4, line: 1}, end: {column: 4, line: 1}}
  },
  {
    type: 'text',
    value: 'foo',
    position: {start: {column: 5, line: 1}, end: {column: 7, line: 1}}
  },
  {
    type: 'right-brace',
    value: '}',
    position: {start: {column: 8, line: 1}, end: {column: 8, line: 1}}
  },
  {
    type: 'text',
    value: 'xyz',
    position: {start: {column: 9, line: 1}, end: {column: 11, line: 1}}
  }
]
```

## Parser
> Parses a stream of tokens into an Abstract Syntax Tree (AST)
**Returns**: AST object
**Concepts**
- Abstract Syntax Tree (AST)
- nodes
- node
- dynamic scoping
- dynamic context

**Description**
Whereas a token stream is a "flat" array, the _Abstract Syntax Tree_ generated by a parser gives the tokens a dynamic, or global, context.
Thus, an AST is represented as an object, versus an array.
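As a sketch (hypothetical code, hard-coding the brace grammar from the lexer example), a parser can walk the token stream and use a stack to group brace-delimited runs into nested nodes:

```javascript
// Minimal parser sketch: turns a flat token stream into an AST where
// `left-brace ... right-brace` runs become a nested `brace` node.
function parse(tokens) {
  const ast = { type: 'root', nodes: [] };
  const stack = [ast];
  for (const token of tokens) {
    const parent = stack[stack.length - 1];
    if (token.type === 'left-brace') {
      // open a new scope: following tokens belong to this `brace` node
      const brace = { type: 'brace', nodes: [token] };
      parent.nodes.push(brace);
      stack.push(brace);
    } else if (token.type === 'right-brace') {
      // close the current scope
      parent.nodes.push(token);
      stack.pop();
    } else {
      parent.nodes.push(token);
    }
  }
  return ast;
}

const ast = parse([
  { type: 'text', value: 'abc' },
  { type: 'left-brace', value: '{' },
  { type: 'text', value: 'foo' },
  { type: 'right-brace', value: '}' },
  { type: 'text', value: 'xyz' }
]);
console.log(ast.nodes.map(n => n.type)); // ['text', 'brace', 'text']
```

The stack is what gives the parser the dynamic context a lexer lacks: it always knows which open scope a token belongs to.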
**Example**
A JavaScript AST for the string `abc{foo}xyz` might look something like this:
```js
{
  type: 'root',
  nodes: [
    {
      type: 'text',
      value: 'abc',
      position: {start: {column: 1, line: 1}, end: {column: 3, line: 1}}
    },
    {
      type: 'brace',
      nodes: [
        {
          type: 'left-brace',
          value: '{',
          position: {start: {column: 4, line: 1}, end: {column: 4, line: 1}}
        },
        {
          type: 'text',
          value: 'foo',
          position: {start: {column: 5, line: 1}, end: {column: 7, line: 1}}
        },
        {
          type: 'right-brace',
          value: '}',
          position: {start: {column: 8, line: 1}, end: {column: 8, line: 1}}
        }
      ]
    },
    {
      type: 'text',
      value: 'xyz',
      position: {start: {column: 9, line: 1}, end: {column: 11, line: 1}}
    }
  ]
}
```

## Compiler
> Creates a function by converting an AST into a string of function statements and wrapping it with a boilerplate function body that defines the arguments the function can take. This generated function is then cached for re-use before being returned.
**Returns**: Function
**Concepts**
- function body
- function statements
- caching

**Notes**
The goal of a compiler is to create a cached function that can be invoked one or more times, on-demand, with the same or different arguments. The arguments passed to a compiled function are referred to as "context".
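A compiler along those lines might be sketched as follows (hypothetical code; it assumes the `root`/`text`/`brace` AST shape from the parser example and treats `{foo}` as a variable lookup in the context):

```javascript
// Minimal compiler sketch: walks the AST, emits a string of function
// statements, wraps them in a boilerplate body, and caches the result.
const cache = new Map();

function compile(ast, source) {
  // re-use the cached function if this source was already compiled
  if (cache.has(source)) return cache.get(source);

  // convert each AST node into a function statement
  let body = "var out = '';\n";
  for (const node of ast.nodes) {
    if (node.type === 'text') {
      body += 'out += ' + JSON.stringify(node.value) + ';\n';
    } else if (node.type === 'brace') {
      // the inner text node holds the variable name, e.g. `foo` in `{foo}`
      const name = node.nodes[1].value;
      body += 'out += context[' + JSON.stringify(name) + '];\n';
    }
  }
  body += 'return out;';

  // wrap the statements in a function that takes a single `context` argument
  const fn = new Function('context', body);
  cache.set(source, fn);
  return fn;
}

const fn = compile({
  type: 'root',
  nodes: [
    { type: 'text', value: 'abc' },
    { type: 'brace', nodes: [
      { type: 'left-brace', value: '{' },
      { type: 'text', value: 'foo' },
      { type: 'right-brace', value: '}' }
    ] },
    { type: 'text', value: 'xyz' }
  ]
}, 'abc{foo}xyz');
console.log(fn({ foo: '###' })); // 'abc###xyz'
```

Note that compiling returns a function without invoking it; calling that function with a context is the renderer's job, described next.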
## Renderer
> Invokes the function returned from a compiler with a given "context", producing a string where any placeholders or variables that may have been defined are replaced with actual values.
**Returns**: String
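For example, given a compiled function for `abc{foo}xyz` (hand-written here for illustration rather than produced by a real compiler), rendering is just invoking it with a context object:

```javascript
// a compiled function for `abc{foo}xyz`, written by hand for this sketch
const fn = (context) => 'abc' + context.foo + 'xyz';

// the renderer step: invoke the compiled function with a "context"
function render(fn, context) {
  return fn(context);
}

console.log(render(fn, { foo: 'middle' })); // 'abcmiddlexyz'
```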
**Concepts**
- context
- variables

## Related
- [the-super-tiny-compiler](https://github.com/thejameskyle/the-super-tiny-compiler)
- **stringify**: typically refers to converting an object to a string representation of the object. Example: `{foo: 'bar'}` would convert to the string `'{"foo": "bar"}'`.
- **assembler**: todo
- **interpreter**: todo
- **translator**: todo
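For the **stringify** case above, JavaScript's built-in `JSON.stringify` does exactly this (note that by default it emits no space after the colon):

```javascript
console.log(JSON.stringify({ foo: 'bar' })); // '{"foo":"bar"}'
```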