# recursive-descent-parser
Learning how to write a compiler

## Building
`npm run build` or `./build.sh`

## Methods described
There are several steps involved in running source code.

A series of passes over the data makes it easier to handle:

### Tokenize
The first step scans through the source code as a string and returns a series of tokens, each identified by their:
1. `type` - defines syntactic usage, such as identifier, keyword, operator, brackets, etc.
2. `data` - typically the string represented by the type, but it can be transformed by the preprocessor
3. line and char numbers (useful for debugging source)

For instance, a preprocessor could take several tokens:

`{type: "parenthesis", data:"("}`,

`{type: "parenthesis", data:")"}`,

`{type: "operator", data:"="}`,

`{type: "operator", data:">"}`

and turn them into

`{type: "arrow-function", data:"()=>"}`
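
To make the shape concrete, here is a minimal sketch of what such a token could look like in TypeScript. It only illustrates the fields described above and is not necessarily the exact type used in this repo:

```ts
//Hypothetical token shape based on the fields described above;
//the actual type in this repo may differ.
interface Token {
  type: string; //syntactic usage: "identifier", "keyword", "operator", "parenthesis", ...
  data: string; //the matched text, possibly rewritten by a preprocessor
  line: number; //line number in the source (useful for debugging)
  char: number; //char/column number of the token's first character
}
```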

### Preprocess
This part is still in the works, but it will essentially be a function that passes over tokens and returns a modified set.



What modifications it actually entails is up to the preprocessor (see the sketch after this list), but some examples are:
- source directives
- `.babelrc`
- special language features not supported by the parser, which can be broken down into lower-level code
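
As a minimal sketch (using the hypothetical `Token` shape from the Tokenize section, and a made-up function name), a pass performing the arrow-function merge described earlier might look like:

```ts
//Illustrative preprocessor pass: merges the token sequence "(", ")", "=", ">"
//into a single "arrow-function" token, as in the example above.
function mergeArrowFunctions(tokens: Token[]): Token[] {
  const out: Token[] = [];
  let i = 0;
  while (i < tokens.length) {
    const [a, b, c, d] = [tokens[i], tokens[i + 1], tokens[i + 2], tokens[i + 3]];
    if (a.data === "(" && b?.data === ")" && c?.data === "=" && d?.data === ">") {
      out.push({ type: "arrow-function", data: "()=>", line: a.line, char: a.char });
      i += 4; //skip the four tokens we just merged
    } else {
      out.push(a);
      i += 1;
    }
  }
  return out;
}
```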

### Parser
Creates a tree structure, called an Abstract Syntax Tree (AST), from a token array.

This is where the recursive descent part comes into play, and it's the part I came here to learn about.
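
For context only (this is not this repo's code), the core idea of recursive descent is one function per grammar rule, where rules that reference other rules become recursive calls. A toy parser for expressions like `1+2-3` could look like:

```ts
//Toy recursive-descent parser, purely illustrative.
//Grammar: expr -> term (("+" | "-") term)*   term -> number
type AstNode =
  | { kind: "number"; value: number }
  | { kind: "binary"; op: string; left: AstNode; right: AstNode };

function parseExpr(tokens: Token[], pos: { i: number }): AstNode {
  let left = parseTerm(tokens, pos);
  //keep consuming "+"/"-" operators, building the tree as we go
  while (tokens[pos.i] && (tokens[pos.i].data === "+" || tokens[pos.i].data === "-")) {
    const op = tokens[pos.i++].data;
    const right = parseTerm(tokens, pos);
    left = { kind: "binary", op, left, right };
  }
  return left;
}

function parseTerm(tokens: Token[], pos: { i: number }): AstNode {
  const t = tokens[pos.i++]; //expects a number token here
  return { kind: "number", value: Number(t.data) };
}
```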

### Interpreter / Codegen
I plan on implementing both an interpreter and a code generator.

They will take an abstract syntax tree and either:
- run it (interpreter), or
- compile it into some lower-level code, typically op codes or machine code (codegen)
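
For the interpreter half, a sketch of evaluating the toy AST above (again hypothetical, not this repo's code) is just a recursive walk:

```ts
//Illustrative tree-walking interpreter for the toy AST sketched above.
function evaluate(node: AstNode): number {
  switch (node.kind) {
    case "number":
      return node.value;
    case "binary": {
      const left = evaluate(node.left);
      const right = evaluate(node.right);
      return node.op === "+" ? left + right : left - right;
    }
  }
}
```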

## Implementation
In my process I've decided to take a language-agnostic approach, even though my end goal is probably something like `typescript/javascript`.

For instance, the tokenizer relies on a `Scanner`, which is where language syntax is actually handled, and the `tokenize` function will already be implemented for you.

To handle your own language, you'll need to implement a scanner subclass.

### Scanner
This is a class meant to be extended.

It provides functionality to implement scanning text in a more standard way, which should make debugging easier.

- addPass - for adding more syntax handling
```ts
addPass(name: string, pass: ScannerPass): this
```
Where `name` is the `token.type` assigned when the pass is successful, and `pass` is a [scanner pass](#ScannerPass).
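
Usage might look something like this, where `whitespacePass` and `identifierPass` are hypothetical [scanner passes](#ScannerPass):

```ts
//Hypothetical usage; addPass returns `this`, so registrations can be chained.
scanner
  .addPass("whitespace", whitespacePass)
  .addPass("identifier", identifierPass);
```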

### ScannerPass
Each scanner pass is meant to handle a single type of language syntax.

```ts
(data: string, offset: number): ScannerData
```
Where `data` is the source code, `offset` is the offset in the source to read from, and the return value is expected to be a [ScannerData](#ScannerData).

### ScannerData
```ts
{
  success: boolean //must be false when no data satisfying this pass's token type is found at offset
  readChars: number //number of chars that matched this type before one that didn't
  readLines: number //obsolete, this will be handled by internal code soon
  error?: string //optional - set when an error is positively identified, not necessarily every time success == false
}
```
Note that scanner data does not actually return the text that was read, only the char count.

This is to standardize the reading process, which should cause far fewer errors between implementations of languages.

Basically: don't read chars that don't fit your specification, and don't count any that don't.
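
Following those rules, a hypothetical identifier pass (names are illustrative, not this repo's code) might look like:

```ts
//Hypothetical ScannerPass: counts identifier chars ([A-Za-z_][A-Za-z0-9_]*)
//starting at offset, reporting only how many chars matched.
const identifierPass = (data: string, offset: number): ScannerData => {
  let readChars = 0;
  while (offset + readChars < data.length) {
    const ch = data[offset + readChars];
    const fits = readChars === 0 ? /[A-Za-z_]/.test(ch) : /[A-Za-z0-9_]/.test(ch);
    if (!fits) break; //stop at the first char that doesn't fit, and don't count it
    readChars++;
  }
  return {
    success: readChars > 0, //false when nothing at offset satisfies this token type
    readChars,
    readLines: 0
  };
};
```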