https://github.com/vonderklaas/tiny-lexer
A program written in pure C that performs lexical tokenization of a programming language, in this case 'tinylang'.
- Host: GitHub
- URL: https://github.com/vonderklaas/tiny-lexer
- Owner: vonderklaas
- Created: 2023-11-08T14:59:01.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-05-07T07:57:40.000Z (about 2 years ago)
- Last Synced: 2025-04-24T02:43:10.705Z (about 1 year ago)
- Topics: c, lexer, lexer-parser, lexical-analysis
- Language: C
- Homepage:
- Size: 50.8 KB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
## README
### Description
Lexical tokenization is the conversion of text into (semantically or syntactically) meaningful lexical tokens belonging to categories defined by a lexer program. In the case of a natural language, those categories include nouns, verbs, adjectives, punctuation, etc. In the case of a programming language, the categories include identifiers, operators, grouping symbols, and data types.
### Examples
This is the source code:
```
a : integer = 0
a := 0
b : integer
b := 0
defun foo (a:integer, b:integer):integer {
}
```
These are the tokens it is broken down into:
```
Token 0: a
Token 1: :
Token 2: integer
Token 3: =
Token 4: 0
Token 5: a
Token 6: :
Token 7: =
Token 8: 0
Token 9: b
Token 10: :
Token 11: integer
Token 12: b
Token 13: :
Token 14: =
Token 15: 0
Token 16: defun
Token 17: foo
Token 18: (
Token 19: a
Token 20: :
Token 21: integer
Token 22: ,
Token 23: b
Token 24: :
Token 25: integer
Token 26: )
Token 27: :
Token 28: integer
Token 29: {
Token 30: }
```
### Compilation Stages
**Preprocessing** — ✅
Input: Source Code
Output: Modified Source Code
**Tokenization** — ✅
Input: Preprocessed Source Code
Output: Stream of Tokens
**Syntax Analysis** — (WIP)
Input: Tokens from Lexical Analysis (Tokenization)
Output: AST
**Semantic Analysis** — (WIP)
Input: AST
Output: Annotated AST with Semantic Information
**Intermediate Code Generation** — (WIP)
Input: Annotated AST
Output: IR
**Optimization** — (WIP)
Input: IR
Output: Optimized IR
**Code Generation** — (WIP)
Input: Optimized IR
Output: Machine Code or Assembly
**Linking**
Input: Compiled Machine Code
Output: Single Executable for Specific Architecture