https://github.com/ya2ir/c_lexer
C Lexer
- Host: GitHub
- URL: https://github.com/ya2ir/c_lexer
- Owner: YA2IR
- License: unlicense
- Created: 2024-08-08T01:17:07.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-15T14:45:39.000Z (over 1 year ago)
- Last Synced: 2025-08-23T11:03:59.980Z (9 months ago)
- Topics: c, lexer, lexical-analysis, lexical-analyzer, tokenization, tokenizer
- Language: C
- Homepage:
- Size: 8.79 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# About
This is a lexer (tokenizer) for the C programming language. It takes a C source file as input and prints a sequence of (literal, token_type) pairs, for example:
```
// example1.c:
int x = 1.f*0xAB;
// gcc lexer/lexer.c main.c && ./a.out example1.c:
== ('int','INT')
== ('x','IDENT')
== ('=','ASSIGN')
== ('1.f','NUM')
== ('*','STAR')
== ('0xAB','NUM')
== (';','SEMICOLON')
== --END-- ==
```
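To illustrate how such (literal, token_type) pairs can be produced, here is a minimal hand-written scanner sketch. It is not the code in `lexer/lexer.c`. The names `Token`, `next_token` and `print_tokens` are hypothetical, only `int` is treated as a keyword, and numbers are consumed greedily without validation. It is just enough to reproduce the output shown above.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token record: the literal text plus a type name such as
   "INT", "IDENT" or "NUM". Not the repository's actual data structure. */
typedef struct {
    char text[64];
    const char *type;
} Token;

/* Copy n bytes of the literal into tok->text, truncating if necessary. */
static void set_text(Token *tok, const char *start, size_t n) {
    if (n >= sizeof tok->text)
        n = sizeof tok->text - 1;
    memcpy(tok->text, start, n);
    tok->text[n] = '\0';
}

/* Scan one token starting at *src and advance *src past it.
   Returns 1 if a token was produced, 0 at end of input. */
int next_token(const char **src, Token *tok) {
    const char *p = *src;
    while (isspace((unsigned char)*p))
        p++;
    if (*p == '\0') {
        *src = p;
        return 0;
    }
    const char *start = p;
    if (isalpha((unsigned char)*p) || *p == '_') {
        /* Identifier or keyword: letters, digits, underscores. */
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        set_text(tok, start, (size_t)(p - start));
        /* Only "int" is treated as a keyword in this sketch. */
        tok->type = strcmp(tok->text, "int") == 0 ? "INT" : "IDENT";
    } else if (isdigit((unsigned char)*p) ||
               (*p == '.' && isdigit((unsigned char)p[1]))) {
        /* Number: greedily take digits, letters and '.', which covers
           forms like 42, 1.f and 0xAB but performs no validation. */
        while (isalnum((unsigned char)*p) || *p == '.')
            p++;
        set_text(tok, start, (size_t)(p - start));
        tok->type = "NUM";
    } else {
        /* Single-character punctuators; anything else is UNKNOWN. */
        p++;
        set_text(tok, start, 1);
        switch (*start) {
        case '=': tok->type = "ASSIGN"; break;
        case '*': tok->type = "STAR"; break;
        case ';': tok->type = "SEMICOLON"; break;
        default:  tok->type = "UNKNOWN"; break;
        }
    }
    *src = p;
    return 1;
}

/* Print every token in the "== ('literal','TYPE')" format used above. */
void print_tokens(const char *src) {
    Token tok;
    while (next_token(&src, &tok))
        printf("== ('%s','%s')\n", tok.text, tok.type);
    puts("== --END-- ==");
}
```

Calling `print_tokens("int x = 1.f*0xAB;")` prints the same seven pairs as the example, followed by `== --END-- ==`. A complete C lexer would also need a full keyword table and "longest match" handling for multi-character operators such as `==`, `*=` and `->`, which this single-character `switch` does not attempt.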
For a larger example, run it on `example.c`, an actual C source file:
```
gcc main.c lexer/lexer.c && ./a.out example.c
```
# Limitations
- The lexer supports most, but not all, of the language. For example, it does not yet support scientific notation such as `123.45e-6`, and I might have missed some cases, such as suffixes on hexadecimal literals (e.g. `0xAL` is not accepted).
- There is only one token type for numbers, `NUM`, so integer and floating-point literals are not distinguished.
- Preprocessor directives such as `#define` and `#include` are not supported.
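The first two limitations both concern numeric literals. As a hedged sketch (not code from this repository), a scanner that accepts exponents and integer suffixes and separates integer constants from floating constants could look like the following. `NumKind`, `scan_number` and `skip_int_suffix` are hypothetical names. The sketch deliberately omits hexadecimal floating constants (`0x1p3`) and does not check that octal literals contain only digits 0 to 7.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical classification; the repository uses a single NUM type. */
typedef enum { NUM_INVALID, NUM_INT, NUM_FLOAT } NumKind;

/* Skip an integer suffix made of u/U and l/L/ll/LL in either order. */
static const char *skip_int_suffix(const char *p) {
    int seen_u = 0, seen_l = 0;
    for (;;) {
        if (!seen_u && (*p == 'u' || *p == 'U')) {
            seen_u = 1;
            p++;
        } else if (!seen_l && (*p == 'l' || *p == 'L')) {
            seen_l = 1;
            /* "ll" / "LL" must use the same case; "lL" is not consumed. */
            p += (p[1] == p[0]) ? 2 : 1;
        } else {
            return p;
        }
    }
}

/* Scan a numeric literal starting at s, store its length in *len and
   return its kind. Handles decimal and hex integers, decimal floats with
   an optional exponent (e.g. 123.45e-6) and the usual suffixes. */
NumKind scan_number(const char *s, size_t *len) {
    const char *p = s;
    NumKind kind = NUM_INT;

    if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) {
        p += 2;
        if (!isxdigit((unsigned char)*p)) { *len = (size_t)(p - s); return NUM_INVALID; }
        while (isxdigit((unsigned char)*p))
            p++;
    } else {
        int digits = 0;
        while (isdigit((unsigned char)*p)) { p++; digits++; }
        if (*p == '.') {
            kind = NUM_FLOAT;
            p++;
            while (isdigit((unsigned char)*p)) { p++; digits++; }
        }
        if (digits == 0) { *len = (size_t)(p - s); return NUM_INVALID; }
        if (*p == 'e' || *p == 'E') {
            const char *q = p + 1;
            if (*q == '+' || *q == '-')
                q++;
            if (!isdigit((unsigned char)*q)) { *len = (size_t)(q - s); return NUM_INVALID; }
            while (isdigit((unsigned char)*q))
                q++;
            p = q;
            kind = NUM_FLOAT;
        }
    }

    if (kind == NUM_FLOAT) {
        if (*p == 'f' || *p == 'F' || *p == 'l' || *p == 'L')
            p++;
    } else {
        p = skip_int_suffix(p);
    }

    /* A letter, digit, '_' or '.' glued on (e.g. 0xAG, 1.2.3) is invalid. */
    *len = (size_t)(p - s);
    if (isalnum((unsigned char)*p) || *p == '_' || *p == '.')
        return NUM_INVALID;
    return kind;
}
```

A lexer could call `scan_number` whenever it sees a digit (or a `.` followed by a digit) and map `NUM_INT` and `NUM_FLOAT` to two separate token types instead of a single `NUM`. With this sketch, `123.45e-6` is classified as a float of length 9, and `0xAL` and `10ULL` are accepted as integers.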