https://github.com/ya2ir/c_lexer

C Lexer

c lexer lexical-analysis lexical-analyzer tokenization tokenizer

Last synced: 10 months ago

Host: GitHub
URL: https://github.com/ya2ir/c_lexer
Owner: YA2IR
License: unlicense
Created: 2024-08-08T01:17:07.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-10-15T14:45:39.000Z (over 1 year ago)
Last Synced: 2025-08-23T11:03:59.980Z (10 months ago)
Topics: c, lexer, lexical-analysis, lexical-analyzer, tokenization, tokenizer
Language: C
Homepage:
Size: 8.79 KB
Stars: 3
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


README

# About

This is a lexer/tokenizer for the C programming language. It takes a C source file as input and prints a sequence of (literal, token_type) pairs, as in the following example:

```
// example1.c:
    int x = 1.f*0xAB;

// gcc lexer/lexer.c main.c && ./a.out example1.c:
    == ('int','INT')
    == ('x','IDENT')
    == ('=','ASSIGN')
    == ('1.f','NUM')
    == ('*','STAR')
    == ('0xAB','NUM')
    == (';','SEMICOLON')
    == --END-- ==
```
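To make the (literal, token_type) output more concrete, here is a minimal sketch of a hand-written scanner that yields the same pairs for `example1.c`. It is not the code in `lexer/lexer.c`: the `TokenType` enum, `Token`, `next_token` and `dump_tokens` are hypothetical names, only `int` is recognized as a keyword, and numbers are consumed greedily without validation.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token kinds; names mirror the README output,
   not the actual enum in lexer/lexer.c. */
typedef enum { T_INT, T_IDENT, T_ASSIGN, T_NUM, T_STAR, T_SEMICOLON, T_EOF, T_ILLEGAL } TokenType;

static const char *type_names[] = {"INT", "IDENT", "ASSIGN", "NUM", "STAR", "SEMICOLON", "EOF", "ILLEGAL"};

typedef struct {
    TokenType type;
    const char *start; /* points into the source buffer, not a copy */
    size_t len;        /* length of the literal */
} Token;

/* Scans one token starting at *src and advances *src past it. */
static Token next_token(const char **src) {
    const char *p = *src;
    while (isspace((unsigned char)*p)) p++; /* skip whitespace */
    Token t = {T_EOF, p, 0};
    if (*p == '\0') { *src = p; return t; }

    if (isalpha((unsigned char)*p) || *p == '_') {
        /* Identifier or keyword: [A-Za-z_][A-Za-z0-9_]* */
        const char *q = p;
        while (isalnum((unsigned char)*q) || *q == '_') q++;
        t.len = (size_t)(q - p);
        t.type = (t.len == 3 && strncmp(p, "int", 3) == 0) ? T_INT : T_IDENT;
    } else if (isdigit((unsigned char)*p)) {
        /* Greedy number scan: digits, letters (hex digits, 'x', suffixes) and '.'.
           Simplified: no exponent signs, no validation. */
        const char *q = p;
        while (isalnum((unsigned char)*q) || *q == '.') q++;
        t.len = (size_t)(q - p);
        t.type = T_NUM;
    } else {
        /* Single-character punctuators only; "==" would become two ASSIGNs here. */
        t.len = 1;
        switch (*p) {
            case '=': t.type = T_ASSIGN; break;
            case '*': t.type = T_STAR; break;
            case ';': t.type = T_SEMICOLON; break;
            default:  t.type = T_ILLEGAL; break;
        }
    }
    *src = p + t.len;
    return t;
}

/* Prints every token as ('literal','TYPE'), in the README's output format. */
static void dump_tokens(const char *src) {
    for (;;) {
        Token t = next_token(&src);
        if (t.type == T_EOF) break;
        printf("== ('%.*s','%s')\n", (int)t.len, t.start, type_names[t.type]);
    }
    printf("== --END-- ==\n");
}
```

Calling `dump_tokens("int x = 1.f*0xAB;")` from a `main` function prints the seven pairs shown above, followed by `== --END-- ==`. Keeping `start`/`len` pointers into the source buffer avoids allocating a string per token.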

For a full example, run it on `example.c`, an actual C source file:

```
gcc main.c lexer/lexer.c && ./a.out example.c
```

# Limitations

- The lexer supports most, but not all, of the language. For example, it does not yet support scientific notation such as `123.45e-6`, and I may have missed cases such as suffixes on hexadecimal literals (e.g. `0xAL` is not accepted).

- There is only one token type for numbers, `NUM`, so integer and floating-point literals (e.g. `0xAB` and `1.f`) are not distinguished.

- It doesn't support preprocessor directives (e.g. `#define` and `#include`).
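The first two limitations (no exponents, hex suffixes, a single `NUM` type) could be addressed by a dedicated number scanner. Below is a minimal sketch, not code from `lexer/lexer.c`; `scan_number`, `skip_int_suffix` and `NumKind` are hypothetical names. It follows a simplified subset of C's literal grammar and deliberately leaves out hexadecimal floats (`0x1p3`), literals starting with a dot (`.5`), octal digit validation and C23 digit separators.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical classification that would replace the single NUM type. */
typedef enum { NUM_INVALID, NUM_INT, NUM_FLOAT } NumKind;

/* Consumes an integer suffix: u, l, ll, ul, lu, ull, llu (either case).
   Mixed-case "lL" is rejected because only a repeat of the same char is taken. */
static const char *skip_int_suffix(const char *p) {
    int has_u = 0;
    if (*p == 'u' || *p == 'U') { has_u = 1; p++; }
    if (*p == 'l' || *p == 'L') {
        char c = *p++;
        if (*p == c) p++; /* "ll" or "LL" */
    }
    if (!has_u && (*p == 'u' || *p == 'U')) p++; /* "lu", "llu" */
    return p;
}

/* Scans a numeric literal at p (p[0] must be a digit).
   Stores the literal's length in *len and returns its kind. */
static NumKind scan_number(const char *p, size_t *len) {
    const char *q = p;
    NumKind kind = NUM_INT;

    if (q[0] == '0' && (q[1] == 'x' || q[1] == 'X')) {
        q += 2;
        const char *digits = q;
        while (isxdigit((unsigned char)*q)) q++;
        if (q == digits) kind = NUM_INVALID; /* "0x" with no digits */
        q = skip_int_suffix(q);              /* e.g. 0xAL, 0xFFu */
    } else {
        while (isdigit((unsigned char)*q)) q++;
        if (*q == '.') {                     /* fractional part: 1.5, 1. */
            kind = NUM_FLOAT;
            q++;
            while (isdigit((unsigned char)*q)) q++;
        }
        if (*q == 'e' || *q == 'E') {        /* exponent: 123.45e-6 */
            kind = NUM_FLOAT;
            q++;
            if (*q == '+' || *q == '-') q++;
            const char *exp_digits = q;
            while (isdigit((unsigned char)*q)) q++;
            if (q == exp_digits) kind = NUM_INVALID; /* "1e" or "1e+" */
        }
        if (kind == NUM_FLOAT) {
            if (*q == 'f' || *q == 'F' || *q == 'l' || *q == 'L') q++;
        } else if (kind == NUM_INT) {
            q = skip_int_suffix(q);
        }
    }
    /* Letters, digits, '_' or '.' glued to the literal (e.g. "12abc") make it invalid;
       consume them so the whole bad literal is reported as one token. */
    if (isalnum((unsigned char)*q) || *q == '_' || *q == '.') {
        kind = NUM_INVALID;
        while (isalnum((unsigned char)*q) || *q == '_' || *q == '.') q++;
    }
    *len = (size_t)(q - p);
    return kind;
}
```

With this, `123.45e-6` scans as one 9-character `NUM_FLOAT`, `0xAL` as a 4-character `NUM_INT`, and `12abc` as a single invalid token rather than a number followed by an identifier.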
