# Lexical Analyzer in C

A **simplified lexical analyzer** written in C that processes an input file and identifies several token types: keywords, identifiers, numeric constants, delimiters, operators, and string literals. It also reports basic errors for invalid or incomplete tokens.

## Features

The analyzer is implemented in `analyzer.c`, which uses dedicated functions to recognize and process the different token types in an input file. Key features include:

- **Input File Reading:** The `readFile` function reads the content of a file and stores it in memory for analysis.
- **Token Detection:** The `analyzer` function processes the content and categorizes tokens into the following types:
  - **Keywords:** `if`, `else`, `while`, `return`, etc.
  - **Identifiers:** Valid variable names that follow specific rules.
  - **Numeric Constants:** Integers, floats, and scientific notation.
  - **Delimiters:** `,`, `;`, `{`, `}`, `(`, `)`, etc.
  - **Operators:** `+`, `-`, `*`, `/`, `&&`, `||`, etc.
  - **String Literals:** Strings enclosed in double quotes (`"example"`).
- **Validation Using Deterministic Finite Automata (DFA)** (see the sketch after this list):
  - DFA for numbers (`statenumb`).
  - DFA for identifiers and variables (`stateVar`).
- **Error Detection:** The program identifies invalid or improperly formatted tokens and displays error messages.
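
As a rough illustration of the DFA-based validation, the hypothetical sketch below shows how a state machine could accept integers, floats, and scientific notation. The state names and the `is_valid_number` function are assumptions for this example and do not mirror the actual `statenumb`/`stateVar` code.

```c
#include <ctype.h>

/* Hypothetical sketch of a number-validating DFA (accepts forms such as
 * 42, 3.14, and 1e-5). State names are illustrative only. */
enum NumState { N_START, N_INT, N_DOT, N_FRAC, N_EXP, N_SIGN, N_EXPD, N_ERR };

int is_valid_number(const char *s)
{
    enum NumState st = N_START;
    for (; *s && st != N_ERR; s++) {
        char c = *s;
        switch (st) {
        case N_START: st = isdigit((unsigned char)c) ? N_INT : N_ERR; break;
        case N_INT:   st = isdigit((unsigned char)c) ? N_INT
                         : c == '.'                  ? N_DOT
                         : (c == 'e' || c == 'E')    ? N_EXP : N_ERR; break;
        case N_DOT:   st = isdigit((unsigned char)c) ? N_FRAC : N_ERR; break;
        case N_FRAC:  st = isdigit((unsigned char)c) ? N_FRAC
                         : (c == 'e' || c == 'E')    ? N_EXP : N_ERR; break;
        case N_EXP:   st = isdigit((unsigned char)c) ? N_EXPD
                         : (c == '+' || c == '-')    ? N_SIGN : N_ERR; break;
        case N_SIGN:  st = isdigit((unsigned char)c) ? N_EXPD : N_ERR; break;
        case N_EXPD:  st = isdigit((unsigned char)c) ? N_EXPD : N_ERR; break;
        default:      st = N_ERR; break;
        }
    }
    /* Accepting states: integer part, fractional part, or exponent digits. */
    return st == N_INT || st == N_FRAC || st == N_EXPD;
}
```

An automaton for identifiers can be built the same way, tracking the letter/digit/underscore rules described below.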

## Getting Started

Follow these steps to compile and run the program.

### 1. Prerequisites

- A Linux-based operating system (tested on Linux Mint).
- GCC (GNU Compiler Collection) installed on your system.

### 2. Compiling the Code

Use the following command to compile the code:

```bash
gcc analyzer.c -o analyzer
```

### 3. Running the Program

Execute the program with an input file:

```bash
./analyzer prueba.c
```

## Examples

Here are some examples of input and the corresponding output from the lexical analyzer:

### Example 1: Simple Input
#### Input File (`example1.c`):
```c
int main() {
int a = 5;
float b = 10.25;
if (a < b) {
printf("a is less than b\n");
}
return 0;
}
```
#### Output:

```bash
The file is:

int main() {
int a = 5;
float b = 10.25;
if (a < b) {
printf("a is less than b\n");
}
return 0;
}

The tokens in the file:

TOKEN (INT): int
TOKEN (MAIN): main
TOKEN (LPAR): (
TOKEN (RPAR): )
TOKEN (LBRACE): {
TOKEN (INT): int
TOKEN (ID): a
TOKEN (ASSIGN): =
TOKEN (NUMBER): 5
TOKEN (SEMI): ;
TOKEN (FLOAT): float
TOKEN (ID): b
TOKEN (ASSIGN): =
TOKEN (NUMBER): 10.25
TOKEN (SEMI): ;
TOKEN (IF): if
TOKEN (LPAR): (
TOKEN (ID): a
TOKEN (LT): <
TOKEN (ID): b
TOKEN (RPAR): )
TOKEN (LBRACE): {
TOKEN (PRINTF): printf
TOKEN (LPAR): (
TOKEN (STRING): "a is less than b\n"
TOKEN (RPAR): )
TOKEN (SEMI): ;
TOKEN (RBRACE): }
TOKEN (RETURN): return
TOKEN (NUMBER): 0
TOKEN (SEMI): ;
TOKEN (RBRACE): }
```
## 🛠️ How It Works

The lexical analyzer operates in the following steps:

1. **File Reading**
The program reads the input file line by line using the `readFile` function, storing the content in memory. This enables sequential token processing.
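
   As an illustration only, the sketch below loads a whole file into a heap-allocated, NUL-terminated buffer in a single pass; the real `readFile` may read line by line as described above, and the name, signature, and error handling here are assumptions.

   ```c
   #include <stdio.h>
   #include <stdlib.h>

   /* Sketch: load an entire file into memory so the analyzer can scan it
    * as one string. Not the project's exact readFile implementation. */
   char *read_whole_file(const char *path)
   {
       FILE *fp = fopen(path, "r");
       if (!fp) { perror(path); return NULL; }

       fseek(fp, 0, SEEK_END);
       long size = ftell(fp);
       rewind(fp);
       if (size < 0) { fclose(fp); return NULL; }

       char *buf = malloc((size_t)size + 1);
       if (buf) {
           size_t n = fread(buf, 1, (size_t)size, fp);
           buf[n] = '\0';   /* terminate so the content can be scanned as a string */
       }
       fclose(fp);
       return buf;
   }
   ```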

2. **Tokenization**
The `analyzer` function scans the file content character by character to identify token boundaries. It uses specific rules to classify tokens, such as checking for keywords, operators, delimiters, and numeric constants.
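
   A simplified version of this scanning loop is sketched below; the token-boundary rules are deliberately reduced (for example, exponent signs in numbers are not handled), so it only illustrates the dispatch idea, not the project's exact `analyzer` logic.

   ```c
   #include <ctype.h>
   #include <stdio.h>
   #include <string.h>

   /* Sketch of a character-by-character scanning loop: the first character
    * of each lexeme decides which recognizer applies. */
   void scan(const char *src)
   {
       size_t i = 0, n = strlen(src);
       while (i < n) {
           unsigned char c = (unsigned char)src[i];
           if (isspace(c)) { i++; continue; }          /* skip whitespace */

           size_t start = i;
           if (isalpha(c) || c == '_') {               /* keyword or identifier */
               while (i < n && (isalnum((unsigned char)src[i]) || src[i] == '_')) i++;
           } else if (isdigit(c)) {                    /* numeric constant (simplified) */
               while (i < n && (isdigit((unsigned char)src[i]) || src[i] == '.')) i++;
           } else if (c == '"') {                      /* string literal */
               i++;
               while (i < n && src[i] != '"') i++;
               if (i < n) i++;                         /* consume the closing quote */
           } else {                                    /* operator or delimiter */
               i++;
           }
           printf("lexeme: %.*s\n", (int)(i - start), &src[start]);
       }
   }
   ```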

3. **Token Validation**
Each identified token is validated using helper functions (see the sketch after this list):
   - **Keywords:** Compared against a predefined list (`if`, `else`, etc.).
   - **Identifiers:** Verified to start with a letter or underscore, followed by letters, digits, or underscores.
   - **Operators:** Classified based on common C operators (`+`, `-`, `*`, `=`, etc.).
   - **Delimiters:** Identified based on common syntax delimiters (`;`, `,`, `(`, `)`, etc.).
   - **Constants:** Identified based on numeric formats for integers or floats.
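
   For illustration, two of these checks are sketched below; the keyword table and function names are assumptions, not the project's actual helpers.

   ```c
   #include <ctype.h>
   #include <string.h>

   /* Sketch: keyword lookup against a small illustrative table. */
   static const char *keywords[] = { "if", "else", "while", "return", "int", "float" };

   int is_keyword(const char *lexeme)
   {
       for (size_t k = 0; k < sizeof keywords / sizeof keywords[0]; k++)
           if (strcmp(lexeme, keywords[k]) == 0)
               return 1;
       return 0;
   }

   /* Sketch: an identifier must start with a letter or underscore and may
    * continue with letters, digits, or underscores. */
   int is_identifier(const char *lexeme)
   {
       if (!(isalpha((unsigned char)lexeme[0]) || lexeme[0] == '_'))
           return 0;
       for (size_t k = 1; lexeme[k] != '\0'; k++)
           if (!(isalnum((unsigned char)lexeme[k]) || lexeme[k] == '_'))
               return 0;
       return 1;
   }
   ```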

4. **Output**
Once tokens are recognized, they are output with their corresponding types and values. Invalid tokens are flagged with error messages.
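
   A minimal reporting helper in the spirit of this step might look like the sketch below; the function name is illustrative, while the `TOKEN (TYPE): value` format matches the example output shown earlier.

   ```c
   #include <stdio.h>

   /* Sketch: print a recognized token, or flag it if no rule matched. */
   void report_token(const char *type, const char *lexeme, int valid)
   {
       if (valid)
           printf("TOKEN (%s): %s\n", type, lexeme);
       else
           fprintf(stderr, "ERROR: invalid token '%s'\n", lexeme);
   }
   ```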