https://github.com/14ngiestas/fortran-tokenizer
A basic fortran tokenizer
- Host: GitHub
- URL: https://github.com/14ngiestas/fortran-tokenizer
- Owner: 14NGiestas
- License: MIT
- Created: 2022-08-09T05:47:55.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2025-02-11T10:50:03.000Z (10 months ago)
- Last Synced: 2025-03-27T16:38:33.029Z (9 months ago)
- Topics: fortran, fortran-package-manager, parsers, strings, tokenizer
- Language: Fortran
- Size: 14.6 KB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Fortran Tokenizer
This package provides a basic tokenizer and allows users to easily customize its behavior.
## Use in your project with fpm
Include this in your `fpm.toml` under the dependencies section:
```toml
[dependencies]
fortran-tokenizer = { git = "https://github.com/14NGiestas/fortran-tokenizer.git" }
```
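If you need a reproducible build, fpm's git dependencies also accept a `tag`, `branch`, or `rev` key; for example (the tag name below is hypothetical):
```toml
[dependencies]
# Hypothetical tag; pin to whatever release or commit you need.
fortran-tokenizer = { git = "https://github.com/14NGiestas/fortran-tokenizer.git", tag = "v0.1.0" }
```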
## Basic Usage
### By default, a valid token does not contain spaces
```f90
!file: test/default.f90
program main
use fortran_tokenizer
implicit none
type(tokenizer_t) :: tokenizer
type(token_list) :: tokens
! Perform the tokenization
tokens = tokenizer % tokenize("hello world")
print*, tokens ! ['hello', 'world']
end program
```
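The same default rule applies to any space-separated input; a hypothetical variation on the program above (not one of the shipped tests):
```f90
!file: hypothetical, not in the test suite
program expression_tokens
use fortran_tokenizer
implicit none
type(tokenizer_t) :: tokenizer
type(token_list) :: tokens
! Every run of non-space characters becomes one token
tokens = tokenizer % tokenize("x = y + 1")
print*, tokens ! expected: ['x', '=', 'y', '+', '1']
end program
```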
### Tokenize by custom function
You can override the default behavior by defining a function that receives a `character(*), intent(in) :: token` argument
and returns `.true.` if the string it receives is a valid token, `.false.` otherwise.
```f90
!file: test/custom_function.f90
program main
use fortran_tokenizer
implicit none
character, parameter :: TABS = char(9)
character(*), parameter :: IDENT = repeat(TABS, 4)
character(*), parameter :: string = TABS//"hello "//IDENT//"world"
type(tokenizer_t) :: tokenizer
type(token_list) :: tokens
! Perform the tokenization
tokens = tokenizer % tokenize(string, validate=by_whitespace)
! Show the resultant tokens
print "('string: ""',A,'""')", string
print*, tokens ! ['hello', 'world']
contains
logical function by_whitespace(token) result(is_valid)
!! The token is valid if it does not contain " " or TABS
character(*), intent(in) :: token
is_valid = .not. scan(token, " "//TABS) > 0
end function
end program
```
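The predicate above relies on the intrinsic `scan`, which returns the position of the first character from the set that occurs in the string, or 0 if none does; because `.not.` binds looser than `>`, `.not. scan(token, " "//TABS) > 0` reads as `.not. (scan(...) > 0)`. A quick illustration:
```f90
! scan(STRING, SET) locates the first character of SET inside STRING
program scan_demo
implicit none
print*, scan("hello world", " ") ! 6 (position of the space)
print*, scan("helloworld", " ")  ! 0 (no character of the set present)
end program
```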
You can also pass an `ignore` function with the same interface; once a token is parsed, it is discarded completely if this function returns `.true.` for it.
```f90
!file: test/custom_ignore.f90
program main
use fortran_tokenizer
implicit none
character(*), parameter :: string = "hello | | duck | world duck"
type(tokenizer_t) :: tokenizer
type(token_list) :: tokens
! Perform the tokenization
tokens = tokenizer % tokenize(string, validate=by_pipe, ignore=if_is_duck)
! Show the resultant tokens
print "('string: ""',A,'""')", string
print*, tokens ! ['hello', 'world']
contains
logical function by_pipe(token) result(is_valid)
!! The token is valid if it contains neither "|" nor spaces
character(*), intent(in) :: token
is_valid = .not. scan(token, "| ") > 0
end function
logical function if_is_duck(token) result(is_valid)
!! Return .true. if token is a duck
character(*), intent(in) :: token
is_valid = token == "duck"
end function
end program
```
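The `ignore` hook accepts any predicate with this interface; as a hypothetical sketch (not part of the shipped tests), one that drops Fortran-style comment tokens could sit in the same `contains` section:
```f90
logical function if_is_comment(token) result(is_valid)
!! Hypothetical: ignore any token that starts with "!"
character(*), intent(in) :: token
is_valid = .false.
if (len(token) > 0) is_valid = token(1:1) == "!"
end function
```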
### Bind a global rule
To reuse the same rules across several calls, you can bind functions to the tokenizer itself:
```f90
!file: test/bind_global_function.f90
program main
use fortran_tokenizer
implicit none
type(tokenizer_t) :: tokenizer
type(token_list) :: tokens
tokenizer % validator => global_behavior
! Perform the tokenization
tokens = tokenizer % tokenize("hello| world")
print*, tokens ! ['hello', 'world']
! Perform another tokenization
tokens = tokenizer % tokenize("hello |world")
print*, tokens ! ['hello', 'world']
! Perform the tokenization, using a local behavior
tokens = tokenizer % tokenize("hello| |world|token with spaces|", validate=local_behavior)
! Show the resultant tokens
print*, tokens ! ['hello', ' ', 'world', 'token with spaces']
! Bind global ignore behavior
tokenizer % ignore => ignore_tokens
! Perform the tokenization using a local behavior; any token containing a space will be ignored
tokens = tokenizer % tokenize("hello| |world|tokens with spaces|", validate=local_behavior)
! Show the resultant tokens
print*, tokens ! ['hello', 'world']
contains
logical function global_behavior(token) result(is_valid)
!! The token is valid if it does not contain " " or "|"
character(*), intent(in) :: token
is_valid = .not. scan(token, "| ") > 0
end function
logical function local_behavior(token) result(is_valid)
!! The token is valid if it does not contain "|"
character(*), intent(in) :: token
is_valid = .not. scan(token, "|") > 0
end function
logical function ignore_tokens(token) result(is_valid)
!! The token is ignored if it contains any spaces
character(*), intent(in) :: token
is_valid = scan(token, " ") > 0
end function
end program
```
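To detach the bound rules again, the procedure pointers can be disassociated with standard Fortran pointer nullification (that the tokenizer then falls back to its built-in default is an assumption about this library, not documented behavior):
```f90
! Appended to the program above; assumption: with both pointers
! disassociated, tokenize() reverts to the default space-splitting rule
tokenizer % validator => null()
tokenizer % ignore    => null()
```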
## Contributing
### Getting the source
First, get the code by cloning the repo:
```sh
git clone https://github.com/14NGiestas/fortran-tokenizer.git
cd fortran-tokenizer
```
### Building with FPM
This project was designed to be built using the [Fortran Package Manager](https://github.com/fortran-lang/fpm).
Follow the directions on that page to install FPM if you haven't already.
To build and run the program (provided as an example), type:
```sh
fpm run
```
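For an optimized build, fpm's release profile can be used as well:
```sh
fpm run --profile release
```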
To run the tests, type:
```sh
fpm test
```
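fpm can also run a single test target by name; assuming the targets are named after the files under `test/`, the first example would be:
```sh
fpm test default
```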