https://github.com/14ngiestas/fortran-tokenizer
A basic fortran tokenizer
- Host: GitHub
- URL: https://github.com/14ngiestas/fortran-tokenizer
- Owner: 14NGiestas
- License: MIT
- Created: 2022-08-09T05:47:55.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2025-02-11T10:50:03.000Z (10 months ago)
- Last Synced: 2025-03-27T16:38:33.029Z (9 months ago)
- Topics: fortran, fortran-package-manager, parsers, strings, tokenizer
- Language: Fortran
- Size: 14.6 KB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Fortran Tokenizer
This package provides a basic tokenizer and allows users to easily customize its behavior.
## Use in your project with fpm
Include this in your `fpm.toml` under the dependencies section:
```toml
[dependencies]
fortran-tokenizer = { git = "https://github.com/14NGiestas/fortran-tokenizer.git" }
```
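If you need a reproducible build, fpm's git dependencies also accept a `tag`, `branch`, or `rev` key; for example (the tag name below is hypothetical):
```toml
[dependencies]
# Hypothetical tag; pin to whatever release or commit you need.
fortran-tokenizer = { git = "https://github.com/14NGiestas/fortran-tokenizer.git", tag = "v0.1.0" }
```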
## Basic Usage
### By default, a valid token does not contain spaces
```f90
!file: test/default.f90
program main
use fortran_tokenizer
implicit none
type(tokenizer_t) :: tokenizer
type(token_list) :: tokens
! Perform the tokenization
tokens = tokenizer % tokenize("hello world")
print*, tokens ! ['hello', 'world']
end program
```
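The same default rule applies to any space-separated input; a hypothetical variation on the program above (not one of the shipped tests):
```f90
!file: hypothetical, not in the test suite
program expression_tokens
use fortran_tokenizer
implicit none
type(tokenizer_t) :: tokenizer
type(token_list) :: tokens
! Every run of non-space characters becomes one token
tokens = tokenizer % tokenize("x = y + 1")
print*, tokens ! expected: ['x', '=', 'y', '+', '1']
end program
```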
### Tokenize by custom function
You can override the default behavior by defining a function that receives a `character(*), intent(in) :: token` argument
and returns `.true.` if the string it receives is a valid token, `.false.` otherwise.
```f90
!file: test/custom_function.f90
program main
use fortran_tokenizer
implicit none
character, parameter :: TABS = char(9)
character(*), parameter :: IDENT = repeat(TABS, 4)
character(*), parameter :: string = TABS//"hello "//IDENT//"world"
type(tokenizer_t) :: tokenizer
type(token_list) :: tokens
! Perform the tokenization
tokens = tokenizer % tokenize(string, validate=by_whitespace)
! Show the resultant tokens
print "('string: ""',A,'""')", string
print*, tokens ! ['hello', 'world']
contains
logical function by_whitespace(token) result(is_valid)
!! The token is valid if it does not contain " " or TABS
character(*), intent(in) :: token
is_valid = .not. scan(token, " "//TABS) > 0
end function
end program
```
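The predicate above relies on the intrinsic `scan`, which returns the position of the first character from the set that occurs in the string, or 0 if none does; because `.not.` binds looser than `>`, `.not. scan(token, " "//TABS) > 0` reads as `.not. (scan(...) > 0)`. A quick illustration:
```f90
! scan(STRING, SET) locates the first character of SET inside STRING
program scan_demo
implicit none
print*, scan("hello world", " ") ! 6 (position of the space)
print*, scan("helloworld", " ")  ! 0 (no character of the set present)
end program
```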
You can also pass an `ignore` function with the same interface; once a token is parsed, it is discarded completely if this function returns `.true.` for it.
```f90
!file: test/custom_ignore.f90
program main
use fortran_tokenizer
implicit none
character(*), parameter :: string = "hello | | duck | world duck"
type(tokenizer_t) :: tokenizer
type(token_list) :: tokens
! Perform the tokenization
tokens = tokenizer % tokenize(string, validate=by_pipe, ignore=if_is_duck)
! Show the resultant tokens
print "('string: ""',A,'""')", string
print*, tokens ! ['hello', 'world']
contains
logical function by_pipe(token) result(is_valid)
!! The token is valid if it contains neither "|" nor spaces
character(*), intent(in) :: token
is_valid = .not. scan(token, "| ") > 0
end function
logical function if_is_duck(token) result(is_valid)
!! Return .true. if token is a duck
character(*), intent(in) :: token
is_valid = token == "duck"
end function
end program
```
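The `ignore` hook accepts any predicate with this interface; as a hypothetical sketch (not part of the shipped tests), one that drops Fortran-style comment tokens could sit in the same `contains` section:
```f90
logical function if_is_comment(token) result(is_valid)
!! Hypothetical: ignore any token that starts with "!"
character(*), intent(in) :: token
is_valid = .false.
if (len(token) > 0) is_valid = token(1:1) == "!"
end function
```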
### Bind a global rule
To reuse the same rules across several calls, you can bind functions to the tokenizer itself:
```f90
!file: test/bind_global_function.f90
program main
use fortran_tokenizer
implicit none
type(tokenizer_t) :: tokenizer
type(token_list) :: tokens
tokenizer % validator => global_behavior
! Perform the tokenization
tokens = tokenizer % tokenize("hello| world")
print*, tokens ! ['hello', 'world']
! Perform another tokenization
tokens = tokenizer % tokenize("hello |world")
print*, tokens ! ['hello', 'world']
! Perform the tokenization, using a local behavior
tokens = tokenizer % tokenize("hello| |world|token with spaces|", validate=local_behavior)
! Show the resultant tokens
print*, tokens ! ['hello', ' ', 'world', 'token with spaces']
! Bind global ignore behavior
tokenizer % ignore => ignore_tokens
! Perform the tokenization using a local behavior; any token containing a space will be ignored
tokens = tokenizer % tokenize("hello| |world|tokens with spaces|", validate=local_behavior)
! Show the resultant tokens
print*, tokens ! ['hello', 'world']
contains
logical function global_behavior(token) result(is_valid)
!! The token is valid if it does not contain " " or "|"
character(*), intent(in) :: token
is_valid = .not. scan(token, "| ") > 0
end function
logical function local_behavior(token) result(is_valid)
!! The token is valid if it does not contain "|"
character(*), intent(in) :: token
is_valid = .not. scan(token, "|") > 0
end function
logical function ignore_tokens(token) result(is_valid)
!! The token is ignored if it contains any spaces
character(*), intent(in) :: token
is_valid = scan(token, " ") > 0
end function
end program
```
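To detach the bound rules again, the procedure pointers can be disassociated with standard Fortran pointer nullification (that the tokenizer then falls back to its built-in default is an assumption about this library, not documented behavior):
```f90
! Appended to the program above; assumption: with both pointers
! disassociated, tokenize() reverts to the default space-splitting rule
tokenizer % validator => null()
tokenizer % ignore    => null()
```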
## Contributing
### Getting the source
First, get the code by cloning the repo:
```sh
git clone https://github.com/14NGiestas/fortran-tokenizer.git
cd fortran-tokenizer
```
### Building with FPM
This project was designed to be built using the [Fortran Package Manager](https://github.com/fortran-lang/fpm).
Follow the directions on that page to install FPM if you haven't already.
To build and run the program (provided as an example), type:
```sh
fpm run
```
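For an optimized build, fpm's release profile can be used as well:
```sh
fpm run --profile release
```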
To run the tests, type:
```sh
fpm test
```
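fpm can also run a single test target by name; assuming the targets are named after the files under `test/`, the first example would be:
```sh
fpm test default
```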