https://github.com/dcavar/fomatestcpp

Foma-based morphological analysis using a simple C++ wrapper

cpp finite-state-transducer foma lexicon morphology natural-language-processing nlp nlp-parsing

Host: GitHub
URL: https://github.com/dcavar/fomatestcpp
Owner: dcavar
License: apache-2.0
Created: 2018-08-06T14:38:48.000Z
Default Branch: master
Last Pushed: 2018-08-06T15:18:52.000Z
Last Synced: 2025-05-20T12:53:49.676Z
Topics: cpp, finite-state-transducer, foma, lexicon, morphology, natural-language-processing, nlp, nlp-parsing
Language: C++
Homepage: http://damir.cavar.me/
Size: 98.6 KB
Stars: 2
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Foma example codes

Last edited: 2018-08-06, Damir Cavar

## Includes and Libraries

To compile this test code, you need [Foma](https://fomafst.github.io) installed on your system, including all of its header and library files.

The repository includes a simplified, reduced English morphology compiled into a finite-state transducer for use with Foma.

## Build the binary

To compile this example, you need the complete Foma distribution (binaries, header files, and libraries) set up on your system. You also need a compiler that supports C++11 and a few additional libraries, for example the [Boost](https://www.boost.org) libraries.

The project is a [CMake](https://cmake.org) project, so make sure that CMake is also installed and set up on your system.

To build the binary for the code in *FomaMWT*, run the following in that folder:

    cmake CMakeLists.txt

This will generate the *Makefile* and other files in the same folder. Run:

    make

The code should compile without errors, provided that all paths are set correctly and the required libraries were found.

If you want to test the speed of the processor, run the following command:

    time ./fomatest test.txt > res.txt

For a meaningful measurement, create a larger list of words in a text file and run it through the test tool. On an Intel i7 CPU running Fedora Linux, I get roughly 300,000 tokens per second, with an average number of ambiguous morphological analyses per string.
