{"id":18754030,"url":"https://github.com/dcavar/fomamwt","last_synced_at":"2025-04-13T00:31:59.460Z","repository":{"id":150220988,"uuid":"143742188","full_name":"dcavar/fomaMWT","owner":"dcavar","description":"Foma-based multi-word tagger and morphological analyzer","archived":false,"fork":false,"pushed_at":"2018-08-06T15:15:17.000Z","size":629,"stargazers_count":7,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-26T18:57:26.345Z","etag":null,"topics":["cpp","finite-state-transducer","foma","multiword-expressions","multiword-extraction","natural-language-processing","nlp","nlp-parsing","xfst"],"latest_commit_sha":null,"homepage":"http://damir.cavar.me/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dcavar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-08-06T14:43:39.000Z","updated_at":"2024-01-21T13:52:51.000Z","dependencies_parsed_at":"2023-05-03T11:46:32.811Z","dependency_job_id":null,"html_url":"https://github.com/dcavar/fomaMWT","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcavar%2FfomaMWT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcavar%2FfomaMWT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcavar%2FfomaMWT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcavar%2FfomaMWT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dcavar","download_url":"https:/
/codeload.github.com/dcavar/fomaMWT/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248650590,"owners_count":21139670,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp","finite-state-transducer","foma","multiword-expressions","multiword-extraction","natural-language-processing","nlp","nlp-parsing","xfst"],"created_at":"2024-11-07T17:27:53.322Z","updated_at":"2025-04-13T00:31:58.783Z","avatar_url":"https://github.com/dcavar.png","language":"C++","readme":"# Foma example codes\n\n(C) 2016-2018 by [Damir Cavar](http://damir.cavar.me/)\n\nLast edited: 2018-08-06, [Damir Cavar](http://damir.cavar.me/)\n\n\n## Intro\n\nThis code example shows how a [Foma](https://fomafst.github.io)-based FST can be used to process multi-word expressions that are given in a dictionary and compiled into a Finite State Transducer.\n\nThe code specifies a default window size, which can be changed using command-line arguments.\n\nThe maximum multi-word window size can also be compiled into the Finite State Transducer and read out using the C wrapper, which avoids unnecessary lookups. The advantage of this method, assuming that one has a comprehensive list of multi-word expressions to compile into the transducer, is that it is very fast and that it exposes the internal structural and morphosyntactic properties of multi-word expressions.\n\nThe example implementation should be straightforward to understand. It expects input in the form of a file (or a stream) that contains one tokenized sentence per line. 
See the included *test.txt* file for an example.\n\n\n## Build from Code\n\nTo compile this example, you need to have the entire Foma collection of binaries, includes and libraries set up on your system. You will also need a C++11 compiler and additional libraries, for example the [Boost](https://www.boost.org) libraries.\n\nThe project is a [CMake](https://cmake.org) project, so make sure that [CMake](https://cmake.org) is also installed and set up on your system.\n\nTo build the binary, run the following in the *FomaMWT* folder:\n\n\tcmake CMakeLists.txt\n\nThis will generate the *Makefile* and other files in the same folder. Then run:\n\n\tmake\n\nThe code should compile correctly if all paths and folders are set up properly and the libraries are found.\n\n\nIf you want to test the speed of the processor, run the following command:\n\n\ttime ./mwtagger test.txt \u003e res.txt\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdcavar%2Ffomamwt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdcavar%2Ffomamwt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdcavar%2Ffomamwt/lists"}