https://github.com/unhammer/cg-mwesplit
DEPRECATED: now part of vislcg3. Splits constraint grammar-formatted multiwords created by hfst-tokenise
divvun vislcg3
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/unhammer/cg-mwesplit
- Owner: unhammer
- License: gpl-3.0
- Created: 2016-04-14T10:30:31.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2016-05-20T10:38:45.000Z (about 9 years ago)
- Last Synced: 2024-12-27T00:41:53.616Z (6 months ago)
- Topics: divvun, vislcg3
- Language: C++
- Homepage: http://giellatekno.uit.no/bugzilla/show_bug.cgi?id=2165
- Size: 38.1 KB
- Stars: 0
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README
- Changelog: ChangeLog
- License: COPYING
- Authors: AUTHORS
README
#+TITLE: cg-mwesplit
#+STARTUP: showall

#+CAPTION: Build Status
[[https://travis-ci.org/unhammer/cg-mwesplit][https://travis-ci.org/unhammer/cg-mwesplit.svg]]

* Description
This program reads input in Constraint Grammar format, and splits
special "multiword cohorts" into separate cohorts, leaving other
cohorts and intervening blanks as they were.

For examples of input/output, see the files in =test/=, e.g.
[[file:test/input.simple.cg][test/input.simple.cg]].

* Prerequisites
A C++ compiler that goes all the way to 11.

Tested with gcc-5.2.0, gcc-5.3.1 and clang-703.0.29.
(Should work all the way down to gcc-4.9, but will fail with e.g.
gcc-4.8.4 or clang-3.5.0.)

* Building
#+BEGIN_SRC sh
./autogen.sh
./configure # optionally with argument --prefix=$HOME/my/prefix
make
make install # with sudo if you didn't specify a prefix
#+END_SRC

* Usage
Takes no options, just stdin and stdout:
#+BEGIN_SRC sh
cg-mwesplit < infile > outfile
#+END_SRC

More typically, it'll be in a pipeline after =hfst-tokenise= and some
step that disambiguates multiwords using =vislcg3=:

#+BEGIN_SRC sh
echo words go here | hfst-tokenise --gtd tokeniser.pmhfst | vislcg3 -g mwe-dis.cg3 | cg-mwesplit
#+END_SRC

* Troubleshooting
If you get
: terminate called after throwing an instance of 'std::regex_error'
: what(): regex_error
then your C++ compiler is too old. See [[./README.org::*Prerequisites][Prerequisites]].
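
To check a toolchain before building, a small probe along these lines
may help. It is only a sketch: the function name =std_regex_works= and
the pattern are made up for illustration and are not taken from the
cg-mwesplit sources. It relies on the fact that the =std::regex=
shipped with gcc-4.8 was incomplete and typically throws
=std::regex_error= on ordinary patterns.

#+BEGIN_SRC cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of cg-mwesplit: returns true if the
// standard library's std::regex can compile and apply a simple pattern.
bool std_regex_works() {
    try {
        // Arbitrary illustrative pattern: lowercase letters followed by a digit.
        const std::regex re("[a-z]+\\d");
        const std::string good = "abc1";
        const std::string bad = "abc";
        // A working implementation matches the first string and rejects the second.
        return std::regex_match(good, re) && !std::regex_match(bad, re);
    } catch (const std::regex_error&) {
        // Incomplete implementations (e.g. gcc-4.8's) usually end up here.
        return false;
    }
}
#+END_SRC

Call =std_regex_works()= from a one-line =main= and compile it with
the compiler and =-std=c++11= flag you intend to use for the build. If
it returns false, the build is likely to hit the error above at
runtime.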