https://github.com/unhammer/cg-mwesplit

DEPRECATED: now part of vislcg3. Splits constraint grammar-formatted multiwords created by hfst-tokenise

Host: GitHub
URL: https://github.com/unhammer/cg-mwesplit
Owner: unhammer
License: gpl-3.0
Created: 2016-04-14T10:30:31.000Z (about 9 years ago)
Default Branch: master
Last Pushed: 2016-05-20T10:38:45.000Z (about 9 years ago)
Last Synced: 2024-12-27T00:41:53.616Z (6 months ago)
Topics: divvun, vislcg3
Language: C++
Homepage: http://giellatekno.uit.no/bugzilla/show_bug.cgi?id=2165
Size: 38.1 KB
Stars: 0
Watchers: 3
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README
- Changelog: ChangeLog
- License: COPYING
- Authors: AUTHORS

#+TITLE: cg-mwesplit
#+STARTUP: showall

#+CAPTION: Build Status
[[https://travis-ci.org/unhammer/cg-mwesplit][https://travis-ci.org/unhammer/cg-mwesplit.svg]]

* Description

This program reads input in Constraint Grammar format and splits
the special "multiword cohorts" created by =hfst-tokenise= into
separate cohorts, leaving other cohorts and intervening blanks as
they were.

For examples of input/output, see the files in =test/=, e.g.
[[file:test/input.simple.cg][test/input.simple.cg]].
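
In Constraint Grammar stream format, a cohort begins with a wordform
line such as ="<go>"=, its readings follow on tab-indented lines, and
more deeply indented lines are subreadings. Any other line is plain
text, which this program leaves as it was. As a minimal sketch (not
code from this repository), the lines could be classified as below
before deciding what to split. The names =LineKind=, =classify=,
=reading_depth= and =has_subreadings= are hypothetical, and so is the
assumption that a multiword cohort is one whose readings have
subreadings; see the files in =test/= for the actual input format.

#+BEGIN_SRC C++
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch, not taken from cg-mwesplit: classify lines of
// Constraint Grammar stream format.
enum class LineKind { Cohort, Reading, Other };

// A cohort (wordform) line starts with "<, e.g. "<go>".
// A reading line starts with one or more tabs.
// Everything else is plain text and would be passed through unchanged.
LineKind classify(const std::string& line) {
    if (line.compare(0, 2, "\"<") == 0) return LineKind::Cohort;
    if (!line.empty() && line[0] == '\t') return LineKind::Reading;
    return LineKind::Other;
}

// Number of leading tabs: 1 for a reading, 2 or more for a subreading.
std::size_t reading_depth(const std::string& line) {
    std::size_t n = 0;
    while (n < line.size() && line[n] == '\t') ++n;
    return n;
}

// ASSUMPTION (hypothetical): treat a cohort as a multiword candidate if
// any of its reading lines is a subreading. `readings` holds the lines
// that followed the cohort's wordform line.
bool has_subreadings(const std::vector<std::string>& readings) {
    for (const auto& r : readings)
        if (reading_depth(r) > 1) return true;
    return false;
}
#+END_SRC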

* Prerequisites
A C++ compiler that goes all the way to 11: full C++11 support,
including a working =<regex>= in its standard library.

Tested with gcc-5.2.0, gcc-5.3.1 and clang-703.0.29.

(Should work all the way down to gcc-4.9, but will fail with e.g.
gcc-4.8.4 or clang-3.5.0.)

* Building

#+BEGIN_SRC sh
./autogen.sh
./configure # optionally with argument --prefix=$HOME/my/prefix
make
make install # with sudo if you didn't specify a prefix
#+END_SRC

* Usage

It takes no options; it just reads stdin and writes stdout:
#+BEGIN_SRC sh
cg-mwesplit < infile > outfile
#+END_SRC
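
Since the program is a plain stdin-to-stdout filter, its outer loop
can be sketched as follows. This is an illustration, not
cg-mwesplit's source: =run_filter= and =identity= are hypothetical
names, and the identity transform stands in for the actual splitting.
Writing the loop against generic streams lets the same code run on
=std::cin= and =std::cout= as well as on string streams in tests.

#+BEGIN_SRC C++
#include <cassert>
#include <functional>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical sketch: read lines from `in`, pass each one through
// `transform`, and write the result to `out`, one line per line.
void run_filter(std::istream& in, std::ostream& out,
                const std::function<std::string(const std::string&)>& transform) {
    std::string line;
    // std::getline strips the '\n', so it is written back explicitly.
    while (std::getline(in, line)) {
        out << transform(line) << '\n';
    }
}

// Identity transform: a placeholder for the real splitting step.
std::string identity(const std::string& line) { return line; }

// In a real program, main() would call:
//   run_filter(std::cin, std::cout, identity);
#+END_SRC

Because =std::getline= drops the line terminator, this sketch always
ends its output with a newline, even if the input did not.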

More typically, it'll be in a pipeline after =hfst-tokenise= and some
step that disambiguates multiwords using =vislcg3=:

#+BEGIN_SRC sh
echo words go here | hfst-tokenise --gtd tokeniser.pmhfst | vislcg3 -g mwe-dis.cg3 | cg-mwesplit
#+END_SRC

* Troubleshooting

If you get
: terminate called after throwing an instance of 'std::regex_error'
: what(): regex_error
then your C++ compiler (or its standard library) is too old. See [[./README.org::*Prerequisites][Prerequisites]].
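
GCC's libstdc++ only gained a working =<regex>= in GCC 4.9; with older
releases the header compiles, but regular expressions can fail at run
time, which fits the error above and the version limits under
Prerequisites. A quick way to check a toolchain is to compile and run
a tiny program that uses =std::regex=. The sketch below is not part of
this repository; =regex_works= and its pattern are arbitrary choices.

#+BEGIN_SRC C++
#include <cassert>
#include <regex>
#include <string>

// Hypothetical toolchain check (not part of cg-mwesplit): returns true
// if std::regex can compile a pattern with bracket expressions and
// match with it, and false if the library throws std::regex_error.
bool regex_works() {
    try {
        std::regex re("[a-z]+ [0-9]+");
        return std::regex_match("abc 123", re);
    } catch (const std::regex_error&) {
        return false;
    }
}
#+END_SRC

To use it, call it from a =main= that returns =regex_works() ? 0 : 1=
and build it in C++11 mode with the same compiler you use for
cg-mwesplit.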
