https://github.com/languagemachines/tadpole

The good old predecessor of Frog

Tadpole 0.6

A Tagger-Lemmatizer-Morphological-Analyzer-Dependency-Parser for Dutch
Version 0.6
http://ilk.uvt.nl/tadpole

Copyright 2006-2010 Bertjan Busser, Antal van den Bosch, and Ko
van der Sloot
ILK Research Group, Faculty of Humanities, Tilburg University
http://ilk.uvt.nl

Tadpole is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.

Tadpole is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.

For more information and updates, see:
http://ilk.uvt.nl/tadpole

---------------------------------------------------------------------
Installation and Quick Start

Tadpole relies on Timbl version 6.3, TimblServer version 1.0, and Mbt
version 3.2. TimblServer relies on Timbl; Mbt relies on Timbl and
TimblServer. The logical order of installation is therefore (1) Timbl,
(2) TimblServer, (3) Mbt, and (4) Tadpole. Tadpole will NOT work with
earlier versions of Timbl and Mbt. The three software packages can be
downloaded from:

http://ilk.uvt.nl/timbl (Timbl and TimblServer)
http://ilk.uvt.nl/mbt (Mbt)

Please consult the installation instructions included with these packages.
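
The installation order given above follows directly from the stated
dependencies. As an illustrative sketch only (it is not part of the
Tadpole distribution), the same order can be derived with a topological
sort. The dictionary restates the dependencies from the paragraph above;
the name install_order is hypothetical, and the snippet assumes Python 3.9
or later for the graphlib module.

```python
from graphlib import TopologicalSorter

# Each package maps to the set of packages it relies on,
# as stated in the installation notes (not read from any build file).
DEPENDS_ON = {
    "Timbl": set(),
    "TimblServer": {"Timbl"},
    "Mbt": {"Timbl", "TimblServer"},
    "Tadpole": {"Timbl", "TimblServer", "Mbt"},
}


def install_order(depends_on):
    """Return the packages in an order where dependencies come first."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(" -> ".join(install_order(DEPENDS_ON)))
```

Because every package here depends on all packages before it, only one
valid order exists.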

Tadpole also relies on Python 2.5 or higher, libboost 1.33 or higher,
and ICU 3.6 or higher. Please consult your system maintainer if you
cannot install these packages yourself.

When you have downloaded the Tadpole tarball from
http://ilk.uvt.nl/downloads/pub/software/tadpole , untar the package
and go to the Tadpole directory. If you installed Timbl, TimblServer,
and Mbt in the same install directory (i.e., you specified the same
install directory with "--prefix=" in all three package installations),
it is sufficient to do the same with Tadpole.

%prompt$> tar zxvf tadpole-0.6.tar.gz
%prompt$> cd tadpole-0.6
%prompt$> ./configure --prefix=
%prompt$> make && make install

Invoking the Tadpole binary without arguments prints a basic usage message:

%prompt$> ./Tadpole
Tadpole v.0.6
Options:
-d path to config dir (default ./config)
-T (uses Mbt-style settings file)
-M (morphological analysis)
accepts:
t
m
-L (lemmatizer)
accepts:
p (for filenames)
O
-U (multiwordchunker)
accepts:
t
c (char between tokens in a mwu)
-P
accepts:
to do...
-t
--testdir= (all files in this dir will be processed
-o (default stdout)
--outputdir= (default stdout)
-s (default tab)
-S (run as server instead of reading from testfile)
-K : keep intermediate files, (last sentence only) (default false)
-d (for more verbosity)
--skip= Allows to skip certain Tadpole components.
Especially the dependency parser is resource intensive and may
want to be skipped when not required. Components are indicated by
one character, multiple may be combined:
t - tokeniser, p - parser, m - morphological analyser
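
For scripted use, the options in the usage message can be assembled
programmatically. The following Python sketch builds a command line from
the -t and --skip options listed above and passes it to subprocess. The
helper names build_command and run_tadpole are hypothetical, the binary
path ./Tadpole is taken from the invocation shown above, and the exact
spelling "--skip=p" is an assumption based on the usage text. The
snippet assumes Python 3.7 or later.

```python
import subprocess

# Component letters accepted by --skip, per the usage message:
# t - tokeniser, p - parser, m - morphological analyser
VALID_SKIP = set("tpm")


def build_command(test_file, skip="", binary="./Tadpole"):
    """Build the argument list for one Tadpole run on test_file."""
    unknown = set(skip) - VALID_SKIP
    if unknown:
        raise ValueError(
            "unknown component(s) for --skip: " + "".join(sorted(unknown))
        )
    command = [binary, "-t", test_file]
    if skip:
        command.append("--skip=" + skip)
    return command


def run_tadpole(test_file, skip=""):
    """Run Tadpole and return its stdout (requires an installed binary)."""
    result = subprocess.run(
        build_command(test_file, skip),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; call run_tadpole() to actually execute it.
    print(" ".join(build_command("test.txt", skip="p")))
```

Skipping the parser this way may be worthwhile when only tags and lemmas
are needed, since the usage text describes the dependency parser as
resource intensive.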

The following command line is an example run of Tadpole on the provided
sample text file test.txt:

%prompt$> ./Tadpole -t test.txt

This should produce output (to stdout) like this:

1 De de [de] LID(bep,stan,rest) 2 det
2 oprichter oprichter [op][richt][er] N(soort,ev,basis,zijd,stan) 8 su
3 van van [van] VZ(init) 2 mod
4 Wikipedia Wikipedia [Wikipedia] SPEC(deeleigen) 3 obj1
5 , , [,] LET() 4 punct
6 Jimmy_Wales Jimmy_Wales [Jimmy]_[Wales] SPEC(deeleigen) 2 app
7 , , [,] LET() 6 punct
8 wil willen [wil] WW(pv,tgw,ev) 0 ROOT
9 een een [een] LID(onbep,stan,agr) 11 det
10 nieuwe nieuw [nieuw][e] ADJ(prenom,basis,met-e,stan) 11 mod
11 zoekmachine zoekmachine [zoek][machine] N(soort,ev,basis,zijd,stan) 8 su
12 lanceren lanceren [lanceren] WW(inf,vrij,zonder) 8 vc
13 . . [.] LET() 12 punct

The first column is a token counter; the second column is the token
itself, the third its lemma, and the fourth its morphological analysis.
The fifth column is the CGN POS tag. The sixth column gives the token
counter of the token's head in the dependency graph (0 for the root);
the seventh column contains the type of dependency relation between the
two tokens.
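
The column layout described above lends itself to simple post-processing.
The following Python sketch reads such lines into named fields, splits
the morphological analysis into its bracketed parts, and groups tokens by
their head. The names Token, parse_line, morphemes, and dependents_by_head
are hypothetical and not part of Tadpole; the code assumes that no field
contains whitespace, which holds for the sample output shown above.

```python
import re
from collections import defaultdict, namedtuple

# One field per output column, in the order described in the text.
Token = namedtuple("Token", "index word lemma morph pos head deprel")


def parse_line(line, sep=None):
    """Split one output line into its seven columns.

    With sep=None the line is split on any run of whitespace;
    pass an explicit separator if a different one was configured.
    """
    fields = line.strip().split(sep)
    if len(fields) != 7:
        raise ValueError("expected 7 columns, got %d: %r" % (len(fields), line))
    index, word, lemma, morph, pos, head, deprel = fields
    return Token(int(index), word, lemma, morph, pos, int(head), deprel)


def morphemes(morph):
    """Return the bracketed parts of a morphological analysis."""
    return re.findall(r"\[([^\]]*)\]", morph)


def dependents_by_head(tokens):
    """Map each head counter (0 = ROOT) to the counters of its dependents."""
    children = defaultdict(list)
    for token in tokens:
        children[token.head].append(token.index)
    return dict(children)


# A few lines copied from the sample output above.
SAMPLE = """\
1 De de [de] LID(bep,stan,rest) 2 det
2 oprichter oprichter [op][richt][er] N(soort,ev,basis,zijd,stan) 8 su
3 van van [van] VZ(init) 2 mod
8 wil willen [wil] WW(pv,tgw,ev) 0 ROOT
"""

if __name__ == "__main__":
    tokens = [parse_line(line) for line in SAMPLE.splitlines() if line.strip()]
    for token in tokens:
        print(token.word, token.pos, morphemes(token.morph))
    print(dependents_by_head(tokens))
```

Splitting on arbitrary whitespace keeps the sketch tolerant of both the
default tab separator and copies of the output in which tabs were
converted to spaces.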

---------------------------------------------------------------------
Credits

Many thanks go out to the people who made the development of the
Tadpole components possible: Walter Daelemans, Jakub Zavrel, Ko van
der Sloot, Sabine Buchholz, Sander Canisius, Gert Durieux, and Peter
Berck.

Thanks to Erik Tjong Kim Sang and Lieve Macken for stress-testing the
first versions of Tadpole, and to Rogier Kraf, Guy De Pauw, Joost
Hengstmengel, Frederik Vaassen, Wouter van Atteveldt, Joseph Turian,
Barbara Plank, Jan-Pieter Kunst, Robert Hensing, Theo van den Heuvel,
and Martha van den Hoven for valuable bug reports, comments, and
suggestions for improvements.

---------------------------------------------------------------------
References

Tadpole is described in the following paper:

Van den Bosch, A., Busser, G.J., Daelemans, W., and Canisius, S. (to
appear). An efficient memory-based morphosyntactic tagger and parser for
Dutch. To appear in Selected Papers of the 17th Computational Linguistics in
the Netherlands Meeting, Leuven, Belgium.

We kindly ask you to refer to this paper if you make use of Tadpole in
your own work.

You can find more information on components of Tadpole in these papers,
which can be downloaded from http://ilk.uvt.nl/publications :

Daelemans, W., Zavrel, J., Berck, P., and Gillis, S. (1996). MBT: A
Memory-Based Part of Speech Tagger-Generator. In: E. Ejerhed and I. Dagan
(eds.) Proceedings of the Fourth Workshop on Very Large Corpora, Copenhagen,
Denmark, pp. 14-27.

Van den Bosch, A., Daelemans, W., and Weijters, A. (1996). Morphological
analysis as classification: An inductive-learning approach. In Proceedings
of NeMLaP-2, Bilkent University, Turkey, pp. 79-89.

Van den Bosch, A., and Daelemans, W. (1999). Memory-based morphological
analysis. In Proceedings of the 37th Annual Meeting of the Association for
Computational Linguistics, ACL'99, University of Maryland, USA, June 20-26,
1999, pp. 285-292.

Zavrel, J., and Daelemans, W. (1999). Recent Advances in Memory-Based
Part-of-Speech Tagging. In: Actas del VI Simposio Internacional de
Comunicacion Social, Santiago de Cuba, pp. 590-597.