https://github.com/idiap/discoconn-classifier

Classifier models and feature extractors for discourse relations
https://github.com/idiap/discoconn-classifier

Last synced: about 1 year ago
JSON representation

Classifier models and feature extractors for discourse relations

Host: GitHub
URL: https://github.com/idiap/discoconn-classifier
Owner: idiap
License: gpl-3.0
Created: 2013-08-28T09:53:32.000Z (almost 13 years ago)
Default Branch: master
Last Pushed: 2013-11-05T08:10:49.000Z (over 12 years ago)
Last Synced: 2025-02-16T11:11:34.418Z (over 1 year ago)
Language: Perl
Homepage:
Size: 1.22 MB
Stars: 4
Watchers: 5
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: readme.txt
- License: LICENSE.txt

Awesome Lists containing this project

README

          DiscoConn-Classifiers

=====================

Copyright (c) 2013 Idiap Research Institute, http://www.idiap.ch/

Written by Thomas Meyer, Thomas.Meyer (at) idiap.ch , ithurtstom (a) gmail.com

See LICENSE.txt for the GPL v3 license text under which this software is released.

This package consists of the following:

1. Classifier models in order to tag instances of 7 discourse connectives according to the discourse relation they signal in raw and unseen English text

2. A feature extraction script in order to generate test instances and feature vectors for the connectives to disambiguate

See the sections below for instructions on how to run the scripts.

If you make use of this software, please consider citing the following papers:

@INPROCEEDINGS{Meyer-HyTra-2012,

  author = {Meyer, Thomas and Popescu-Belis, Andrei},

  title = {{Using Sense-labeled Discourse Connectives for Statistical Machine

	Translation}},

  booktitle = {Proceedings of the EACL 2012 Joint Workshop on Exploiting Synergies

	between IR and MT, and Hybrid Approaches to MT (ESIRMT-HyTra)},

  year = {2012},

  pages = {129--138},

  address = {Avignon, FR}

}

@INPROCEEDINGS{Meyer-AMTA-2012,

  author = {Meyer, Thomas and Popescu-Belis, Andrei and Hajlaoui, Najeh and Gesmundo,

	Andrea},

  title = {{Machine Translation of Labeled Discourse Connectives}},

  booktitle = {Proceedings of the Tenth Biennial Conference of the Association for

	Machine Translation in the Americas (AMTA)},

  year = {2012},

  address = {San Diego, CA}

}

--------------------------------------------------

Instructions: Disambiguating Discourse Connectives

--------------------------------------------------

Dependencies:

Install WordNet (http://wordnet.princeton.edu/) and set the environment variable WNHOME to its directory

Install the perl module WordNet::QueryData from cpan: http://search.cpan.org/~jrennie/WordNet-QueryData-1.49/QueryData.pm

You can point to it from the parsedUnseenExtractor.pl script in line 53.

Install the Stanford classifier (http://nlp.stanford.edu/software/classifier.shtml)

Procedure:

1. Prepare a raw UTF-8 text file of your English text in which you want classify the connectives

2. With the script extract_connectives.pl, you can obtain sentences with connectives only, by executing:

./extract_connectives.pl textfile.txt (although|however|meanwhile|since|though|while|yet)

by choosing only one connective at a time.

3. Parse these extracted sentences with:

a) a constituency parser (e.g. https://github.com/BLLIP/bllip-parser), with bracketed tree output (a la PTB)

b) a TimeML parser (http://www.timeml.org/site/tarsqi/toolkit/)

c) a dependency parser (e.g. https://github.com/agesmundo/IDParser), with output in CONLL format

and put the parsed files into corresponding directories.

4. Point to these directories in the code of the script parsedUnseenExtractor.pl and execute:

./parsedUnseenExtractor.pl (although|however|meanwhile|since|though|while|yet) directory/

Note that this can take time for a larger set of sentences, as a lot of queries to WordNet are needed.

5. On the test set output, you can now run the classifier models (which are in the subdirectory 'models' of this package):

./java -Xms1g -Xmx3g -jar /path/to/classifier/stanford-classifier.jar -props /path/to//models/(although|although|however|meanwhile|since|though|while|yet).prop

In the prop-files, change the paths to the models and to the test sets.

The classifier outputs a file classifier_answers.txt with the predicted discourse relations and probabilities.

The possible relations for the connectives are:

although (contrast|concession)

however (contrast|concession)

meanwhile (contrast|temporal)

since (causal|temporal|temporal-causal)

though (contrast|concession)

while (contrast|concession|temporal|temporal-contrast|temporal-causal)

yet (adv|contrast|concession)

For an explanation and an example of the 36 features extracted, please see 'feature_list.txt'.

The format is: feature name TAB example value

If you would like to retrain your own models, the manual gold annotation in Europarl text can be obtained from https://www.idiap.ch/dataset/Disco-Annotation

Please contact Thomas.Meyer (at) idiap.ch or ithurstom (a) gmail.com for any questions.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/idiap/discoconn-classifier

Awesome Lists containing this project

README