{"id":18439490,"url":"https://github.com/idiap/discoconn-classifier","last_synced_at":"2025-04-14T14:27:27.964Z","repository":{"id":72443074,"uuid":"12430295","full_name":"idiap/DiscoConn-Classifier","owner":"idiap","description":"Classifier models and feature extractors for discourse relations","archived":false,"fork":false,"pushed_at":"2013-11-05T08:10:49.000Z","size":1280,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-02-16T11:11:34.418Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Perl","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/idiap.png","metadata":{"files":{"readme":"readme.txt","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-08-28T09:53:32.000Z","updated_at":"2018-03-14T07:47:22.000Z","dependencies_parsed_at":null,"dependency_job_id":"775cd5dd-b1a0-4f51-87da-9567b7526018","html_url":"https://github.com/idiap/DiscoConn-Classifier","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2FDiscoConn-Classifier","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2FDiscoConn-Classifier/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2FDiscoConn-Classifier/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/idiap%2FDiscoConn-Classifier/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/idiap","download_url":"https://codeload.github.com/idiap/DiscoConn-Classifier/tar.gz/refs/heads/master","hos
t":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248896217,"owners_count":21179371,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T06:24:57.904Z","updated_at":"2025-04-14T14:27:27.923Z","avatar_url":"https://github.com/idiap.png","language":"Perl","funding_links":[],"categories":[],"sub_categories":[],"readme":"DiscoConn-Classifiers\n=====================\n\nCopyright (c) 2013 Idiap Research Institute, http://www.idiap.ch/\nWritten by Thomas Meyer, Thomas.Meyer (at) idiap.ch, ithurtstom (a) gmail.com\nSee LICENSE.txt for the GPL v3 license text under which this software is released.\n\nThis package consists of the following:\n\n1. Classifier models to label instances of 7 discourse connectives with the discourse relation they signal in raw, unseen English text\n2. 
A feature extraction script to generate test instances and feature vectors for the connectives to be disambiguated\n\nSee the sections below for instructions on how to run the scripts.\n\nIf you make use of this software, please consider citing the following papers:\n\n@INPROCEEDINGS{Meyer-HyTra-2012,\n  author = {Meyer, Thomas and Popescu-Belis, Andrei},\n  title = {{Using Sense-labeled Discourse Connectives for Statistical Machine\n\tTranslation}},\n  booktitle = {Proceedings of the EACL 2012 Joint Workshop on Exploiting Synergies\n\tbetween IR and MT, and Hybrid Approaches to MT (ESIRMT-HyTra)},\n  year = {2012},\n  pages = {129--138},\n  address = {Avignon, FR}\n}\n\n@INPROCEEDINGS{Meyer-AMTA-2012,\n  author = {Meyer, Thomas and Popescu-Belis, Andrei and Hajlaoui, Najeh and Gesmundo,\n\tAndrea},\n  title = {{Machine Translation of Labeled Discourse Connectives}},\n  booktitle = {Proceedings of the Tenth Biennial Conference of the Association for\n\tMachine Translation in the Americas (AMTA)},\n  year = {2012},\n  address = {San Diego, CA}\n}\n\n--------------------------------------------------\nInstructions: Disambiguating Discourse Connectives\n--------------------------------------------------\n\nDependencies:\n\nInstall WordNet (http://wordnet.princeton.edu/) and set the environment variable WNHOME to its directory.\nInstall the Perl module WordNet::QueryData from CPAN: http://search.cpan.org/~jrennie/WordNet-QueryData-1.49/QueryData.pm\nYou can point to it in line 53 of the parsedUnseenExtractor.pl script.\nInstall the Stanford classifier (http://nlp.stanford.edu/software/classifier.shtml).\n\nProcedure:\n\n1. Prepare a raw UTF-8 text file containing the English text in which you want to classify the connectives.\n\n2. With the script extract_connectives.pl, extract only the sentences that contain a given connective by executing:\n\n./extract_connectives.pl textfile.txt (although|however|meanwhile|since|though|while|yet)\n\nchoosing only one connective at a time.\n\n3. 
Parse these extracted sentences with:\n\na) a constituency parser (e.g. https://github.com/BLLIP/bllip-parser), with bracketed tree output (Penn Treebank style)\nb) a TimeML parser (http://www.timeml.org/site/tarsqi/toolkit/)\nc) a dependency parser (e.g. https://github.com/agesmundo/IDParser), with output in CoNLL format\n\nand put the output of each parser into its own directory.\n\n4. Set the paths to these directories in the code of the script parsedUnseenExtractor.pl and execute:\n\n./parsedUnseenExtractor.pl (although|however|meanwhile|since|though|while|yet) directory/\n\nNote that this can take a long time for large sets of sentences, as many WordNet queries are needed.\n\n5. On the resulting test set, you can now run the classifier models (found in the subdirectory 'models' of this package):\n\njava -Xms1g -Xmx3g -jar /path/to/classifier/stanford-classifier.jar -props /path/to/models/(although|however|meanwhile|since|though|while|yet).prop\n\nIn the .prop files, change the paths to the models and to the test sets.\nThe classifier writes a file classifier_answers.txt with the predicted discourse relations and their probabilities.\nThe possible relations for each connective are:\n\nalthough (contrast|concession)\nhowever (contrast|concession)\nmeanwhile (contrast|temporal)\nsince (causal|temporal|temporal-causal)\nthough (contrast|concession)\nwhile (contrast|concession|temporal|temporal-contrast|temporal-causal)\nyet (adv|contrast|concession)\n\nFor an explanation and an example of each of the 36 extracted features, please see 'feature_list.txt'.\nThe format is: feature name TAB example value\n\nIf you would like to train your own models, the manual gold-standard annotation of Europarl text can be obtained from https://www.idiap.ch/dataset/Disco-Annotation\n\nPlease contact Thomas.Meyer (at) idiap.ch or ithurstom (a) gmail.com for any 
questions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidiap%2Fdiscoconn-classifier","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fidiap%2Fdiscoconn-classifier","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidiap%2Fdiscoconn-classifier/lists"}