https://github.com/davidar/xwn

eXtended WordNet

XWN 2.0-1.1

SUMMARY
=======

1. LICENSE

2. What is new in XWN 2.0-1
2.a What is different in XWN 2.0-1.1 compared with the XWN 2.0-1 release

3. XWN 2.0-1.1 files

4. How the information related to a synset is represented in XWN 2.0-1.1

5. Contact

====================================================================================

1. LICENSE
==========

XWN 2.0-1.1 software and database is being provided to you, the LICENSEE,
by the University of Texas at Dallas under the following license.
By obtaining, using and/or copying this software and database,
you agree that you have read, understood, and will comply with
these terms and conditions:

Permission to use, copy, modify and distribute this software and database
and its documentation for any purpose and without fee or royalty
is hereby granted, provided that you agree to comply with the following
copyright notice and statements, including the disclaimer, and that
the same appear on ALL copies of the software, database and documentation,
including modifications that you make for internal use or for distribution.

eXtended WordNet 2.0-1.1 for WordNet 2.0 Copyright © 2003 by the University of Texas
at Dallas. All rights reserved.(WordNet is copyrighted by the Princeton University)

THIS SOFTWARE AND DATABASE IS PROVIDED "AS IS" AND THE UNIVERSITY OF TEXAS
AT DALLAS MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED.
BY WAY OF EXAMPLE, BUT NOT LIMITATION, UNIVERSITY OF TEXAS AT DALLAS
MAKES NO REPRESENTATIONS OR WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE LICENSED SOFTWARE,
DATABASE OR DOCUMENTATION WILL NOT INFRINGE ANY THIRD PARTY PATENTS,
COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.

The name of University of Texas at Dallas may not be used in advertising
or publicity pertaining to distribution of the software and/or database.
Title to copyright in this software, database and any associated documentation
shall at all times remain with University of Texas at Dallas and LICENSEE
agrees to preserve same.

====================================================================================

2. What is new in XWN 2.0-1
===========================

The XWN 2.0-1 release is based on WordNet 2.0.

Another difference with respect to XWN 1.7 is that the information is now aligned at synset level.
In XWN 1.7 the information was not aligned: WSD, parse trees and LFTs were stored in three different files.
In XWN 2.0-1 these three files are merged into a single file for each part of speech that WordNet offers (noun, verb, adjective, adverb).

====================================================================================

2.a What is different in XWN 2.0-1.1 compared with the XWN 2.0-1 release
========================================================================

This release corrects some bugs affecting WSD tokens whose lemma differs
from the word form.

====================================================================================

3. XWN 2.0-1.1 files
====================

The XWN 2.0-1.1 release has 6 files:
- this README.txt file;
- xwn.xsd file, which contains the XSD schema for the remaining files;
- adj.xml file, which contains the adjectives, in an XML format;
- adv.xml file, which contains the adverbs, in an XML format;
- noun.xml file, which contains the nouns, in an XML format;
- verb.xml file, which contains the verbs, in an XML format.
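
As a rough illustration of working with these files, the sketch below streams one XML file and counts its entries without loading the whole file into memory. It assumes that each entry is an element named "gloss" (the tag described in section 4) and that the files use no XML namespace; the helper name count_glosses and the POS_FILES mapping are illustrative and are not part of the release.

```python
import xml.etree.ElementTree as ET

# File names taken from the list above; the mapping itself is only a convenience.
POS_FILES = {
    "noun": "noun.xml",
    "verb": "verb.xml",
    "adjective": "adj.xml",
    "adverb": "adv.xml",
}


def count_glosses(path):
    """Count <gloss> elements in one XWN file, streaming to limit memory use."""
    count = 0
    for _event, elem in ET.iterparse(path, events=("end",)):
        # Assumed element name (see section 4) and assumed absence of a namespace.
        if elem.tag == "gloss":
            count += 1
            elem.clear()  # drop the finished entry's children and text
    return count
```

For example, count_glosses(POS_FILES["adverb"]) would report the number of adverb entries, provided adv.xml is in the working directory.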

====================================================================================

4. How the information related to a synset is represented in XWN 2.0-1.1
======================================================================

Here is an example extracted from adv.xml:

<---START EXAMPLE--->

<gloss pos="..." synsetID="...">
  <synonymSet>barely, hardly, just, scarcely, scarce</synonymSet>
  <text>
    by a small margin; "they could barely hear the speaker"; "we hardly knew them"; "just missed being hit"; "had scarcely rung the bell when the door flew open"; "would have scarce arrived before she would have found some excuse to leave"- W.B.Yeats
  </text>
  <wsd>
    by
    a
    small
    margin
  </wsd>
  <parse>
    (TOP (S (ADVP (RB barely) )
    (VP (VBZ is)
    (PP (IN by)
    (NP (DT a) (JJ small) (NN margin) ) ) )
    (. .) ) )
  </parse>
  <lft>
    barely:RB(e1) -> by:IN(e1, x1) small:JJ(x1) margin:NN(x1)
  </lft>
</gloss>

<---END EXAMPLE--->

Let's explain what's going on here:

For each synset in WordNet 2.0, we generated one <gloss> ... </gloss> element.
The "gloss" tag has two attributes, "pos" (the part of speech; we have an adverb here) and "synsetID" (the ID of the synset under discussion), which together uniquely identify a synset.

Inside <gloss> ... </gloss> we have another set of tags:
- synonymSet - the set of synonyms of that synset;
- text - the gloss of that synset (definitions + examples; the examples are quoted);
- wsd - the word sense disambiguation of the gloss definitions;
- parse - the parse trees of the gloss definitions;
- lft - the logic form transformations of the gloss definitions.
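
The tags listed above can be read with a standard XML parser. The following is a minimal sketch, assuming the tag and attribute names given in this section and no XML namespace. SAMPLE is a hypothetical entry whose attribute values are placeholders, and read_gloss and split_lft are illustrative helper names, not part of the release.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample entry: tag names follow the list above, attribute
# values are placeholders rather than real XWN data.
SAMPLE = """
<gloss pos="ADV" synsetID="example-id">
  <synonymSet>barely, hardly, just, scarcely, scarce</synonymSet>
  <text>by a small margin; "they could barely hear the speaker"</text>
  <lft>barely:RB(e1) -> by:IN(e1, x1) small:JJ(x1) margin:NN(x1)</lft>
</gloss>
"""

# One predicate of a logic form looks like word:POS(arg, arg, ...).
PREDICATE = re.compile(r"(\S+):(\w+)\(([^)]*)\)")


def read_gloss(elem):
    """Collect the attributes and some text fields of one <gloss> element."""
    return {
        "pos": elem.get("pos"),
        "synset_id": elem.get("synsetID"),
        "synonyms": [s.strip() for s in elem.findtext("synonymSet", "").split(",")],
        "text": elem.findtext("text", "").strip(),
        "lft": elem.findtext("lft", "").strip(),
    }


def split_lft(lft):
    """Split a logic form at '->' into lists of (word, pos, args) triples."""
    def predicates(side):
        return [
            (word, pos, [a.strip() for a in args.split(",")])
            for word, pos, args in PREDICATE.findall(side)
        ]

    head, _, body = lft.partition("->")
    return predicates(head), predicates(body)


entry = read_gloss(ET.fromstring(SAMPLE))
head, body = split_lft(entry["lft"])
```

Here head holds the predicate for the defined word and body holds the predicates of its definition. The regular expression only covers the simple predicate shape seen in the example above.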

There is a "quality" attribute related to each parse tree, logic form transformation and open class word. Its values can be:
- GOLD - the information was checked by a human;
- SILVER - there was agreement in an automatic voting process, but no human intervention took place;
- NORMAL - output provided by the software without human checking or voting agreement.
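
One way to use the "quality" values is to keep only annotations above a chosen level of trust. The sketch below assumes the ordering GOLD > SILVER > NORMAL implied by the descriptions above and compares values case-insensitively. The names QUALITY_RANK, at_least and filter_by_quality are illustrative, and the sample annotations are invented.

```python
# Ranking implied by the descriptions above: human-checked > voted > unchecked.
QUALITY_RANK = {"NORMAL": 0, "SILVER": 1, "GOLD": 2}


def at_least(quality, minimum):
    """Return True if `quality` is at least as trustworthy as `minimum`."""
    return QUALITY_RANK[quality.upper()] >= QUALITY_RANK[minimum.upper()]


def filter_by_quality(items, minimum="SILVER"):
    """Keep (quality, payload) pairs whose quality meets the threshold."""
    return [(q, payload) for q, payload in items if at_least(q, minimum)]


# Hypothetical annotations, not taken from the XWN files.
annotations = [("GOLD", "lft A"), ("NORMAL", "parse B"), ("SILVER", "parse C")]
trusted = filter_by_quality(annotations)
```

With the default threshold, trusted keeps the GOLD and SILVER items and drops the NORMAL one.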

====================================================================================

5. Contact
==========

Your feedback is important to us.

Please e-mail us at: xwn@hlt.utdallas.edu

Human Language Technology Research Institute
University of Texas at Dallas
December 19th, 2003