https://github.com/wenkokke/dep2con
Several algorithms for converting dependency structures into constituency structures.
- Host: GitHub
- URL: https://github.com/wenkokke/dep2con
- Owner: wenkokke
- Created: 2014-09-08T18:30:23.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2022-02-07T16:25:00.000Z (almost 3 years ago)
- Last Synced: 2023-03-23T09:31:36.198Z (almost 2 years ago)
- Topics: constituency-tree, dependency-tree, parse-trees, tool
- Language: Haskell
- Homepage:
- Size: 38.1 KB
- Stars: 9
- Watchers: 1
- Forks: 4
- Open Issues: 0
### Conversion from dependency structures to constituency structures
#### Words
First off, the algorithm uses trees of words and part-of-speech
tags. Part-of-speech tags are simple `String`s. Words are records with
three fields: a text[^1], a part-of-speech tag and a serial (a 1-based
index into the sentence).

> data Word =
>   Word { text   :: String
>        , pos    :: String
>        , serial :: Integer
>        }

Words are parsed as follows:

    Word := QuotedString '/' QuotedString '/' Integer
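
The grammar above can be sketched as a small parser and printer. This is a minimal sketch, not necessarily how `dep2con` itself parses words: it uses `Text.ParserCombinators.ReadP` from `base`, the names `quoted`, `wordP`, `parseWord` and `showWord` are hypothetical, and it assumes a `QuotedString` never contains an escaped quote. The Prelude's own `Word` type is hidden to avoid a name clash with the record.

```haskell
import Prelude hiding (Word)
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP
  (ReadP, char, eof, munch, munch1, readP_to_S)

-- Same shape as the record in the README.
data Word = Word
  { text   :: String
  , pos    :: String
  , serial :: Integer
  } deriving (Eq, Show)

-- QuotedString: everything between two double quotes
-- (assumption: no escape sequences).
quoted :: ReadP String
quoted = char '"' *> munch (/= '"') <* char '"'

-- Word := QuotedString '/' QuotedString '/' Integer
wordP :: ReadP Word
wordP = Word <$> quoted <* char '/' <*> quoted <* char '/' <*> integer
  where
    integer = read <$> munch1 isDigit

-- Parse a complete string as a word; Nothing if it does not match.
parseWord :: String -> Maybe Word
parseWord s = case readP_to_S (wordP <* eof) s of
  [(w, "")] -> Just w
  _         -> Nothing

-- Print a word back in the same notation.
showWord :: Word -> String
showWord (Word t p n) = quote t ++ "/" ++ quote p ++ "/" ++ show n
  where
    quote x = "\"" ++ x ++ "\""
```

Under these assumptions, `parseWord` and `showWord` round-trip any word whose text and tag contain no double quotes.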
So, for instance, a word is represented as:

    "dog"/"NN"/2

#### Dependency Trees
Dependency trees---our input format---are represented as nodes with a
governor and a list of dependants. A leaf is represented by a `Node`
with no dependants.

> data Tree =
>   Node { governor   :: Word
>        , dependants :: [Tree]
>        }

Dependency trees are parsed as follows:

    Node := Word | '(' Word Node* ')'
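
The tree grammar can be sketched in the same style. Again, this is only an illustration under stated assumptions, not the parser shipped with `dep2con`: `treeP`, `parseTree` and the `token` helper are hypothetical names, whitespace between tokens is assumed to be insignificant, and quoted strings are assumed to contain no escapes. The block repeats the `Word` definitions so that it stands on its own.

```haskell
import Prelude hiding (Word)
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP
  (ReadP, char, eof, many, munch, munch1, readP_to_S, skipSpaces, (<++))

data Word = Word { text :: String, pos :: String, serial :: Integer }
  deriving (Eq, Show)

data Tree = Node { governor :: Word, dependants :: [Tree] }
  deriving (Eq, Show)

-- Word := QuotedString '/' QuotedString '/' Integer
wordP :: ReadP Word
wordP = Word <$> quoted <* char '/' <*> quoted <* char '/' <*> integer
  where
    quoted  = char '"' *> munch (/= '"') <* char '"'
    integer = read <$> munch1 isDigit

-- Node := Word | '(' Word Node* ')'
treeP :: ReadP Tree
treeP = leaf <++ branch
  where
    -- A bare word is a node without dependants.
    leaf    = (\w -> Node w []) <$> token wordP
    -- A parenthesised governor followed by zero or more subtrees.
    branch  = Node <$> (token (char '(') *> token wordP)
                   <*> (many treeP <* token (char ')'))
    -- Run a parser, then skip trailing whitespace (assumption of this sketch).
    token p = p <* skipSpaces

-- Parse a complete string as a dependency tree; Nothing on failure.
parseTree :: String -> Maybe Tree
parseTree s = case readP_to_S (skipSpaces *> treeP <* eof) s of
  [(t, "")] -> Just t
  _         -> Nothing
```

With this sketch, a bare word parses to a `Node` with an empty `dependants` list, and a parenthesised form such as `("also"/"RB"/3)` parses to the same kind of leaf.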

So a single word such as `"dog"/"NN"/2` is itself a valid tree, but so is the following:

    ("ROOT"/"ROOT"/0
      ("likes"/"VBZ"/4
        ("dog"/"NN"/2 "my"/"PRP"/1)
        ("also"/"RB"/3)
        ("eating"/"VBG"/5 "sausage"/"NN"/6)
      )
    )

This represents the following tree:

```tree
["ROOT" ["likes" ["dog" "my"] "also" ["eating" "sausage"]]]
```

#### Constituency Trees
The main difference between constituency trees---our output
format---and dependency trees is that dependency trees store words at
every node, whereas constituency trees store words only in the
leaves, and the internal nodes are marked with part-of-speech tags.

```haskell
data Tree
  = Leaf Word
  | Node POS [Tree]
```

The printed format for constituency trees is very similar to the
input format for dependency trees, so the following is a valid
constituency tree (produced by our conversion algorithm):

    ("ROOT"
      ("VP"
        ("NP"
          ("PRP" "my"/"PRP"/1)
          ("NN" "dog"/"NN"/2)
        )
        ("RB" "also"/"RB"/3)
        ("VBZ" "likes"/"VBZ"/4)
        ("VP"
          ("VBG" "eating"/"VBG"/5)
          ("NN" "sausage"/"NN"/6)
        )
      )
    )

This string represents the following tree:

```tree
["ROOT" ["VP" ["NP" ["PRP" "my"] ["NN" "dog"] ] ["RB" "also"] ["VBZ" "likes"] ["VP" ["VBG" "eating"] ["NN" "sausage"] ] ] ]
```

#### Conversion à la Collins
For the conversion algorithm we use the simple algorithm proposed
by Collins et al., which tries to produce the simplest possible
constituency trees from dependency trees. The algorithm is as follows:

- if we encounter a node *without* dependants, we simply convert it
  into a node bearing the part-of-speech tag, with a single leaf
  bearing the governor;
- if we encounter a node *with* dependants, we do several things:
    * we compute x, the part-of-speech tag of the governor;
    * we compute xp, the phrasal projection of x (using `toXP`);
    * we recursively apply the algorithm to the dependants;
    * we create a new node for x, using the current governor, and
      insert it among the converted dependants;
    * lastly, we combine all of the above in a new node for xp.

Here is the algorithm written out in Haskell:
```haskell
collins :: Dep.Tree -> Con.Tree
collins (Dep.Node gov [])   = Con.Node (pos gov) [Con.Leaf gov]
collins (Dep.Node gov deps) = Con.Node xp (insert gov' deps')
  where
    x     = pos gov                   :: POS
    xp    = toXP x                    :: POS
    gov'  = Con.Node x [Con.Leaf gov] :: Con.Tree
    deps' = map collins deps          :: [Con.Tree]
    -- ^ apply `collins` to each dependant
```

### References
Michael Collins, Jan Hajic, Lance Ramshaw, and Christoph
Tillmann. 1999. [A Statistical Parser for Czech][^Collins1999].
Proceedings of ACL-1999, pp. 505-512.

Dan Klein and Christopher D. Manning. 2003. [Accurate Unlexicalized
Parsing][^Klein2003]. Proceedings of the 41st Meeting of the
Association for Computational Linguistics, pp. 423-430.

Richard Socher, John Bauer, Christopher D. Manning, and Andrew
Y. Ng. 2013. [Parsing With Compositional Vector Grammars][^Socher2013].
Proceedings of ACL 2013.

Fei Xia and Martha Palmer. 2001. [Converting Dependency Structures to
Phrase Structures][^Xia2001]. Proceedings of the 1st Human Language
Technology Conference (HLT-2001), San Diego, Mar 18-21, 2001.

[^Collins1999]: http://www.aclweb.org/anthology/P99-1065
[^Klein2003]: http://nlp.stanford.edu/~manning/papers/unlexicalized-parsing.pdf
[^Socher2013]: http://nlp.stanford.edu/pubs/SocherBauerManningNg_ACL2013.pdf
[^Xia2001]: http://www.aclweb.org/anthology-new/H/H01/H01-1014.pdf

[^1]: While we refer to this field as the "text", which is how it
    is used when running `dep2con` standalone, we will use it to
    store ids when we integrate it with `SemAnTE`.