Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/dmcc/PyStanfordDependencies

Python interface for converting Penn Treebank trees to Stanford Dependencies and Universal Depenencies
https://github.com/dmcc/PyStanfordDependencies

Last synced: 23 days ago
JSON representation

Python interface for converting Penn Treebank trees to Stanford Dependencies and Universal Depenencies

Host: GitHub
URL: https://github.com/dmcc/PyStanfordDependencies
Owner: dmcc
Created: 2014-12-13T18:38:15.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2019-03-15T18:28:14.000Z (about 5 years ago)
Last Synced: 2024-04-05T23:21:20.748Z (about 2 months ago)
Language: Python
Homepage: https://pypi.python.org/pypi/PyStanfordDependencies
Size: 135 KB
Stars: 68
Watchers: 4
Forks: 16
Open Issues: 8
Metadata Files:
- Readme: README.rst
- Contributing: CONTRIBUTING.rst
- License: LICENSE-2.0.txt

Lists

AI - PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies. (Python / General-Purpose Machine Learning)
awesome-machine-learning - PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies. (Python / General-Purpose Machine Learning)
awesome-machine-learning - PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies. (Python / General-Purpose Machine Learning)
awesome-machine-learning - PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies. (Python / General-Purpose Machine Learning)
awesome-machine-learning - PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies. (Python / General-Purpose Machine Learning)
https-github.com-keon-awesome-nlp - PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies. (Packages / Libraries)
awesome-machine-learning - PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies. (Python / General-Purpose Machine Learning)
awesome-machine-learnings - PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies. (Python / General-Purpose Machine Learning)
awesome-machine-learning-library - PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies. (Python / General-Purpose Machine Learning)
awesome-machine-learning - PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies. (Python / General-Purpose Machine Learning)
awesome-machine-learning - PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies. (Python / General-Purpose Machine Learning)
awesome-machine-learning - PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies. (Python / General-Purpose Machine Learning)
awesome-machine-learning - PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies. (Python / General-Purpose Machine Learning)
awesome-machine-learning - PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies. (Python / General-Purpose Machine Learning)
awesome-machine-learning - PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies. (Python / General-Purpose Machine Learning)
awesome-machine-learning - PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies. (Python / General-Purpose Machine Learning)
awesome-machine-learning - PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies. (Python / General-Purpose Machine Learning)
awesome-advanced-metering-infrastructure - PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies. (Python / General-Purpose Machine Learning)
awesome-machine-learning - PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies. (Python / General-Purpose Machine Learning)
awesome-machine-master - PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies. (Python / General-Purpose Machine Learning)
awesome-machine-learning - PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies. (Python / General-Purpose Machine Learning)
awesome-machine-learning - PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies. (Python / General-Purpose Machine Learning)
awesome-machine-learning - PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies. (Python / General-Purpose Machine Learning)

README

        PyStanfordDependencies

======================

.. image:: https://travis-ci.org/dmcc/PyStanfordDependencies.svg?branch=master

    :target: https://travis-ci.org/dmcc/PyStanfordDependencies

.. image:: https://badge.fury.io/py/PyStanfordDependencies.png

   :target: https://badge.fury.io/py/PyStanfordDependencies

.. image:: https://coveralls.io/repos/dmcc/PyStanfordDependencies/badge.png?branch=master

   :target: https://coveralls.io/r/dmcc/PyStanfordDependencies?branch=master

Python interface for converting `Penn Treebank

`_ trees to `Universal

Dependencies `_

and `Stanford Dependencies

`_.

Example usage

-------------

Start by getting a ``StanfordDependencies`` instance with

``StanfordDependencies.get_instance()``::

    >>> import StanfordDependencies

    >>> sd = StanfordDependencies.get_instance(backend='subprocess')

``get_instance()`` takes several options. ``backend`` can currently

be ``subprocess`` or ``jpype`` (see below). If you have an existing

`Stanford CoreNLP `_ or

`Stanford Parser `_

jar file, use the ``jar_filename`` parameter to point to the full path of

the jar file. Otherwise, PyStanfordDependencies will download a jar file

for you and store it in locally (``~/.local/share/pystanforddeps``). You

can request a specific version with the ``version`` flag, e.g.,

``version='3.4.1'``. To convert trees, use the ``convert_trees()`` or

``convert_tree()`` method (note that by default, ``convert_trees()`` can

be considerably faster if you're doing batch conversion). These return

a sentence (list of ``Token`` objects) or a list of sentences (list of

list of ``Token`` objects) respectively::

    >>> sent = sd.convert_tree('(S1 (NP (DT some) (JJ blue) (NN moose)))')

    >>> for token in sent:

    ...     print token

    ...

    Token(index=1, form='some', cpos='DT', pos='DT', head=3, deprel='det')

    Token(index=2, form='blue', cpos='JJ', pos='JJ', head=3, deprel='amod')

    Token(index=3, form='moose', cpos='NN', pos='NN', head=0, deprel='root')

This tells you that ``moose`` is the head of the sentence and is

modified by ``some`` (with a ``det`` = determiner relation) and ``blue``

(with an ``amod`` = adjective modifier relation). Fields on ``Token``

objects are readable as attributes. See docs for additional options in

``convert_tree()`` and ``convert_trees()``.

Visualization

-------------

If you have the `asciitree `_

package, you can use a prettier ASCII formatter::

    >>> print sent.as_asciitree()

     moose [root]

      +-- some [det]

      +-- blue [amod]

If you have Python 2.7 or later, you can use `Graphviz

`_ to render your graphs. You'll need the `Python

graphviz `_ package to call

``as_dotgraph()``::

    >>> dotgraph = sent.as_dotgraph()

    >>> print dotgraph

    digraph {

            0 [label=root]

            1 [label=some]

                    3 -> 1 [label=det]

            2 [label=blue]

                    3 -> 2 [label=amod]

            3 [label=moose]

                    0 -> 3 [label=root]

    }

    >>> dotgraph.render('moose') # renders a PDF by default

    'moose.pdf'

    >>> dotgraph.format = 'svg'

    >>> dotgraph.render('moose')

    'moose.svg'

The Python `xdot `_

package provides an interactive visualization::

    >>> import xdot

    >>> window = xdot.DotWindow()

    >>> window.set_dotcode(dotgraph.source)

Both ``as_asciitree()`` and ``as_dotgraph()`` allow customization.

See the docs for additional options.

Backends

--------

Currently PyStanfordDependencies includes two backends:

- ``subprocess`` (works anywhere with a ``java`` binary, but more

  overhead so batched conversions with ``convert_trees()`` are

  recommended)

- ``jpype`` (requires `jpype1 `_,

  faster than the subprocess backend, also includes access to the Stanford

  CoreNLP lemmatizer)

By default, PyStanfordDependencies will attempt to use the ``jpype``

backend. If ``jpype`` isn't available or crashes on startup,

PyStanfordDependencies will fallback to ``subprocess`` with a warning.

Universal Dependencies status

-----------------------------

PyStanfordDependencies supports most features in `Universal Dependencies

`_ (see `issue #10

`_ for the

most up to date status). PyStanfordDependencies output matches Universal

Dependencies in terms of structure and dependency labels, but Universal

POS tags and features are missing. Currently, PyStanfordDependencies will

output Universal Dependencies by default (unless you're using Stanford

CoreNLP 3.5.1 or earlier).

Related projects

----------------

- `clearnlp-converter `_

  (uses `clearnlp `_ instead of `Stanford

  CoreNLP `_ for

  dependency conversion)

More information

----------------

Licensed under `Apache 2.0 `_.

Written by David McClosky (`homepage

`_, `code `_)

Bug reports and feature requests: `GitHub issue tracker

`_

Release summaries

-----------------

- 0.3.1 (2015.11.02): Better collapsed universal handling, bugfixes

- 0.3.0 (2015.10.09): Support copy nodes, more input checking/debugging

  help, example ``convert.py`` program

- 0.2.0 (2015.08.02): Universal Dependencies support (mostly),

  Python 3 support (fully), minor API updates

- 0.1.7 (2015.06.13): Bugfixes for ``JPype``, handle version mismatches

  in IBM Java

- 0.1.6 (2015.02.12): Support for ``graphviz`` formatting, CoreNLP 3.5.1,

  better Windows portability

- 0.1.5 (2015.01.10): Support for ASCII tree formatting

- 0.1.4 (2015.01.07): Fix ``CCprocessed`` support

- 0.1.3 (2015.01.03): Bugfixes, coveralls integration, refactoring

- 0.1.2 (2015.01.02): Better CoNLL structures, test suite and Travis CI

  support, bugfixes

- 0.1.1 (2014.12.15): More docs, fewer bugs

- 0.1 (2014.12.14): Initial release