https://github.com/proycon/python-frog
Python bindings to the Dutch NLP tool Frog (pos tagger, lemmatiser, NER tagger, morphological analysis, shallow parser, dependency parser)
- Host: GitHub
- URL: https://github.com/proycon/python-frog
- Owner: proycon
- License: gpl-3.0
- Created: 2014-09-07T20:32:31.000Z (about 11 years ago)
- Default Branch: master
- Last Pushed: 2025-03-20T16:22:42.000Z (8 months ago)
- Last Synced: 2025-04-04T11:07:16.697Z (8 months ago)
- Language: Cython
- Size: 122 KB
- Stars: 49
- Watchers: 5
- Forks: 10
- Open Issues: 6
Metadata Files:
- Readme: README.rst
- License: LICENSE
Awesome Lists containing this project
- awesome-machine-master - python-frog - Python binding to Frog, an NLP suite for Dutch. (pos tagging, lemmatisation, dependency parsing, NER) (Python / General-Purpose Machine Learning)
- awesome-machine-learning - python-frog - Python binding to Frog, an NLP suite for Dutch. (pos tagging, lemmatisation, dependency parsing, NER) (Python / General-Purpose Machine Learning)
- awesome-nlp - python-frog - Python binding to Frog, an NLP suite for Dutch. (pos tagging, lemmatisation, dependency parsing, NER) (Other Languages / Libraries and Embeddings)
- fucking-awesome-machine-learning - python-frog - Python binding to Frog, an NLP suite for Dutch. (pos tagging, lemmatisation, dependency parsing, NER) (Python / General-Purpose Machine Learning)
- awesome-advanced-metering-infrastructure - python-frog - Python binding to Frog, an NLP suite for Dutch. (pos tagging, lemmatisation, dependency parsing, NER) (Python / General-Purpose Machine Learning)
- https-github.com-keon-awesome-nlp - python-frog - Python binding to Frog, an NLP suite for Dutch. (pos tagging, lemmatisation, dependency parsing, NER) (Packages / Libraries)
README
.. image:: http://applejack.science.ru.nl/lamabadge.php/python-frog
:target: http://applejack.science.ru.nl/languagemachines/
.. image:: https://zenodo.org/badge/23770267.svg
:target: https://zenodo.org/badge/latestdoi/23770267
.. image:: https://www.repostatus.org/badges/latest/active.svg
:alt: Project Status: Active – The project has reached a stable, usable state and is being actively developed.
:target: https://www.repostatus.org/#active
Frog for Python
===================
This is a Python binding to the Natural Language Processing suite Frog. Frog is
intended for Dutch and performs part-of-speech tagging, lemmatisation,
morphological analysis, named entity recognition, shallow parsing, and
dependency parsing. The tool itself is implemented in C++
(https://languagemachines.github.io/frog). The binding requires Python 3.6 or higher.
Demo
------------------
.. image:: https://raw.githubusercontent.com/CLARIAH/wp3-demos/master/python-frog.gif
Installation
----------------
We recommend you use a Python virtual environment and install using ``pip``::
pip install python-frog
When possible on your system, this will install the binary
Python wheels *that include Frog and all necessary dependencies* **except for**
frogdata. To download and install the data (in ``~/.config/frog``) you then only need to
run the following once::
python -c "import frog; frog.installdata()"
If you want language detection support, ensure you have the ``libexttextcat``
package (if provided by your distribution) installed before executing the
above command.
If the binary wheels are not available for your system, you will first need to
install Frog (https://languagemachines.github.io/frog) yourself and then
run ``pip install python-frog``; the Python binding will then be
compiled from source. The following instructions apply in that case:
On Arch Linux, you can alternatively use the AUR package.
On macOS, first use Homebrew to install Frog::
brew tap fbkarsdorp/homebrew-lamachine
brew install frog
On Alpine Linux, run: ``apk add cython frog frog-dev``
Windows is not supported natively, but you should be able to use the Frog Python binding through WSL or Docker containers (see below).
Docker/OCI Containers
~~~~~~~~~~~~~~~~~~~~~~~
A Docker/OCI container image is available containing Python, frog, and python-frog::
docker pull proycon/python-frog
docker run -t -i proycon/python-frog
You can also build the container from scratch from this repository with the included ``Dockerfile``.
Usage
------------------
Example:
.. code:: python
from frog import Frog, FrogOptions
frog = Frog(FrogOptions(parser=False))
output = frog.process_raw("Dit is een test")
print("RAW OUTPUT=",output)
output = frog.process("Dit is nog een test.")
print("PARSED OUTPUT=",output)
Output::
RAW OUTPUT= 1 Dit dit [dit] VNW(aanw,pron,stan,vol,3o,ev)
0.777085 O B-NP
2 is zijn [zijn] WW(pv,tgw,ev) 0.999891 O
B-VP
3 een een [een] LID(onbep,stan,agr) 0.999113 O
B-NP
4 test test [test] N(soort,ev,basis,zijd,stan) 0.789112
O I-NP
PARSED OUTPUT= [{'chunker': 'B-NP', 'index': '1', 'lemma': 'dit', 'ner':
'O', 'pos': 'VNW(aanw,pron,stan,vol,3o,ev)', 'posprob': 0.777085, 'text':
'Dit', 'morph': '[dit]'}, {'chunker': 'B-VP', 'index': '2', 'lemma':
'zijn', 'ner': 'O', 'pos': 'WW(pv,tgw,ev)', 'posprob': 0.999966, 'text':
'is', 'morph': '[zijn]'}, {'chunker': 'B-NP', 'index': '3', 'lemma': 'nog',
'ner': 'O', 'pos': 'BW()', 'posprob': 0.99982, 'text': 'nog', 'morph':
'[nog]'}, {'chunker': 'I-NP', 'index': '4', 'lemma': 'een', 'ner': 'O',
'pos': 'LID(onbep,stan,agr)', 'posprob': 0.995781, 'text': 'een', 'morph':
'[een]'}, {'chunker': 'I-NP', 'index': '5', 'lemma': 'test', 'ner': 'O',
'pos': 'N(soort,ev,basis,zijd,stan)', 'posprob': 0.903055, 'text': 'test',
'morph': '[test]'}, {'chunker': 'O', 'index': '6', 'eos': True, 'lemma':
'.', 'ner': 'O', 'pos': 'LET()', 'posprob': 1.0, 'text': '.', 'morph':
'[.]'}]
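Because the parsed output is a plain list of dictionaries, it can be post-processed without further Frog calls. As a minimal sketch, the helper below groups tokens into phrases by reading the ``chunker`` field as IOB tags (``B-`` starts a chunk, ``I-`` continues it, ``O`` is outside any chunk). The name ``group_chunks`` is hypothetical and not part of python-frog; the sample tokens are abridged from the parsed output above, keeping only the ``text`` and ``chunker`` keys.

```python
def group_chunks(tokens):
    """Group tokens into (label, [words]) phrases using IOB chunk tags.

    Hypothetical helper: assumes each token is a dict with 'text' and
    'chunker' keys, as in the parsed output shown above.
    """
    chunks = []
    current = None  # the chunk being built, or None when outside a chunk
    for token in tokens:
        tag = token.get("chunker", "O")
        prefix, _, label = tag.partition("-")
        if prefix == "I" and current is not None and current[0] == label:
            # Continuation of the chunk that is currently open
            current[1].append(token["text"])
        elif prefix in ("B", "I"):
            # A B- tag (or a stray I- tag) opens a new chunk
            current = (label, [token["text"]])
            chunks.append(current)
        else:
            # An O tag closes whatever chunk was open
            current = None
    return chunks


# Abridged from the parsed output of "Dit is nog een test."
sample = [
    {"text": "Dit", "chunker": "B-NP"},
    {"text": "is", "chunker": "B-VP"},
    {"text": "nog", "chunker": "B-NP"},
    {"text": "een", "chunker": "I-NP"},
    {"text": "test", "chunker": "I-NP"},
    {"text": ".", "chunker": "O"},
]

for label, words in group_chunks(sample):
    print(label, " ".join(words))
```

With real data, the list returned by ``frog.process(...)`` would be passed to the helper in place of ``sample``.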
Available keyword arguments for FrogOptions:
* tok - True/False - Do tokenisation? (default: True)
* lemma - True/False - Do lemmatisation? (default: True)
* morph - True/False - Do morphological analysis? (default: True)
* daringmorph - True/False - Do morphological analysis in new experimental style? (default: False)
* mwu - True/False - Do Multi Word Unit detection? (default: True)
* chunking - True/False - Do Chunking/Shallow parsing? (default: True)
* ner - True/False - Do Named Entity Recognition? (default: True)
* parser - True/False - Do Dependency Parsing? (default: False)
* xmlin - True/False - Input is FoLiA XML (default: False)
* xmlout - True/False - Output is FoLiA XML (default: False)
* docid - str - Document ID (for FoLiA)
* numThreads - int - Number of threads to use (default: unset, unlimited)
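Since every option is a keyword argument, a configuration can be kept as a plain dictionary and checked before it is handed to ``FrogOptions``. The sketch below does only that checking, against the names and types listed above; ``OPTION_TYPES`` and ``make_option_kwargs`` are hypothetical names for illustration and are not part of python-frog.

```python
# Documented FrogOptions keywords and the value type each one expects.
# Copied from the list above; this table is an illustration only.
OPTION_TYPES = {
    "tok": bool, "lemma": bool, "morph": bool, "daringmorph": bool,
    "mwu": bool, "chunking": bool, "ner": bool, "parser": bool,
    "xmlin": bool, "xmlout": bool, "docid": str, "numThreads": int,
}


def make_option_kwargs(**overrides):
    """Validate keyword names and value types (hypothetical helper)."""
    for name, value in overrides.items():
        if name not in OPTION_TYPES:
            raise KeyError(f"unknown FrogOptions keyword: {name}")
        expected = OPTION_TYPES[name]
        # Exact type match, so that True is not accepted where an int is expected
        if type(value) is not expected:
            raise TypeError(
                f"{name} expects {expected.__name__}, got {type(value).__name__}"
            )
    return dict(overrides)


kwargs = make_option_kwargs(parser=False, ner=False, numThreads=2)
print(kwargs)
```

With python-frog installed, the checked dictionary would then be unpacked as ``Frog(FrogOptions(**kwargs))``; a misspelled keyword such as ``parsr`` is rejected by the helper with a ``KeyError`` before Frog is loaded.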
You can pass a Frog configuration file explicitly as the second argument upon instantiation; otherwise the default one is
used:
.. code:: python
frog = Frog(FrogOptions(parser=False), "/path/to/your/frog.cfg")
A third parameter, a dictionary, can be used to override specific configuration values (same syntax as Frog's
``--override`` option). Leave the second parameter empty if you want to load the default configuration:
.. code:: python
frog = Frog(FrogOptions(parser=False), "", { "tokenizer.rulesFile": "tokconfig-nld-twitter" })
FoLiA support
------------------
Frog supports output in the FoLiA XML format (set ``FrogOptions(xmlout=True)``), as
well as FoLiA input (set ``FrogOptions(xmlin=True)``). The FoLiA format exposes more details about the linguistic
annotation in a more structured and more formal way.
Whenever FoLiA output is requested, the ``process()`` method will return an instance of ``folia.Document``, which is
provided by the FoLiApy library. This loads the entire FoLiA document in memory and
allows you to inspect it in any way you see fit. Extensive documentation for this library can be found here:
http://folia.readthedocs.io/
An example can be found below:
.. code:: python
from frog import Frog, FrogOptions
frog = Frog(FrogOptions(parser=True,xmlout=True))
output = frog.process("Dit is een FoLiA test.")
# output is no longer a list of dictionaries but an instance of folia.Document, provided by the FoLiApy library
print("FOLIA OUTPUT AS RAW XML=")
print(output.xmlstring())
print("Inspecting FoLiA output (just a small example):")
for word in output.words():
    print(word.text() + " " + word.pos() + " " + word.lemma())