Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/DonAurelio/text-analyzer
This project is dedicated to a university assignment about Natural Language Processing with FreeLing and Python
- Host: GitHub
- URL: https://github.com/DonAurelio/text-analyzer
- Owner: DonAurelio
- License: mit
- Created: 2016-10-21T17:13:16.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2022-12-26T19:58:26.000Z (about 2 years ago)
- Last Synced: 2024-08-01T21:48:56.480Z (6 months ago)
- Topics: docker-container, morphological-analysis, natural-language-processing, nltk, python, stanford-parser, stanford-pos-tagger, text-analyzer, text-parser, tokenization
- Language: HTML
- Size: 21.5 MB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- project-awesome - DonAurelio/text-analyzer - This project is dedicated to a university assignment about Natural Language Processing with FreeLing and Python (HTML)
README
# TextAnalyzer
This project was developed for a university assignment on Natural Language Processing. The application was built with Python 2.7 and Django 1.9 and consists of two modules:
* A **Tokenization** and **Morphological Analysis** module (called morfo) built with FreeLing and Python 2.7. This app takes raw text and performs morphological analysis on it.
* The second module (textparser) covers **Syntactic Analysis**. It generates syntactic trees from raw text using probabilistic parsers (the Stanford parser and Dan Bikel's parser).

## Running this project
To get this project working, you need to set up the **morfo** and **textparser** modules. The project's directory structure is as follows:
```
TextAnalyser
│   README.md
│   requirements.txt
│
└───tkmorfo
    ├───applications
    │   ├───morfo
    │   └───textparser
    │
    └───tools
        └───helpers
                00-raw
                00
                dbparser
                parseval
                stanford-parser-full-2015-12-09
                stanford-postagger-2015-12-09
                utils.py
```
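Before running anything, it can help to confirm that a checkout matches this layout. The following is a minimal sketch, assuming the external tools and utils.py sit inside tkmorfo/tools/helpers (the nesting of those entries is an assumption); the function names required_paths and check_layout are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: report which entries of the expected layout are missing.
# ASSUMPTION: the external tools and utils.py live inside
# tkmorfo/tools/helpers; edit required_paths if your checkout differs.

required_paths() {
  # One path per line, relative to the TextAnalyser root directory.
  cat <<'EOF'
README.md
requirements.txt
tkmorfo/applications/morfo
tkmorfo/applications/textparser
tkmorfo/tools/helpers/00-raw
tkmorfo/tools/helpers/00
tkmorfo/tools/helpers/dbparser
tkmorfo/tools/helpers/parseval
tkmorfo/tools/helpers/stanford-parser-full-2015-12-09
tkmorfo/tools/helpers/stanford-postagger-2015-12-09
tkmorfo/tools/helpers/utils.py
EOF
}

check_layout() {
  # $1: TextAnalyser root (default: current directory).
  # Prints "missing: <path>" for each absent entry and returns the count.
  root="${1:-.}"
  missing=0
  for rel in $(required_paths); do
    if [ ! -e "$root/$rel" ]; then
      echo "missing: $rel"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

Run it as `check_layout /path/to/TextAnalyser`; an exit status of 0 means every expected entry was found.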
## Setting Up the Docker Container
This project was designed to run inside a Docker container. The first module (**Tokenization** and **Morphological Analysis**) depends on FreeLing and Python 2.7, both of which are preinstalled in this [Docker image](https://drive.google.com/file/d/0ByEHTU9ch3ZwcmJlQW5qdGkyT0E/view?usp=sharing).
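The image is distributed as a file download rather than through a registry. A minimal sketch of loading it and creating a long-lived container, assuming the file is an archive produced by `docker save`; the container name textanalyzer, the mount point /opt/TextAnalyser and the port mapping 8000 (Django's default development-server port) are hypothetical choices, not taken from this README:

```bash
#!/usr/bin/env bash
# Hypothetical helpers around the downloaded Docker image.
# ASSUMPTIONS: the file is a `docker save` archive; the container name,
# mount point and published port are illustrative, not from the README.

load_image() {
  # $1: path to the downloaded image archive.
  docker load -i "$1"
}

create_container() {
  # $1: image name or ID reported by `docker load`
  # $2: absolute host path of the TextAnalyser project (default: current dir)
  image="$1"
  project_dir="${2:-$PWD}"
  docker run -it \
    --name textanalyzer \
    -p 8000:8000 \
    -v "$project_dir":/opt/TextAnalyser \
    -w /opt/TextAnalyser \
    "$image" bash
}

restart_container() {
  # Reattach to the same container later, so packages installed inside it
  # (python-tk, xvfb, Java, ...) are kept between sessions.
  docker start -ai textanalyzer
}
```

Reusing one named container, instead of running a fresh one each time, keeps the packages installed in the following sections.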
The second module, **Syntactic Analysis**, depends on the following files:
- Dan Bikel’s Parsing Engine: dbparser.tar.gz
- Penn Treebank-based training set: wsj-02-21.mrg.tar.gz
- Model accuracy evaluation (PARSEVAL): parseval.tar.gz
- Test set: 00-raw.tar.gz
These files can be found in [this Google Drive folder](https://drive.google.com/drive/folders/0ByEHTU9ch3ZwSkhqNl95SUxiZ2M?usp=sharing). The other required files are:
- Stanford Statistical Parser: [stanford-parser-full-2015-12-09.zip](http://nlp.stanford.edu/software/stanford-parser-full-2015-12-09.zip)
- Stanford Postagger: [stanford-postagger-2015-12-09.zip](http://nlp.stanford.edu/software/stanford-postagger-2015-12-09.zip)
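Fetching and unpacking these archives can be scripted. This sketch downloads the two Stanford packages from the URLs listed above (if those links still resolve) and unpacks every `.tar.gz` and `.zip` in a download directory into a target directory. It assumes `curl`, `tar` and `unzip` are installed and that the four Drive archives were saved into the same download directory by hand, since the Drive link opens a folder page rather than serving the files directly; the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: download the Stanford packages and unpack all archives.
# ASSUMPTIONS: curl, tar and unzip are available; the Drive archives were
# downloaded manually into the same directory; function names are hypothetical.

STANFORD_URLS="
http://nlp.stanford.edu/software/stanford-parser-full-2015-12-09.zip
http://nlp.stanford.edu/software/stanford-postagger-2015-12-09.zip
"

download_stanford() {
  # $1: download directory (created if needed).
  dl_dir="$1"
  mkdir -p "$dl_dir"
  for url in $STANFORD_URLS; do
    # -f: fail on HTTP errors, -L: follow redirects.
    curl -fL -o "$dl_dir/$(basename "$url")" "$url" || return 1
  done
}

extract_archives() {
  # $1: directory holding the archives, $2: directory to unpack into.
  src_dir="$1"
  dest_dir="$2"
  mkdir -p "$dest_dir"
  for archive in "$src_dir"/*.tar.gz "$src_dir"/*.zip; do
    [ -e "$archive" ] || continue   # skip patterns that matched nothing
    case "$archive" in
      *.tar.gz) tar -xzf "$archive" -C "$dest_dir" || return 1 ;;
      *.zip)    unzip -q -o "$archive" -d "$dest_dir" || return 1 ;;
    esac
  done
}
```

For example, `download_stanford downloads && extract_archives downloads tkmorfo/tools/helpers`, where the target directory follows the layout assumed for this project.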
### Running Graphical Applications Inside a Container
To run the **Syntactic Analysis** module, the container must be able to display or create graphical UIs. This allows the app to create the parse tree images generated with NLTK. Install the following packages:
```bash
apt-get update
apt-get install python-tk
apt-get install xvfb
apt-get install imagemagick
```
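A quick way to confirm the installation worked is to check that the expected commands and the Tk binding are present. A sketch, assuming `python` is the project's Python 2.7 interpreter (where the Tk module is named Tkinter) and that ImageMagick is used through its `convert` command; the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: verify the GUI-related dependencies installed above.
# ASSUMPTIONS: `python` is Python 2.7 (Tk module named Tkinter);
# ImageMagick is invoked as `convert`. Function names are hypothetical.

missing_commands() {
  # Print each argument that is not on PATH; return 1 if any is missing.
  status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      status=1
    fi
  done
  return "$status"
}

check_gui_deps() {
  # Xvfb (virtual display), convert (ImageMagick) and python must all exist.
  missing_commands Xvfb convert python || return 1
  # python-tk supplies Tkinter, which NLTK's tree-drawing code relies on.
  python -c 'import Tkinter' || return 1
}
```

`check_gui_deps` prints the name of each missing command and returns non-zero if anything is absent.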
Then run the following commands every time the container starts:
```bash
Xvfb :1 -screen 0 1024x768x16 &> xvfb.log &
DISPLAY=:1.0
export DISPLAY
```
### Installing Java in the Container for the NLTK Stanford POS Tagger and Parser
```bash
echo deb http://http.debian.net/debian jessie-backports main >> /etc/apt/sources.list
apt-get update && apt-get install openjdk-8-jdk
update-alternatives --config java
```
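The steps that must be repeated on every start (launching Xvfb and exporting `DISPLAY`) and a check that the selected Java is version 8 or later can be kept in one small start-up helper. A sketch, assuming Xvfb records its PID in the usual lock file `/tmp/.X1-lock` for display `:1` and that Java 8 is the minimum these 2015 Stanford releases need; the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Start-up helpers for the container (a sketch; function names are hypothetical).
# ASSUMPTIONS: Xvfb records its PID in /tmp/.X1-lock for display :1, and the
# Stanford 2015-12-09 tools need Java 8 or newer.

xvfb_running() {
  # Succeed if the display :1 lock file exists and names a live process.
  [ -r /tmp/.X1-lock ] || return 1
  pid=$(tr -d ' \n' < /tmp/.X1-lock)
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

start_virtual_display() {
  # Launch Xvfb only when it is not already serving :1, then export DISPLAY.
  if ! xvfb_running; then
    Xvfb :1 -screen 0 1024x768x16 >xvfb.log 2>&1 &
  fi
  DISPLAY=:1.0
  export DISPLAY
}

java_major_version() {
  # $1: output of `java -version 2>&1`. Prints the major version, e.g.
  # 'openjdk version "1.8.0_171"' -> 8, 'openjdk version "11.0.2"' -> 11.
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;
    *)   printf '%s\n' "$ver" | cut -d. -f1 ;;
  esac
}
```

Because `DISPLAY` has to be exported in the shell that later runs the app, the file should be sourced (for example `. ./startup.sh && start_virtual_display`, where startup.sh is a hypothetical file name) rather than executed. Once Java 8 is selected with `update-alternatives`, `java_major_version "$(java -version 2>&1)"` should print `8`.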