Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/DonAurelio/text-analyzer

This projects is dedicated to an University Assignment about Natural Language Processing With Freeling and Python
https://github.com/DonAurelio/text-analyzer

docker-container morphological-analysis natural-language-processing nltk python stanford-parser stanford-pos-tagger text-analyzer text-parser tokenization

Last synced: about 17 hours ago
JSON representation

This projects is dedicated to an University Assignment about Natural Language Processing With Freeling and Python

Awesome Lists containing this project

README

        

# TextAnalyzer

This projects is dedicated to an University Assignment related with Natural Language Processing. The application was designed in python 2.7 with Django 1.9 and is composed by:

* **Tokenization** and **Morfological Analisys** module (called morfo) using freeling and Python 2.7. This app takes a raw text and performs the corresponding Morfoligical Analysis.
* The second module (textparser) covers **Syntactic Analisys**. It deals with the generation of syntactic trees using probabilistic models (Stanford and Bikel) given a raw text.

## Running this project

To getting this projecto working we need to setting up the **morfo** and **textparser** modules. The configuration

```
TextAnalyser
│ README.md
│ requirements.txt

└───tkmorfo
applications
|
└───morfo
|
└───textparser
tools
|
└───helpers
00-raw
00
dbparser
parseval
stanford-parserfull-2015-12-09
stanford-postagger-2015-12-09
utils.py

```

## Setting The Docker Container

This projects was designed into a container, The first module **Tokenization** and **Morfological Analisys** depends on freeling and python 2.7. You can find those package installed on this [docker image](https://drive.google.com/file/d/0ByEHTU9ch3ZwcmJlQW5qdGkyT0E/view?usp=sharing).

The second module **Syntatic Analisys** depends of the following libraries

- Dan Bikel’s Parsing Engine: dbparser.tar.gz

- Penn Treebank based Trainning set: wsj-02-21.mrg.tar.gz

- Evaluate the accurancy of the model: parseval.tar.gz

- Test set: 00-raw.tar.gz

Those files can be found [this](https://drive.google.com/drive/folders/0ByEHTU9ch3ZwSkhqNl95SUxiZ2M?usp=sharing). Other needed files are:

- Stanford Statistical Parser: [stanford-parser-full-2015-12-09.zip](http://nlp.stanford.edu/software/stanford-parser-full-2015-12-09.zip)

- Stanford Postagger: [stanford-postagger-2015-12-09.zip](http://nlp.stanford.edu/software/stanford-postagger-2015-12-09.zip)

### Runnig Graphical Applications Into a Contaner

To run the **Syntactic Analisys** module the container needs to be able to "show" or "create" grafical UIS. This allow the app to create the parse tree images generated with nltk.

```{r, engine='bash', count_lines}
apt-get install python-tk
apt-get update
apt-get install xvfb
apt-get install imagemagick
```
Then you need to run the following command every time that the container starts.

```{r, engine='bash', count_lines}
Xvfb :1 -screen 0 1024x768x16 &> xvfb.log &
DISPLAY=:1.0
export DISPLAY
```

### Installing Java for nltk Stanford Pos tagger and parser in the Container

```{r, engine='bash', count_lines}
echo deb http://http.debian.net/debian jessie-backports main >> /etc/apt/sources.list
apt-get update && apt-get install openjdk-8-jdk
update-alternatives --config java
```

# References

[1] [Image Viwer HTML Module](http://ignitersworld.com/lab/imageViewer.html)

[2] [Running a GUI Application in a Docker Container](https://linuxmeerkat.wordpress.com/2014/10/17/running-a-gui-application-in-a-docker-container/)

[3] [Draw Parse Trees with NLTK](http://stackoverflow.com/questions/23429117/saving-nltk-drawn-parse-tree-to-image-file)

[4] [Installing Java 8](http://stackoverflow.com/questions/35130798/install-java-8-in-debian-jessie)

[5] [ImagViwer](http://ignitersworld.com/lab/imageViewer.html)