{"id":13621735,"url":"https://github.com/DonAurelio/text-analyzer","last_synced_at":"2025-04-15T01:33:54.080Z","repository":{"id":49708567,"uuid":"71584272","full_name":"DonAurelio/text-analyzer","owner":"DonAurelio","description":"This projects is dedicated to an University Assignment about Natural Language Processing With Freeling and Python ","archived":false,"fork":false,"pushed_at":"2022-12-26T19:58:26.000Z","size":22519,"stargazers_count":2,"open_issues_count":6,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-08-01T21:48:56.480Z","etag":null,"topics":["docker-container","morphological-analysis","natural-language-processing","nltk","python","stanford-parser","stanford-pos-tagger","text-analyzer","text-parser","tokenization"],"latest_commit_sha":null,"homepage":null,"language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DonAurelio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-10-21T17:13:16.000Z","updated_at":"2023-03-27T20:11:36.000Z","dependencies_parsed_at":"2023-01-31T01:01:06.427Z","dependency_job_id":null,"html_url":"https://github.com/DonAurelio/text-analyzer","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DonAurelio%2Ftext-analyzer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DonAurelio%2Ftext-analyzer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DonAurelio%2Ftext-analyzer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DonAurelio
%2Ftext-analyzer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DonAurelio","download_url":"https://codeload.github.com/DonAurelio/text-analyzer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223654917,"owners_count":17180606,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker-container","morphological-analysis","natural-language-processing","nltk","python","stanford-parser","stanford-pos-tagger","text-analyzer","text-parser","tokenization"],"created_at":"2024-08-01T21:01:10.029Z","updated_at":"2024-11-08T08:31:42.210Z","avatar_url":"https://github.com/DonAurelio.png","language":"HTML","readme":"# TextAnalyzer\n\nThis project is a university assignment on Natural Language Processing. The application was written in Python 2.7 with Django 1.9 and is composed of:\n\n* A **Tokenization** and **Morphological Analysis** module (called morfo) using FreeLing and Python 2.7. This app takes raw text and performs the corresponding morphological analysis.\n* A second module (textparser) that covers **Syntactic Analysis**. It generates syntactic trees from raw text using probabilistic models (Stanford and Bikel). \n\n\n## Running this project\n\nTo get this project working, we need to set up the **morfo** and **textparser** modules. 
The project is laid out as follows:\n\n```\nTextAnalyser\n│   README.md\n│   requirements.txt    \n│\n└───tkmorfo\n        applications\n        |\n        └───morfo\n        |           \n        └───textparser\n                tools\n                |\n                └───helpers\n                        00-raw\n                        00\n                        dbparser\n                        parseval\n                        stanford-parserfull-2015-12-09\n                        stanford-postagger-2015-12-09\n                        utils.py\n\n```\n\n## Setting Up the Docker Container\n\nThis project was designed to run in a container. The first module, **Tokenization** and **Morphological Analysis**, depends on FreeLing and Python 2.7. You can find those packages installed in this [docker image](https://drive.google.com/file/d/0ByEHTU9ch3ZwcmJlQW5qdGkyT0E/view?usp=sharing).\n\nThe second module, **Syntactic Analysis**, depends on the following files:\n\n- Dan Bikel’s Parsing Engine: dbparser.tar.gz\n\n- Penn Treebank-based training set: wsj-02-21.mrg.tar.gz\n\n- Evaluation of model accuracy: parseval.tar.gz\n\n- Test set: 00-raw.tar.gz\n\nThose files can be found [here](https://drive.google.com/drive/folders/0ByEHTU9ch3ZwSkhqNl95SUxiZ2M?usp=sharing). Other required files are:\n\n- Stanford Statistical Parser: [stanford-parser-full-2015-12-09.zip](http://nlp.stanford.edu/software/stanford-parser-full-2015-12-09.zip)\n\n- Stanford POS Tagger: [stanford-postagger-2015-12-09.zip](http://nlp.stanford.edu/software/stanford-postagger-2015-12-09.zip)\n\n\n\n### Running Graphical Applications in a Container\n\nTo run the **Syntactic Analysis** module, the container needs to be able to \"show\" or \"create\" graphical UIs. 
This allows the app to create the parse tree images generated with NLTK.\n\n```{r, engine='bash', count_lines}\napt-get update\napt-get install python-tk\napt-get install xvfb\napt-get install imagemagick\n```\nThen run the following commands every time the container starts.\n\n```{r, engine='bash', count_lines}\nXvfb :1 -screen 0 1024x768x16 \u0026\u003e xvfb.log  \u0026\nDISPLAY=:1.0\nexport DISPLAY\n```\n\n### Installing Java for the NLTK Stanford POS Tagger and Parser in the Container\n\n```{r, engine='bash', count_lines}\necho deb http://http.debian.net/debian jessie-backports main \u003e\u003e /etc/apt/sources.list\napt-get update \u0026\u0026 apt-get install openjdk-8-jdk\nupdate-alternatives --config java\n```\n\u003c!--- \njar tvf stanford-parser-3.3.1-models.jar\nExtract models \njar xf stanford-parser-3.6.0-models.jar edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz\n---\u003e\n# References\n\n[1] [Image Viewer HTML Module](http://ignitersworld.com/lab/imageViewer.html)\n\n[2] [Running a GUI Application in a Docker Container](https://linuxmeerkat.wordpress.com/2014/10/17/running-a-gui-application-in-a-docker-container/)\n\n[3] [Draw Parse Trees with NLTK](http://stackoverflow.com/questions/23429117/saving-nltk-drawn-parse-tree-to-image-file) \n\n[4] [Installing Java 8](http://stackoverflow.com/questions/35130798/install-java-8-in-debian-jessie)\n\n[5] [ImageViewer](http://ignitersworld.com/lab/imageViewer.html)\n","funding_links":[],"categories":["HTML"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDonAurelio%2Ftext-analyzer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FDonAurelio%2Ftext-analyzer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDonAurelio%2Ftext-analyzer/lists"}