https://github.com/searchivarius/toolsnlp

UIMA-ECD wrappers for some basic NLP tools.

Host: GitHub
URL: https://github.com/searchivarius/toolsnlp
Owner: searchivarius
Created: 2013-09-11T14:07:21.000Z (over 11 years ago)
Default Branch: master
Last Pushed: 2013-12-22T08:59:37.000Z (over 11 years ago)
Last Synced: 2025-03-17T18:19:57.274Z (about 2 months ago)
Language: Java
Homepage:
Size: 22.6 MB
Stars: 0
Watchers: 1
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md

README

ToolsNLP
========

UIMA-ECD wrappers for various basic NLP tools. They were written for a course project. For more details on UIMA-ECD, please see https://github.com/oaqa/oaqa-tutorial

Prerequisites: Java, Python (NLTK must be installed so that nltk.util.clean_html is available), and the Unix command-line utility html2text.

Sub-projects:

1. Ex1: five simple HTML cleaners (regexp, my own cleaner, Apache Tika, NLTK, and Unix html2text). The script launches/runAll.sh runs them all.
2. Ex2: wrappers for sentence segmenters and tokenizers. The script launches/run_ex2.sh runs them.
3. Ex3: a wrapper for the ClearTK/OpenNLP POS tagger.
4. Project: a rudimentary proof-of-concept information extractor. It attempts to extract the following information from Wikipedia descriptions of countries: capital, languages spoken, religion.
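
To give a rough idea of what the regexp-based cleaner in Ex1 does, here is a minimal sketch of regular-expression HTML cleaning in plain Java. It is not the code from this repository: the class name `RegexHtmlCleaner`, the method `clean`, and the specific patterns are assumptions made for illustration, and the sketch has no UIMA-ECD wrapping at all.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not the cleaner shipped in Ex1.
final class RegexHtmlCleaner {
    // <script> and <style> elements, including everything between the tags.
    private static final Pattern SCRIPT_OR_STYLE =
        Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    // HTML comments, possibly spanning several lines.
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");
    // Any remaining opening, closing, or self-closing tag.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    // Runs of whitespace left behind by the removals above.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private RegexHtmlCleaner() { }

    // Returns the visible text of the given HTML fragment.
    // Character entities such as &amp; are left untouched.
    static String clean(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = COMMENT.matcher(text).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }
}
```

Calling `RegexHtmlCleaner.clean("<p>Hello <b>world</b></p>")` yields `Hello world`. A regexp approach like this is simple and fast but breaks on malformed markup and on `>` characters inside attribute values, which is one reason to compare it against parser-based cleaners such as Apache Tika.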

Additional requirements:

1. Unix utility html2text
2. Python + nltk.util.clean_html
3. Compiled Senna parser (http://ml.nec-labs.com/senna/).
