https://github.com/xiaohan2012/capitalization-restoration
Restore the capitalization of text
- Host: GitHub
- URL: https://github.com/xiaohan2012/capitalization-restoration
- Owner: xiaohan2012
- License: gpl-2.0
- Created: 2015-05-27T12:17:36.000Z (about 11 years ago)
- Default Branch: puls
- Last Pushed: 2015-08-27T12:54:46.000Z (almost 11 years ago)
- Last Synced: 2024-04-14T18:06:57.537Z (about 2 years ago)
- Language: Python
- Size: 165 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Introduction
Restore the capitalization of news titles. The input is a title whose capitalization is incorrect (title case, for instance), and the goal is to recover the correctly capitalized form, for example *DreamWorks Animation zone to open in motiongate Dubai*.
# Usage
## Python call

    >>> from cap_restore import DefaultRestorer
    >>> restorer = DefaultRestorer()
    >>> s = u"Kingdom's Tourism and Hospitality Sector to Draw Huge Investments".split()
    >>> docpath = "/group/home/puls/Shared/capitalization-recovery/10/www.zawya.com.rssfeeds.tourism/E85D3090167053EFB118C243D9747FAC"
    >>> print(" ".join(restorer.restore(s, docpath=docpath)))
    Kingdom's Tourism and hospitality sector to draw huge investments
    >>> pos = ('NNP', ':', 'VBP', 'CC', 'NNP', 'NNP', 'TO', 'NNP', 'NNP', 'NNP')
    >>> print(" ".join(restorer.restore(s, docpath=docpath, pos=pos)))
    Kingdom's tourism and hospitality sector to draw huge investments

### POS tag
POS tags can be passed in the request (the `pos` argument in the Python call above) so that `nltk.pos_tag` is not called. For the tag format, please use the [Penn Part of Speech Tags](http://cs.nyu.edu/grishman/jet/guide/PennPOS.html).
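
When supplying tags by hand, it can help to confirm that they line up with the tokens before calling `restore`. The sketch below assumes one tag per token, paired by position; `check_pos_alignment` is a hypothetical helper written for this example and is not part of `cap_restore`.

```python
def check_pos_alignment(tokens, pos):
    """Pair each token with its POS tag, or raise if the lengths differ.

    Hypothetical helper for illustration; not part of cap_restore.
    Assumes one Penn Treebank tag per token, matched by position.
    """
    tokens = list(tokens)
    pos = list(pos)
    if len(tokens) != len(pos):
        raise ValueError(
            "expected %d POS tags, got %d" % (len(tokens), len(pos))
        )
    return list(zip(tokens, pos))


# Illustrative tokens and tags, made up for this example.
tokens = u"Tourism Sector to Draw Huge Investments".split()
pos = ("NN", "NN", "TO", "VB", "JJ", "NNS")
pairs = check_pos_alignment(tokens, pos)
```

A length mismatch raises `ValueError`, which points at a tokenization difference between the title and the tag sequence.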
## Command line
First, start the web service:

    $ python service.py

Second, run the command-line script (`--help` lists its options):

    $ python capitalization_restoration_web.py --help

## Demo
With curl:

    $ ./curl_cmd_demo.sh

From the shell:

    $ ./py_cmd_demo.sh

## How to add new features
1. Modify `feature_extractor.py` and `feature_template.py`
2. Retrain the model on the training data
3. Update the model in `models/`
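
As an illustration of step 1, a new token-level feature might look like the sketch below. This README does not show the interface that `feature_extractor.py` expects, so the function name, its arguments, and the boolean return value are all assumptions made for this example only.

```python
import re


def appears_capitalized_in_body(tokens, index, body_text):
    """Hypothetical feature: is tokens[index] capitalized mid-sentence in the body?

    Returns True when the word occurs in body_text with an initial capital
    at a position that is not the start of a sentence.
    """
    word = tokens[index]
    capitalized = word[:1].upper() + word[1:].lower()
    # A capitalized occurrence preceded by a lowercase letter or comma plus a
    # space cannot be the first word of a sentence.
    pattern = r"[a-z,] " + re.escape(capitalized) + r"\b"
    return re.search(pattern, body_text) is not None


# Made-up title and body text for demonstration.
body = "The new zone will open in Dubai next year. Dubai expects more visitors."
title = "Zone To Open In Dubai".split()
features = [appears_capitalized_in_body(title, i, body) for i in range(len(title))]
```

Here only the last token is seen capitalized in the middle of a sentence, which suggests it is a proper noun. Any such feature would also need a matching entry in the feature templates, followed by the retraining in steps 2 and 3.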
## Links
- [Monit (service supervision tool)](https://mmonit.com/monit/)
## Running Monit
Before running `monit`, fix the path settings:
1. Ensure the `pid` directory in `cap_restore.sh` is writable
2. Ensure the "check process" part in `monitrc` has the correct path information

Then run the following to start monitoring (a sample `monitrc` is shipped in this directory):

    $ monit -c /path/to/monitrc

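
The first check above (a writable `pid` directory) can be verified before starting `monit`. This is a minimal sketch: the directory used here is a temporary placeholder, and the real one is whatever `cap_restore.sh` sets.

```python
import os
import tempfile


def pid_dir_is_writable(pid_dir):
    """Return True if pid_dir exists and the current user can create files in it."""
    return os.path.isdir(pid_dir) and os.access(pid_dir, os.W_OK | os.X_OK)


# Placeholder directory for demonstration; substitute the path from cap_restore.sh.
example_dir = tempfile.mkdtemp()
print(pid_dir_is_writable(example_dir))
```

The check applies to the user running the script, so it should be run as the same user that `monit` uses to start the service.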