Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/pat/pedantic
Pares text down to the words that matter
- Host: GitHub
- URL: https://github.com/pat/pedantic
- Owner: pat
- License: mit
- Created: 2010-02-08T11:51:13.000Z (almost 15 years ago)
- Default Branch: master
- Last Pushed: 2010-03-18T08:57:16.000Z (almost 15 years ago)
- Last Synced: 2024-11-30T04:02:57.477Z (23 days ago)
- Language: Ruby
- Homepage:
- Size: 97.7 KB
- Stars: 14
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.textile
- License: LICENSE
README
h1. Pedantic
Pedantic cleans strings of text - stripping out unimportant words and URLs, fixing typos, replacing symbols (like emoticons) with real words, and running the results through a stemmer.
In short - it gives you reliable text to process (but not read).
And if the name didn't give it away, yes this library is opinionated.
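To make those steps concrete, here is a minimal sketch of that kind of cleanup pipeline in plain Ruby. It is not Pedantic's actual code: the module name @CleanupSketch@, the word lists and the toy stemmer are all assumptions made up for illustration, and a real stemmer (Porter, for instance) is far more thorough.

```ruby
# Illustrative sketch only: none of these names or word lists come from Pedantic.
module CleanupSketch
  STOP_WORDS = %w[a an the my is of to and].freeze            # hypothetical list
  TYPOS      = { 'teh' => 'the', 'recieve' => 'receive' }.freeze
  SYMBOLS    = { ';)' => 'joke', ':)' => 'happy', ':(' => 'sad' }.freeze
  URL        = %r{https?://\S+}

  # Deliberately naive stand-in for a real stemmer.
  def self.stem(word)
    stripped = word.sub(/(ing|ed|s)\z/, '')
    # Keep the word whole if stripping leaves no vowel (e.g. "string" -> "str").
    stripped = word unless stripped.match?(/[aeiou]/)
    stripped.sub(/y\z/, 'i')
  end

  def self.fix(text)
    text = text.downcase.gsub(URL, ' ')                        # strip URLs
    SYMBOLS.each { |symbol, word| text = text.gsub(symbol, " #{word} ") }
    words = text.scan(/[a-z]+/)                                # keep plain words
    words = words.map { |w| TYPOS.fetch(w, w) }                # fix known typos
    words = words.reject { |w| STOP_WORDS.include?(w) }        # drop filler words
    words.map { |w| stem(w) }.join(' ')                        # stem what is left
  end
end

puts CleanupSketch.fix('my messy string ;)')
```

The order matters in this sketch: symbols are swapped for words before the text is split, and typos are fixed before filler words are dropped, so a misspelt filler word still gets removed.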
h2. Installation
Grab the gem.
bc. gem install pedantic
h2. Usage
bc. Pedantic.fix('my messy string ;)') #=> 'messi string joke'
Note that the stemmer generates imperfect words (such as 'messi' for 'messy'), but its output is reasonably reliable and consistent, so you can build on that assumption.
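As a rough illustration of why consistent stems are handy, the sketch below counts terms across strings that have already been cleaned. The helper name @term_counts@ is hypothetical, and the second sample string is made up rather than real Pedantic output.

```ruby
# Hypothetical helper: counts how often each stemmed term appears
# across strings that have already been cleaned.
def term_counts(cleaned_strings)
  cleaned_strings.each_with_object(Hash.new(0)) do |string, counts|
    string.split.each { |term| counts[term] += 1 }
  end
end

# The first sample is the README's example output; the second is invented.
counts = term_counts(['messi string joke', 'messi room'])
puts counts.inspect
```

Because the stem is the same every time, variants of a word end up under one key, and the stem never has to be a real word for the counts to be useful.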
Also - this library is a work in progress. So far I've aimed for a relatively useful but extremely basic implementation. If you look through the code, you'll see that only a few typos and emoticons are handled. It's easy enough to extend, though - so please fork, patch and send a pull request.
h2. Contributing
Fork and patch as you see fit - and please send me a pull request if you think it's useful for others. Don't forget to write specs first, and don't mess with the version numbers please (or at least: only do so in a different branch).
h2. Copyright
Copyright (c) 2010 "Pat Allan":http://freelancing-gods.com, but released under an open licence. Go for your life.