Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/pat/pedantic

Pares text down to the words that matter
https://github.com/pat/pedantic

Last synced: 14 days ago
JSON representation

Pares text down to the words that matter

Awesome Lists containing this project

README

        

h1. Pedantic

Pedantic cleans strings of text - stripping out unimportant words and URLs, fixing typos, replacing symbols (like emoticons) with real words, and running the results through a stemmer.

In short - it gives you reliable text to process (but not read).

And if the name didn't give it away, yes this library is opinionated.

h2. Installation

Grab the gem.

gem install pedantic

h2. Usage

Pedantic.fix('my messy string ;)') #=> 'messi string joke'

Note that the stemmer generates imperfect words, but it is reasonably reliable and constant in the output, so you can work with those assumptions in the output.

Also - this library is a work in progress - currently I've aimed for a relatively useful but extremely basic implementation. If you look through the code, you'll see there's few typos and emoticons handled. It's easy enough to extend, though - so please, fork, patch and send a pull request.

h2. Contributing

Fork and patch as you see fit - and please send me a pull request if you think it's useful for others. Don't forget to write specs first, and don't mess with the version numbers please (or at least: only do so in a different branch).

h2. Copyright

Copyright (c) 2010 "Pat Allan":http://freelancing-gods.com, but released under an open licence. Go for your life.