https://github.com/aziz/virastar
Cleaning up Persian text!
- Host: GitHub
- URL: https://github.com/aziz/virastar
- Owner: aziz
- License: other
- Created: 2011-01-11T03:16:38.000Z (almost 14 years ago)
- Default Branch: master
- Last Pushed: 2018-05-25T11:51:40.000Z (over 6 years ago)
- Last Synced: 2024-10-14T01:48:00.661Z (about 2 months ago)
- Language: Ruby
- Homepage: http://virastar.heroku.com
- Size: 113 KB
- Stars: 82
- Watchers: 9
- Forks: 12
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-persian - virastar - Cleaning up & normalizing Persian text. (Ruby)
README
-----
# Virastar (in Persian: ویراستار)
Virastar edits your Persian text.

-----
## Specifications
### Virastar
* should add persian_cleanup method to String class
* should replace Arabic kaf with its Persian equivalent
* should replace Arabic Yeh with its Persian equivalent
* should replace Arabic numbers with their Persian equivalent
* should replace English numbers with their Persian equivalent
* should replace English comma and semicolon with their Persian equivalent
* should correct :;,.?! spacing (one space after and no space before)
* should replace English quotes with their Persian equivalent
* should replace three dots with ellipsis
* should convert ه ی to هٔ
* should replace double dash with ndash and triple dash with mdash
* should replace more than one space with just a single one
* should remove unnecessary zwnj chars that are followed or preceded by a space
* should fix spacing for () [] {} “” «» (one space outside, no space inside)
* should replace English percent sign with its Persian equivalent
* should replace more than one line break with just one
* should not replace line breaks
* should put zwnj between word and prefix/suffix (ha haye* tar* tarin mi* nemi*)
* should not replace English numbers in English phrases
* should not destroy URLs in the text

#### aggressive editing
* should replace more than one ! or ? mark with just one
* should remove all kashidas
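
As a rough illustration of a few of the specifications above, the sketch below replaces Arabic kaf, yeh, and digits with their Persian equivalents, turns three dots into an ellipsis, drops a zwnj that touches a space, and collapses repeated spaces. The module name `PersianCleanupSketch` and its `cleanup` method are hypothetical; this is not Virastar's actual implementation, only one way such rules could be written.

```ruby
# Illustrative only: a small subset of the rules listed above.
# PersianCleanupSketch and its constants are invented for this example.
module PersianCleanupSketch
  ARABIC_LETTERS  = "\u0643\u064A"   # Arabic kaf, Arabic yeh
  PERSIAN_LETTERS = "\u06A9\u06CC"   # Persian kaf, Persian yeh
  ARABIC_DIGITS   = "\u0660-\u0669"  # range understood by String#tr
  PERSIAN_DIGITS  = "\u06F0-\u06F9"
  ZWNJ_NEXT_TO_SPACE = / \u200C|\u200C /

  def self.cleanup(text)
    text
      .tr(ARABIC_LETTERS, PERSIAN_LETTERS)   # kaf and yeh
      .tr(ARABIC_DIGITS, PERSIAN_DIGITS)     # Arabic digits to Persian digits
      .gsub("...", "\u2026")                 # three dots to ellipsis
      .gsub(ZWNJ_NEXT_TO_SPACE, " ")         # zwnj touching a space is dropped
      .gsub(/ {2,}/, " ")                    # runs of spaces become one space
  end
end

puts PersianCleanupSketch.cleanup("\u0643\u064A  \u0661\u0662...")
```

The zwnj step runs before the space-collapsing step so that any double space left behind by removing a zwnj is also cleaned up.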
-----
## Install

    gem install virastar

## Usage

    "فارسي را كمی درست تر می نويسيم".persian_cleanup # => "فارسی را کمی درست‌تر می‌نویسیم"

Virastar comes with a list of flags to control its behavior. All flags are turned on by default, but you can turn them off by passing an options hash to the `persian_cleanup` method:

    "سلام 123".persian_cleanup(:fix_english_numbers => false) # => "سلام 123"

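
The "on by default, off on request" behavior described above can be sketched with a defaults hash that the caller's options are merged into. The module `FlagSketch` and its methods are hypothetical and only borrow a few flag names from this README; how Virastar itself resolves its options is not shown here.

```ruby
# Hypothetical sketch of default-on flags with caller overrides.
# FlagSketch, resolve and cleanup are invented names, not Virastar's API.
module FlagSketch
  DEFAULTS = {
    :fix_dashes          => true,
    :fix_three_dots      => true,
    :fix_english_numbers => true
  }.freeze

  ENGLISH_DIGITS = "0-9"
  PERSIAN_DIGITS = "\u06F0-\u06F9"

  # Effective flags: every default stays on unless the caller overrides it.
  def self.resolve(options = {})
    DEFAULTS.merge(options)
  end

  def self.cleanup(text, options = {})
    flags = resolve(options)
    text = text.gsub("...", "\u2026") if flags[:fix_three_dots]
    text = text.tr(ENGLISH_DIGITS, PERSIAN_DIGITS) if flags[:fix_english_numbers]
    text
  end
end

puts FlagSketch.cleanup("123...")
puts FlagSketch.cleanup("123...", :fix_english_numbers => false)
```

`Hash#merge` is used here instead of a pattern like `options[:flag] || true`, because the latter would turn an explicit `false` back into `true`.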
Here is the list of all flags:
* `fix_dashes`
* `fix_three_dots`
* `fix_english_quotes`
* `fix_hamzeh`
* `cleanup_zwnj`
* `fix_spacing_for_braces_and_quotes`
* `fix_arabic_numbers`
* `fix_english_numbers`
* `fix_misc_non_persian_chars`
* `fix_perfix_spacing`
* `fix_suffix_spacing`
* `aggresive`
* `cleanup_kashidas`
* `cleanup_extra_marks`
* `cleanup_spacing`
* `cleanup_begin_and_end`

## Acknowledgment
Virastar is highly inspired by [Virasbaz](http://virasbaz.persianlanguage.ir).

## Note on Patches/Pull Requests
* Fork the project.
* Make your feature addition or bug fix.
* Add tests for it. This is important so I don't break it in a
future version unintentionally.
* Commit; do not mess with the Rakefile, version, or history.
(If you want to have your own version, that is fine, but bump the version in a commit by itself so I can ignore it when I pull.)
* Send me a pull request. Bonus points for topic branches.

## Copyright
Copyright (c) 2011 Allen A. Bargi. See LICENSE for details.