Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/janx/ruby-pinyin
Converts Chinese characters to pinyin, with support for text that mixes Chinese with English words and punctuation. Pinyin is a romanization system (phonemic notation) for Chinese characters; this gem converts Chinese characters into pinyin.
- Host: GitHub
- URL: https://github.com/janx/ruby-pinyin
- Owner: janx
- License: bsd-3-clause
- Created: 2012-07-06T04:32:40.000Z (over 12 years ago)
- Default Branch: master
- Last Pushed: 2021-08-21T03:31:22.000Z (over 3 years ago)
- Last Synced: 2025-02-09T09:06:57.191Z (12 days ago)
- Language: Ruby
- Homepage:
- Size: 8.96 MB
- Stars: 542
- Watchers: 23
- Forks: 65
- Open Issues: 15
Metadata Files:
- Readme: README.markdown
- License: LICENSE
README
# ruby-pinyin: a Chinese-character-to-pinyin converter with support for polyphonic characters
[Build Status](https://travis-ci.org/janx/ruby-pinyin)

ruby-pinyin: zhī chí duō yīn zì de hàn zì zhuǎn pīn yīn gōng jù (the pinyin reading of 支持多音字的汉字转拼音工具)
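ruby-pinyin converts Chinese characters to their pinyin and handles polyphonic characters (characters with more than one reading) reasonably well. For example:

    PinYin.of_string('南京市长江大桥', :ascii)

correctly converts "长" to "chang2" rather than "zhang3".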
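To illustrate why word context matters for the example above, here is a minimal sketch of dictionary-based longest-match lookup. It is not the gem's implementation: `CHAR_READINGS`, `WORD_READINGS`, and `readings_for` are hypothetical names, and the two tiny tables are made up for this one sentence.

```ruby
# Hypothetical per-character default readings (illustration only).
CHAR_READINGS = {
  '南' => 'nan2', '京' => 'jing1', '市' => 'shi4', '长' => 'zhang3',
  '江' => 'jiang1', '大' => 'da4', '桥' => 'qiao2'
}.freeze

# Hypothetical word-level readings that take precedence over the defaults.
WORD_READINGS = {
  '南京市' => %w[nan2 jing1 shi4],
  '长江' => %w[chang2 jiang1],
  '大桥' => %w[da4 qiao2]
}.freeze

MAX_WORD_LENGTH = WORD_READINGS.keys.map(&:length).max

# Scan left to right, preferring the longest dictionary word at each position.
def readings_for(text)
  result = []
  pos = 0
  while pos < text.length
    length = [MAX_WORD_LENGTH, text.length - pos].min
    # Shrink the candidate until it is a known word or a single character.
    length -= 1 while length > 1 && !WORD_READINGS.key?(text[pos, length])
    chunk = text[pos, length]
    if WORD_READINGS.key?(chunk)
      result.concat(WORD_READINGS[chunk])
    else
      # Single character: use its default reading, or pass it through unchanged.
      result << CHAR_READINGS.fetch(chunk, chunk)
    end
    pos += length
  end
  result
end

puts readings_for('南京市长江大桥').join(' ')
```

With these tables, "长" inside "长江" gets the word-level reading "chang2", while a lone "长" falls back to the per-character default "zhang3".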
## Features
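* Supports polyphonic characters
* Uses up-to-date Unicode data (6.3.0, published 2013/02/26)
* Can write tones as digits or as Unicode tone marks (e.g. 'cao1', 'cāo')
* Rich API
* Supports strings that mix Chinese with English words and punctuation
* Converts Chinese punctuation to English punctuation
* Supports custom readings

## Installation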
    gem install ruby-pinyin

Or add ruby-pinyin to your Gemfile:

    gem 'ruby-pinyin'

## Examples
    # encoding: utf-8
    require 'ruby-pinyin'

    # return ['jie', 'cao']
    PinYin.of_string('节操')

    # return ['jie2', 'cao1']
    PinYin.of_string('节操', true)
    PinYin.of_string('节操', :ascii)

    # return ["jié", "cāo"]
    PinYin.of_string('节操', :unicode)

    # handles polyphonic characters correctly: return ["nán", "jīng", "shì", "cháng", "jiāng", "dà", "qiáo"]
    PinYin.of_string('南京市长江大桥', :unicode)

    # return %w(gan xie party gan xie guo jia)
    PinYin.of_string('感谢party感谢guo jia')

    # return 'gan-xie-party-gan-xie-guo-jia'
    PinYin.permlink('感谢party感谢guo jia')

    # return 'gxpartygxguojia'
    PinYin.abbr('感谢party感谢guo jia')

    # return 'gan xie party, gan xie guo jia!'
    # PinYin.sentence keeps punctuation and replaces Chinese punctuation with its English equivalent
    PinYin.sentence('感谢party, 感谢guo家!')

    # override readings with your own data file
    PinYin.override_files = [File.expand_path('../my.dat', __FILE__)]

For more examples and options, see the [test cases](https://github.com/janx/ruby-pinyin/blob/master/test/pinyin_test.rb).

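The `:ascii` and `:unicode` outputs above differ only in how the tone is written. As a rough illustration of that relationship, the sketch below converts a tone-numbered syllable into its tone-marked form using the usual placement rule (mark `a` or `e` if present, else the `o` of `ou`, else the last vowel). `tone_mark` is a hypothetical helper, not part of the gem, and treating a trailing `5` as the neutral tone is an assumption.

```ruby
# Tone-marked forms of each vowel, indexed by tone number minus one.
TONE_MARKS = {
  'a' => %w[ā á ǎ à],
  'e' => %w[ē é ě è],
  'i' => %w[ī í ǐ ì],
  'o' => %w[ō ó ǒ ò],
  'u' => %w[ū ú ǔ ù],
  'ü' => %w[ǖ ǘ ǚ ǜ]
}.freeze

# Hypothetical helper: 'cao1' -> 'cāo'. Anything else is returned unchanged.
def tone_mark(syllable)
  match = syllable.match(/\A([a-zü]+)([1-5])\z/)
  return syllable unless match

  letters = match[1]
  tone = match[2].to_i
  return letters if tone == 5 # assumed neutral tone: no mark

  # Placement rule: 'a' or 'e' first, then the 'o' in 'ou', else the last vowel.
  index = letters.index(/[ae]/) || letters.index('ou') || letters.rindex(/[aeiouü]/)
  return letters if index.nil?

  marked = TONE_MARKS[letters[index]][tone - 1]
  letters[0...index] + marked + letters[(index + 1)..-1]
end

puts %w[jie2 cao1].map { |s| tone_mark(s) }.join(' ')
```

Plain words such as `party` carry no tone digit, so the helper returns them untouched.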
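## Configuration ##
ruby-pinyin ships with two backends: `PinYin::Backend::Simple` and `PinYin::Backend::MMSeg`. The MMSeg backend is the default and supports polyphonic-character recognition. If you do not need that, have tight memory constraints, or want to fall back to the Simple backend for any other reason, configure it as follows: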
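```ruby
PinYin.backend = PinYin::Backend::Simple.new
```

## Custom readings ##
Use `PinYin.override_files` to customize the readings of specific characters. The custom data lives in a plain text file: each line defines the reading of one character, with the character's Unicode code point and its pinyin separated by an ASCII space. For the format, see [lib/ruby-pinyin/data/Mandarin.dat](https://github.com/janx/ruby-pinyin/blob/master/lib/ruby-pinyin/data/Mandarin.dat).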
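As a sketch of producing such a file, the helpers below build override lines from a hash of characters and readings. `override_line` and `override_data` are hypothetical names, and the exact conventions (upper-case hexadecimal code point with no `U+` prefix, tone-numbered reading) are assumptions to check against `Mandarin.dat` before relying on them.

```ruby
# Hypothetical helper: one override line, "<hex code point> <reading>".
# Assumes upper-case hex with no "U+" prefix; verify against Mandarin.dat.
def override_line(char, reading)
  format('%X %s', char.ord, reading)
end

# Hypothetical helper: build the full file contents, one character per line.
def override_data(readings)
  readings.map { |char, reading| override_line(char, reading) }.join("\n") + "\n"
end

puts override_data('长' => 'chang2', '一' => 'yi1')
```

The resulting string can be written with `File.write('my.dat', ...)` and the file's absolute path added to `PinYin.override_files`.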
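## Help is welcome ##
If you like this project, please help it along in any of the following ways (among others):
* Use it
* Spread the word
* Report bugs and make suggestions (GitHub issue tracker)
* Fix bugs and implement features (GitHub pull requests)

## LICENSE ##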
[BSD LICENSE](https://github.com/janx/ruby-pinyin/blob/master/LICENSE)
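The pinyin data in ruby-pinyin was compiled by the author from sources on the internet. You may use it freely outside ruby-pinyin, but please credit ruby-pinyin as the source :-)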
## Contributors ##
* [Martin91](https://github.com/Martin91)
* [jaxi](https://github.com/jaxi)
* [jiangxin](https://github.com/jiangxin)
* [forresty](https://github.com/forresty)
* [pzpz](https://github.com/pzpz)
* [Eric Guo](https://github.com/Eric-Guo)