https://github.com/takumakanari/japanese-numbers-python
A parser for Japanese numbers (kanji, Arabic) in natural language.
natural-language-processing python
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/takumakanari/japanese-numbers-python
- Owner: takumakanari
- License: mit
- Created: 2016-11-08T02:49:37.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2020-04-04T01:36:28.000Z (about 5 years ago)
- Last Synced: 2025-04-05T10:51:09.588Z (2 months ago)
- Topics: natural-language-processing, python
- Language: Python
- Size: 18.6 KB
- Stars: 20
- Watchers: 2
- Forks: 5
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# japanese_numbers
[CircleCI](https://circleci.com/gh/takumakanari/japanese-numbers-python/tree/master)
A parser for Japanese numbers (kanji, Arabic) in natural language.
The **japanese_numbers** module finds numbers in natural-language text and converts them to Arabic numerals.
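
To make the conversion concrete, here is a minimal sketch of the arithmetic behind a single kanji numeral. It is an illustration only, not the parsing logic used by **japanese_numbers**: the function `kanji_to_int` and its tables are hypothetical names, and the sketch handles only a pure kanji numeral (no mixed forms such as a leading Arabic digit, and no surrounding text).

```python
# Illustrative only: a tiny converter for ONE kanji numeral.
# This is not how japanese_numbers is implemented.
DIGITS = {'〇': 0, '一': 1, '二': 2, '三': 3, '四': 4,
          '五': 5, '六': 6, '七': 7, '八': 8, '九': 9}
SMALL_UNITS = {'十': 10, '百': 100, '千': 1000}
LARGE_UNITS = {'万': 10**4, '億': 10**8, '兆': 10**12}


def kanji_to_int(text):
    """Convert one kanji numeral such as '二千万百一' to an int."""
    total = 0  # sum of completed 万/億/兆 groups
    group = 0  # value accumulated below the next large unit
    digit = 0  # pending digit(s) waiting for a unit
    for ch in text:
        if ch in DIGITS:
            # Consecutive digits are read positionally, e.g. 五〇六 -> 506.
            digit = digit * 10 + DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit such as 十 counts as 1 * 10.
            group += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in LARGE_UNITS:
            # Close the current group and scale it by the large unit.
            total += (group + digit) * LARGE_UNITS[ch]
            group = digit = 0
        else:
            raise ValueError('not a kanji numeral character: %r' % ch)
    return total + group + digit


print(kanji_to_int('二千万百一'))  # 2000 * 10**4 + 101 = 20000101
print(kanji_to_int('五〇六'))      # 506
```

Grouping by 万/億/兆 mirrors how Japanese numerals are read in blocks of four digits.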
The following are examples of patterns that can be parsed:

- 二千万百一円
- 5百万
- 一を聞いて十を知る
- 五〇六号室

### Installation
pip install japanese-numbers-python
## Usage
The functions `to_arabic` and `to_arabic_numbers` are nearly stable.
`to_arabic` returns a sequence of *japanese_numbers.result.ParsedResult* instances.
```python
import japanese_numbers

japanese_numbers.to_arabic('銀河の向こう、六千三百二十一億千五百十一万二千百八十一光年彼方。')
# => a sequence containing one ParsedResult

japanese_numbers.to_arabic('一を聞いて十を知る。')
# => a sequence containing two ParsedResult instances (for 一 and 十)
```
Each *ParsedResult* instance exposes the numeric value, the matched text, and the position at which the number was found:
```python
result = japanese_numbers.to_arabic('一を聞いて十を知る。')

result[0].number  # => 1
result[0].text    # => '一'
result[0].index   # => 0, the position where the number was found

result[1].number  # => 10
result[1].text    # => '十'
result[1].index   # => 5
```
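
The three attributes above are enough to rewrite a sentence with Arabic digits in place of the original numerals. The following is a minimal sketch under stated assumptions: `Found` is a hypothetical stand-in for *ParsedResult* (so the example runs without the library installed), `replace_numbers` is a hypothetical helper, and `index` is assumed to be a character offset into the unicode string, as the example above suggests.

```python
from collections import namedtuple

# Hypothetical stand-in for japanese_numbers.result.ParsedResult, carrying only
# the three attributes shown above so this example runs without the library.
Found = namedtuple('Found', ['text', 'number', 'index'])


def replace_numbers(sentence, results):
    """Return sentence with each matched numeral replaced by Arabic digits."""
    pieces = []
    cursor = 0
    for r in sorted(results, key=lambda r: r.index):
        pieces.append(sentence[cursor:r.index])  # text before this match
        pieces.append(str(r.number))             # the numeral as digits
        cursor = r.index + len(r.text)           # skip past the matched text
    pieces.append(sentence[cursor:])             # remainder after the last match
    return ''.join(pieces)


sentence = '一を聞いて十を知る。'
# Values copied from the example above; with the library installed,
# results would presumably come from japanese_numbers.to_arabic(sentence).
results = [Found('一', 1, 0), Found('十', 10, 5)]
print(replace_numbers(sentence, results))  # 1を聞いて10を知る。
```

Building the output from slices in a single left-to-right pass avoids having to adjust later offsets when a replacement changes the string length.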
`to_arabic_numbers` returns a tuple of numbers directly.
```python
import japanese_numbers

japanese_numbers.to_arabic_numbers('一を聞いて十を知る。')
# => (1, 10)
```

### Charsets
Both `to_arabic_numbers` and `to_arabic` accept an `encode` option that specifies the encoding of the input.
It defaults to *utf8*. If you pass a non-unicode string to either function, it is first converted to unicode using that encoding.
```python
japanese_numbers.to_arabic_numbers('一を聞いて十を知る。')  # utf8 by default
japanese_numbers.to_arabic('一を聞いて十を知る。', encode='eucjp')  # set another charset
```

### TODO
- Support float/double values
- Support negative numbers

### Patch
Patches are welcome!