https://github.com/polm/cutlet

Japanese to romaji converter in Python

Host: GitHub
URL: https://github.com/polm/cutlet
Owner: polm
License: mit
Created: 2020-04-16T14:00:34.000Z (over 6 years ago)
Default Branch: main
Last Pushed: 2025-06-02T05:08:54.000Z (about 1 year ago)
Last Synced: 2026-01-07T17:37:12.131Z (7 months ago)
Topics: japanese, nlp, romaji
Language: Python
Homepage: https://polm.github.io/cutlet/
Size: 521 KB
Stars: 371
Watchers: 5
Forks: 24
Open Issues: 3
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE


[![Open in Streamlit](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://polm-cutlet-demo-demo-0tur8v.streamlit.app/)

[![Current PyPI packages](https://badge.fury.io/py/cutlet.svg)](https://pypi.org/project/cutlet/)

# cutlet



Cutlet is a tool to convert Japanese to romaji. Check out the [interactive demo][demo]! Also see the [docs](https://polm.github.io/cutlet/cutlet.html) and the [original blog post](https://www.dampfkraft.com/nlp/cutlet-python-romaji-converter.html). 

[demo]: https://polm-cutlet-demo-demo-0tur8v.streamlit.app/

**You do not need to write issues in English.**

Features:

- support for [Modified Hepburn](https://en.wikipedia.org/wiki/Hepburn_romanization), [Kunreisiki](https://en.wikipedia.org/wiki/Kunrei-shiki_romanization), [Nihonsiki](https://en.wikipedia.org/wiki/Nihon-shiki_romanization) systems

- custom overrides for individual mappings

- custom overrides for specific words

- built-in exceptions list (Tokyo, Osaka, etc.)

- uses foreign spelling when available in UniDic

- proper nouns are capitalized

- slug mode for URL generation

Things not supported:

- traditional Hepburn n-to-m: Shimbashi

- macrons or circumflexes: Tōkyō, Tôkyô

- passport Hepburn: Satoh (but you can use an exception)

- hyphenating words

- traditional Hepburn in general

Internally, cutlet uses [fugashi](https://github.com/polm/fugashi), so you can
use the same dictionary you use for normal tokenization.
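
The two kinds of override listed under Features correspond, as far as the cutlet docs describe them, to the `add_exception` and `update_mapping` methods. The sketch below keeps project-specific overrides in plain dictionaries and applies them to a converter. `apply_overrides` is a hypothetical helper name chosen for illustration and is not part of cutlet.

```python
from typing import Any, Mapping


def apply_overrides(
    converter: Any,
    word_exceptions: Mapping[str, str],
    kana_mappings: Mapping[str, str],
) -> Any:
    """Apply word-level and kana-level overrides to a cutlet-style converter.

    `converter` is assumed to expose add_exception(key, val) and
    update_mapping(key, val), as cutlet.Cutlet does.
    """
    # Word overrides replace the romanization of a whole token.
    for surface, romaji in word_exceptions.items():
        converter.add_exception(surface, romaji)
    # Mapping overrides change how a single kana is romanized.
    for kana, romaji in kana_mappings.items():
        converter.update_mapping(kana, romaji)
    return converter
```

With cutlet installed, this could be called as `apply_overrides(cutlet.Cutlet(), {"佐藤": "Satoh"}, {"ぢ": "di"})`. According to the cutlet docs, a word override only takes effect when the key matches a single token from the tokenizer, so longer phrases likely need plain string replacement instead.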

## Installation

Cutlet can be installed through pip as usual.

    pip install cutlet

Note that if you don't have a MeCab dictionary installed you'll also have to
install one. If you're just getting started
[unidic-lite](https://github.com/polm/unidic-lite) is a good choice.

    pip install unidic-lite
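
Because a dictionary is required, it can help to check for one before first use. The sketch below uses only the standard library to report which dictionary packages are importable. The helper name `installed_dictionaries` is hypothetical, and the default candidates assume the import names `unidic_lite` and `unidic`.

```python
from importlib.util import find_spec
from typing import Iterable, List


def installed_dictionaries(
    candidates: Iterable[str] = ("unidic_lite", "unidic"),
) -> List[str]:
    """Return the candidate packages that can be imported in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in candidates if find_spec(name) is not None]


if not installed_dictionaries():
    print("No dictionary package found; try: pip install unidic-lite")
```

This only checks that a package is importable. It does not verify that the dictionary data itself is complete or usable.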

## Usage

A command-line script is included for quick testing. Just use `cutlet` and each
line of stdin will be treated as a sentence. You can specify the system to use
(`hepburn`, `kunrei`, `nippon`, or `nihon`) as the first argument.

    $ cutlet
    ローマ字変換プログラム作ってみた。
    Roma ji henkan program tsukutte mita.

In code:

```python
import cutlet

katsu = cutlet.Cutlet()
katsu.romaji("カツカレーは美味しい")
# => 'Cutlet curry wa oishii'

# you can print a slug suitable for urls
katsu.slug("カツカレーは美味しい")
# => 'cutlet-curry-wa-oishii'

# You can disable using foreign spelling too
katsu.use_foreign_spelling = False
katsu.romaji("カツカレーは美味しい")
# => 'Katsu karee wa oishii'

# kunreisiki, nihonsiki work too
katu = cutlet.Cutlet('kunrei')
katu.romaji("富士山")
# => 'Huzi yama'

# comparison
nkatu = cutlet.Cutlet('nihon')

sent = "彼女は王への手紙を読み上げた。"
katsu.romaji(sent)
# => 'Kanojo wa ou e no tegami wo yomiageta.'
katu.romaji(sent)
# => 'Kanozyo wa ou e no tegami o yomiageta.'
nkatu.romaji(sent)
# => 'Kanozyo ha ou he no tegami wo yomiageta.'
```
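
The command-line behaviour described above (one sentence per line of stdin, romanization system as the first argument) can be approximated in a few lines. This is a sketch, not the packaged script's source. `romanize_lines` and `main` are hypothetical names, and skipping blank lines is an assumption made here.

```python
import sys
from typing import Callable, Iterable, Iterator, List


def romanize_lines(
    lines: Iterable[str], romanize: Callable[[str], str]
) -> Iterator[str]:
    """Yield the romanization of each non-blank line, treated as one sentence."""
    for line in lines:
        sentence = line.strip()
        if sentence:  # assumption: blank lines are skipped
            yield romanize(sentence)


def main(argv: List[str]) -> None:
    # Imported lazily so romanize_lines can be reused or tested without cutlet.
    import cutlet

    # The first argument selects the system; default to Hepburn if absent.
    system = argv[1] if len(argv) > 1 else "hepburn"
    katsu = cutlet.Cutlet(system)
    for result in romanize_lines(sys.stdin, katsu.romaji):
        print(result)
```

Calling `main(sys.argv)` from a script would read sentences from stdin. Because `romanize_lines` accepts any callable, it can be exercised with a stand-in function when cutlet or a dictionary is not available.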

## Alternatives

- [kakasi](http://kakasi.namazu.org/index.html.ja): Historically important, but not updated since 2014. 

- [pykakasi](https://codeberg.org/miurahr/pykakasi): self-contained; it does segmentation on its own and uses its own dictionary.
