Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/zzzsochi/trans
National characters transcription module.
https://github.com/zzzsochi/trans
translit transliterate transliteration transliterator utf8
Last synced: 2 months ago
JSON representation
National characters transcription module.
- Host: GitHub
- URL: https://github.com/zzzsochi/trans
- Owner: zzzsochi
- License: other
- Created: 2011-11-29T07:15:24.000Z (about 13 years ago)
- Default Branch: master
- Last Pushed: 2018-11-09T07:21:18.000Z (about 6 years ago)
- Last Synced: 2024-10-02T23:47:13.305Z (3 months ago)
- Topics: translit, transliterate, transliteration, transliterator, utf8
- Language: Python
- Homepage: http://www.python.org/pypi/trans/
- Size: 55.7 KB
- Stars: 22
- Watchers: 5
- Forks: 10
- Open Issues: 2
-
Metadata Files:
- Readme: README.rst
- Changelog: CHANGELOG
- License: LICENSE
Awesome Lists containing this project
README
====================
The **trans** module
====================This module translates national characters into similar sounding
latin characters (transliteration).
At the moment, Czech, Greek, Latvian, Polish, Turkish, Russian, Ukrainian,
Kazakh and Farsi alphabets are supported (it covers 99% of needs)... contents::
Simple usage
------------
It's very easy to use
~~~~~~~~~~~~~~~~~~~~~Python 3:
>>> from trans import trans
>>> trans('Привет, Мир!')Python 2:
>>> import trans
>>> u'Привет, Мир!'.encode('trans')
u'Privet, Mir!'
>>> trans.trans(u'Привет, Мир!')
u'Privet, Mir!'Work only with unicode strings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> 'Hello World!'.encode('trans')
Traceback (most recent call last):
...
TypeError: trans codec support only unicode string, given.This is readability
~~~~~~~~~~~~~~~~~~~
>>> s = u'''\
... -- Раскудрить твою через коромысло в бога душу мать
... триста тысяч раз едрену вошь тебе в крыло
... и кактус в глотку! -- взревел разъяренный Никодим.
... -- Аминь, -- робко добавил из склепа папа Пий.
... (c) Г. Л. Олди, "Сказки дедушки вампира".'''
>>>
>>> print s.encode('trans')
-- Raskudrit tvoyu cherez koromyslo v boga dushu mat
trista tysyach raz edrenu vosh tebe v krylo
i kaktus v glotku! -- vzrevel razyarennyy Nikodim.
-- Amin, -- robko dobavil iz sklepa papa Piy.
(c) G. L. Oldi, "Skazki dedushki vampira".Table "**slug**"
~~~~~~~~~~~~~~~~
Use the table "slug", leaving only the Latin characters, digits and underscores:>>> print u'1 2 3 4 5 \n6 7 8 9 0'.encode('trans')
1 2 3 4 5
6 7 8 9 0
>>> print u'1 2 3 4 5 \n6 7 8 9 0'.encode('trans/slug')
1_2_3_4_5__6_7_8_9_0
>>> s.encode('trans/slug')[-42:-1]
u'_c__G__L__Oldi___Skazki_dedushki_vampira_'Table "**id**"
~~~~~~~~~~~~~~
Table **id** is deprecated and renamed to **slug**.
Old name also available, but not recommended.Define user tables
------------------
Simple variant
~~~~~~~~~~~~~~
>>> u'1 2 3 4 5 6 7 8 9 0'.encode('trans/my')
Traceback (most recent call last):
...
ValueError: Table "my" not found in tables!
>>> trans.tables['my'] = {u'1': u'A', u'2': u'B'};
>>> u'1 2 3 4 5 6 7 8 9 0'.encode('trans/my')
u'A_B________________'
>>>A little harder
~~~~~~~~~~~~~~~
Table can consist of two parts - the map of diphthongs and the map of characters.
Diphthongs are processed first by simple replacement in the substring.
Then each character of the received string is replaced according to the map of
characters. If character is absent in the map of characters, key **None** are checked.
If key **None** is not present, the default character **u'_'** is used.>>> diphthongs = {u'11': u'AA', u'22': u'BB'}
>>> characters = {u'a': u'z', u'b': u'y', u'c': u'x', None: u'-',
... u'A': u'A', u'B': u'B'} # See below...
>>> trans.tables['test'] = (diphthongs, characters)
>>> u'11abc22cbaCC'.encode('trans/test')
u'AAzyxBBxyz--'**The characters are created by processing of diphthongs also processed
by the map of the symbols:**>>> diphthongs = {u'11': u'AA', u'22': u'BB'}
>>> characters = {u'a': u'z', u'b': u'y', u'c': u'x', None: u'-'}
>>> trans.tables['test'] = (diphthongs, characters)
>>> u'11abc22cbaCC'.encode('trans/test')
u'--zyx--xyz--'Without the diphthongs
~~~~~~~~~~~~~~~~~~~~~~
These two tables are equivalent:>>> characters = {u'a': u'z', u'b': u'y', u'c': u'x', None: u'-'}
>>> trans.tables['t1'] = characters
>>> trans.tables['t2'] = ({}, characters)
>>> u'11abc22cbaCC'.encode('trans/t1') == u'11abc22cbaCC'.encode('trans/t2')
TrueChangeLog
---------2.1 2016-09-19
* Add Farsi alphabet (thx rodgar-nvkz)
* Use pytest
* Some code style refactoring2.0 2013-04-01
* Python 3 support
* class Trans for create different tables spaces1.5 2012-09-12
* Add support of kazakh alphabet.
1.4 2011-11-29
* Change license to BSD.
1.3 2010-05-18
* Table "id" renamed to "slug". Old name also available.
* Some speed optimizations (thx to AndyLegkiy ).1.2 2010-01-10
* First public release.
* Translate documentation to English.Finally
-------
+ *Special thanks to Yuri Yurevich aka j2a for the kick in the right direction.*
- http://python.su/forum/viewtopic.php?pid=28965
- http://code.djangoproject.com/browser/django/trunk/django/contrib/admin/media/js/urlify.js
+ *I ask forgiveness for my bad English. I promise to be corrected.*