Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/pudo/normality
A tiny library for Python text normalisation. Useful for ad-hoc text processing.
- Host: GitHub
- URL: https://github.com/pudo/normality
- Owner: pudo
- License: mit
- Created: 2015-01-24T11:54:21.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2024-01-04T11:00:21.000Z (about 1 year ago)
- Last Synced: 2024-04-14T02:21:41.947Z (9 months ago)
- Topics: normalization, normalizer, slugs, unicode, unicode-characters
- Language: Python
- Size: 106 KB
- Stars: 136
- Watchers: 6
- Forks: 18
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- starred-awesome - normality - A tiny library for Python text normalisation. Useful for ad-hoc text processing. (Python)
README
# normality text cleanup
[![build](https://github.com/pudo/normality/actions/workflows/build.yml/badge.svg)](https://github.com/pudo/normality/actions/workflows/build.yml)
Normality is a Python micro-package that contains a small set of text
normalization functions for easier re-use. These functions accept a
snippet of Unicode or UTF-8 encoded text and remove various classes
of characters, such as diacritics and punctuation. This is useful as
preparation for further text analysis.

**WARNING**: This library works much better when used in combination
with ``pyicu``, a Python binding for the International Components for
Unicode C library. ICU provides much better text transliteration than
the default ``text-unidecode``.

## Example
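For intuition about the kind of cleanup described above, here is a rough sketch that uses only the Python standard library. It is not how ``normality`` is implemented: the names ``strip_diacritics``, ``rough_normalize``, ``rough_slugify`` and ``rough_collapse_spaces`` are hypothetical, and the sketch relies on Unicode decomposition instead of the transliteration that ``pyicu`` or ``text-unidecode`` provide.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks after compatibility decomposition (NFKD)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def rough_collapse_spaces(text: str) -> str:
    """Replace every run of whitespace with a single space."""
    return " ".join(text.split())


def rough_normalize(text: str) -> str:
    """Lowercase, strip diacritics, and turn non-alphanumerics into spaces."""
    text = strip_diacritics(text).lower()
    # Keep letters (category L*) and numbers (category N*); blank out the rest.
    kept = [
        ch if unicodedata.category(ch)[0] in ("L", "N") else " "
        for ch in text
    ]
    return rough_collapse_spaces("".join(kept))


def rough_slugify(text: str, sep: str = "-") -> str:
    """Join the normalized words with a separator (hypothetical helper)."""
    return rough_normalize(text).replace(" ", sep)


if __name__ == "__main__":
    print(rough_normalize('Nie wieder "Grüne Süppchen" kochen!'))
    print(rough_slugify("My first blog post!"))
```

Decomposition only handles characters that split into a base letter plus combining marks. Characters such as ``ß``, or text in non-Latin scripts, pass through unchanged, which is the gap a real transliteration backend fills. The library's own usage follows.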
```python
# coding: utf-8
from normality import normalize, slugify, collapse_spaces

text = normalize('Nie wieder "Grüne Süppchen" kochen!')
assert text == 'nie wieder grune suppchen kochen'

slug = slugify('My first blog post!')
assert slug == 'my-first-blog-post'

text = 'this \n\n\r\nhas\tlots of \nodd spacing.'
assert collapse_spaces(text) == 'this has lots of odd spacing.'
```

## License
``normality`` is open source, licensed under a standard MIT license
(included in this repository as ``LICENSE``).