https://github.com/Khodnevis-Research-Lab/khoshnevis

Khodnevis Normalizer: A Python library for Persian text preprocessing.

farsi-text-cleaner persian-nlp persian-text-cleaning text-cleaner

Khoshnevis (خوشنويس)
====

Python package for **normalizing** Persian text.

+ Text Cleaning
+ URL Remover
+ Emoji Remover
+ Text Tokenization
+ Punctuation Space Correction
+ Half Space Correction (using [Parsivar](https://github.com/ICTRC/Parsivar))
+ Alphabet Standardization
+ [NLTK](http://nltk.org/) compatible
+ Python 3 support
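
To make two of the features above concrete, here is a minimal standard-library sketch of what alphabet standardization and punctuation space correction usually mean for Persian text. It is not taken from the Khoshnevis source: the function names `standardize_alphabet` and `fix_punctuation_spacing`, the character mapping, and the punctuation set are all assumptions chosen for illustration.

```python
import re

# Illustrative mapping only (an assumption, not Khoshnevis's table):
# Arabic code points that often leak into Persian text, mapped to Persian ones.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})

# Arabic comma, Arabic semicolon, Arabic question mark, exclamation mark.
# The full stop is left out on purpose so decimals and URLs are not touched.
PUNCT = "\u060c\u061b\u061f!"


def standardize_alphabet(text: str) -> str:
    """Replace Arabic yeh/kaf with their Persian counterparts."""
    return text.translate(ARABIC_TO_PERSIAN)


def fix_punctuation_spacing(text: str) -> str:
    """Drop whitespace before punctuation and add one space after it."""
    # Remove any whitespace that precedes a punctuation mark.
    text = re.sub(rf"\s+([{PUNCT}])", r"\1", text)
    # Insert a space when punctuation is directly followed by other text.
    text = re.sub(rf"([{PUNCT}])(?=[^\s{PUNCT}])", r"\1 ", text)
    return text
```

A real normalizer covers many more characters (digits, diacritics, other letter variants); the two-entry table here only shows the shape of the technique.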

## Usage

```python
>>> from khoshnevis import Normalizer

>>> normalizer = Normalizer()

>>> normalizer.normalize(
...     text="استفاده از نیم‌فاصله متن را زیبا مي كند",
...     zwnj="\u200c",
...     clean_url=False,
...     remove_emoji=False,
... )
```

`normalize` accepts the following arguments:

```text
text (str): input text.
zwnj (str, optional): zero-width non-joiner character. Defaults to "\u200c".
clean_url (bool, optional): remove all URLs from the text. Defaults to True.
remove_emoji (bool, optional): remove all emojis from the text. Defaults to True.
```
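
The effect of the two boolean flags can be pictured with a small stand-in written against the standard library only. This is a sketch under stated assumptions, not the library's implementation: `strip_noise`, `URL_RE`, and `EMOJI_RE` are hypothetical names, and the emoji ranges cover only the most common blocks.

```python
import re

# Hypothetical patterns; Khoshnevis's own rules may differ.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # variation selector-16
    "]+"
)


def strip_noise(text: str, clean_url: bool = True, remove_emoji: bool = True) -> str:
    """Optionally remove URLs and emojis, then collapse runs of spaces and tabs."""
    if clean_url:
        text = URL_RE.sub(" ", text)
    if remove_emoji:
        text = EMOJI_RE.sub(" ", text)
    # Only spaces and tabs are collapsed, so the ZWNJ (U+200C) is left untouched.
    return re.sub(r"[ \t]+", " ", text).strip()
```

Both flags default to `True` here to mirror the defaults documented above; passing `False` leaves the corresponding content in place.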

## Installation
The latest stable version of Khoshnevis can be installed through `pip`:

    pip install khoshnevis
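
One way to confirm the installation succeeded is to ask Python's packaging metadata for the installed version. The helper name `installed_version` is an assumption made for this sketch; it relies only on `importlib.metadata` from the standard library (Python 3.8+).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("khoshnevis"))
```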

## Citation info
```bibtex
@misc{khoshnevis,
  author = {HamidReza Attar and Milad Lotfi and Saied Alimoradi},
  title = {Khoshnevis, a Python library for Persian text preprocessing},
  year = {2022},
  url = {https://www.khodnevisai.com/},
}
```