https://github.com/Khodnevis-Research-Lab/khoshnevis
Khodnevis Normalizer: A Python library for Persian text preprocessing.
- Host: GitHub
- URL: https://github.com/Khodnevis-Research-Lab/khoshnevis
- Owner: Khodnevis-Research-Lab
- Created: 2022-07-16T13:39:11.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-09-13T08:10:22.000Z (about 2 years ago)
- Last Synced: 2024-06-28T08:35:43.430Z (5 months ago)
- Topics: farsi-text-cleaner, persian-nlp, persian-text-cleaning, text-cleaner
- Language: Python
- Homepage: https://pypi.org/project/khoshnevis/
- Size: 21.5 KB
- Stars: 5
- Watchers: 0
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
Khoshnevis (خوشنويس)
====

Python package for **normalizing** Persian text.

+ Text Cleaning
+ URL Remover
+ Emoji Remover
+ Text Tokenization
+ Punctuation Space Correction
+ Half Space Correction (using [Parsivar](https://github.com/ICTRC/Parsivar))
+ Standardize Alphabet
+ [NLTK](http://nltk.org/) compatible
+ Python 3 support

## Usage

```python
>>> from khoshnevis import Normalizer
>>> normalizer = Normalizer()
>>> normalizer.normalize(text="استفاده از نیمفاصله متن را زیبا مي كند", zwnj="\u200c",
...                      clean_url=False, remove_emoji=False)
```

Parameters of `normalize`:

```text
text (str): input text
zwnj (str, optional): Zero-width non-joiner character. Defaults to "\u200c".
clean_url (bool, optional): removes all URLs from text. Defaults to True.
remove_emoji (bool, optional): removes all emojis from the text. Defaults to True.
```

## Installation

The latest stable version of Khoshnevis can be installed through `pip`:

    pip install khoshnevis

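
To give a feel for what two of the listed features (URL removal and alphabet standardization) involve, here is a minimal standard-library sketch. It is not Khoshnevis's implementation: the helper names `remove_urls` and `standardize_alphabet`, the URL pattern, and the two-letter character mapping are assumptions made purely for illustration, and the real library may handle many more cases.

```python
import re

# Assumed mapping: two Arabic code points that are commonly replaced by
# their Persian counterparts when standardizing Persian text.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})

# Assumed, deliberately simple URL pattern (http://, https:// or www. prefixes).
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")


def standardize_alphabet(text: str) -> str:
    """Hypothetical helper: replace Arabic yeh/kaf with the Persian letters."""
    return text.translate(ARABIC_TO_PERSIAN)


def remove_urls(text: str) -> str:
    """Hypothetical helper: drop URL-like tokens, then collapse whitespace."""
    without_urls = URL_PATTERN.sub(" ", text)
    return re.sub(r"\s+", " ", without_urls).strip()


# Sample written with escapes so the Arabic yeh (U+064A) and kaf (U+0643)
# are unambiguous; it is followed by a URL that should be removed.
sample = "\u0645\u064a \u0643\u0646\u062f https://example.com"
cleaned = standardize_alphabet(remove_urls(sample))
```

Both helpers are plain functions over `str`, so they can be applied in either order; the library's own `normalize` method remains the supported entry point.
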
## Citation info
```bibtex
@misc{khoshnevis,
author = {HamidReza Attar and Milad Lotfi and Saied Alimoradi},
title = {Khoshnevis, a Python library for Persian text preprocessing},
year = {2022},
url = {https://www.khodnevisai.com/},
}
```