Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/motazsaad/process-arabic-text
Pre-process arabic text (remove diacritics, punctuations and repeating characters)
arabic-nlp punctuation remove-diacritics
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/motazsaad/process-arabic-text
- Owner: motazsaad
- License: gpl-3.0
- Created: 2017-03-17T17:06:17.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2017-04-08T17:36:23.000Z (over 7 years ago)
- Last Synced: 2024-05-11T15:35:34.291Z (6 months ago)
- Topics: arabic-nlp, punctuation, remove-diacritics
- Language: Python
- Homepage:
- Size: 16.6 KB
- Stars: 102
- Watchers: 7
- Forks: 39
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-arabic-nlp - motazsaad/process-arabic-text
README
# Pre-process Arabic Text
Pre-process Arabic text (remove diacritics, punctuation, and repeating characters).

## Usage
```
Usage: clean_arabic_text.py [-h] -i INFILE -o OUTFILE

Pre-process arabic text (remove diacritics, punctuations, and repeating
characters).

optional arguments:
  -h, --help            show this help message and exit
  -i INFILE, --infile INFILE
                        input file.
  -o OUTFILE, --outfile OUTFILE
                        out file.
```

## Example
```
python clean_arabic_text.py -i infile.txt -o outfile.txt
```

## How to contribute
Contributions that improve the code are welcome. Please follow the steps below.
1. Fork the project.
2. Modify the code and test it to make sure it works.
3. Make a pull request.

Please consult [GitHub Help](https://help.github.com/) if you need help.
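
Before modifying the code, it may help to see roughly what the three cleaning steps named in the description involve. The following is a minimal sketch and not the repository's actual implementation: the function names, the diacritic range (U+064B to U+0652), the punctuation set, and the "three or more repeats" threshold are all assumptions chosen for illustration. Only the `-i`/`--infile` and `-o`/`--outfile` options are taken from the usage text above.

```python
import argparse
import re
import string

# Assumption: "diacritics" means the Arabic tashkeel marks U+064B..U+0652
# (fathatan through sukun). The real script may use a different set.
ARABIC_DIACRITICS = re.compile(r"[\u064B-\u0652]")

# Assumption: ASCII punctuation plus the Arabic comma, semicolon,
# and question mark.
PUNCTUATION = string.punctuation + "\u060C\u061B\u061F"
PUNCT_TABLE = str.maketrans("", "", PUNCTUATION)

# Assumption: a "repeating character" is any character that occurs
# three or more times in a row.
REPEATED_CHARS = re.compile(r"(.)\1{2,}")


def remove_diacritics(text):
    """Delete every character in the assumed diacritic range."""
    return ARABIC_DIACRITICS.sub("", text)


def remove_punctuation(text):
    """Delete every character in the assumed punctuation set."""
    return text.translate(PUNCT_TABLE)


def remove_repeating_chars(text):
    """Collapse runs of three or more identical characters to one."""
    return REPEATED_CHARS.sub(r"\1", text)


def clean_text(text):
    """Apply the three steps in order: diacritics, punctuation, repeats."""
    text = remove_diacritics(text)
    text = remove_punctuation(text)
    return remove_repeating_chars(text)


def main(argv=None):
    """Hypothetical command-line wrapper mirroring the -i/-o options."""
    parser = argparse.ArgumentParser(
        description="Pre-process Arabic text (sketch)."
    )
    parser.add_argument("-i", "--infile", required=True, help="input file.")
    parser.add_argument("-o", "--outfile", required=True, help="out file.")
    args = parser.parse_args(argv)

    # Clean the input line by line; newlines are left untouched.
    with open(args.infile, encoding="utf-8") as src, \
            open(args.outfile, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(clean_text(line))
```

In this sketch, calling `main(["-i", "infile.txt", "-o", "outfile.txt"])` would read `infile.txt` and write the cleaned lines to `outfile.txt`. Note that the order of the steps matters: removing punctuation before collapsing repeats means a run such as `!!!` disappears entirely instead of being reduced to a single `!`.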