Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/AshkanArabim/persian-word-extractor
This script creates a list of unique words from Persian text. Words can be sorted by frequency or alphabetical order. This is a new project, so there could be major bugs in the code.
persian python3 word-extraction
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/AshkanArabim/persian-word-extractor
- Owner: AshkanArabim
- Created: 2022-03-17T15:45:41.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-05-27T23:00:18.000Z (over 1 year ago)
- Last Synced: 2024-08-04T09:05:39.259Z (5 months ago)
- Topics: persian, python3, word-extraction
- Language: Python
- Homepage:
- Size: 1.55 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## persian-word-extractor
This script creates a list of unique words from Persian text. Words are sorted by how frequently they appear in the source.txt file. This is a new project, so there could be major bugs in the code.
**Words with accent marks are excluded from results.**
## Features:
* sort by frequency or alphabetical order
* extract words from source.txt or online links
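
The sorting and accent-mark behaviour described above can be illustrated with a minimal sketch. It is not taken from the project's `main.py`. The names `extract_words`, `WORD`, and `DIACRITICS` are hypothetical, and two things are assumptions: that "accent marks" means the Arabic-script diacritics U+064B to U+0652, and that a word is a run of Arabic-script letters.

```python
import re
from collections import Counter

# Assumption: "accent marks" are the Arabic-script diacritics U+064B..U+0652.
DIACRITICS = re.compile(r"[\u064B-\u0652]")
# Assumption: a word is a run of Arabic-script letters, diacritics, and the
# zero-width non-joiner (U+200C) that Persian uses inside compound words.
WORD = re.compile(r"[\u0621-\u0652\u066E-\u06D3\u200C]+")


def extract_words(text, order="frequency"):
    """Return the unique words in `text`, skipping any word with a diacritic."""
    counts = Counter()
    for match in WORD.findall(text):
        word = match.strip("\u200c")  # drop stray joiners at the edges
        if not word or DIACRITICS.search(word):
            continue  # exclude empty matches and words with accent marks
        counts[word] += 1

    if order == "frequency":
        # Most frequent first; ties fall back to code-point order.
        return sorted(counts, key=lambda w: (-counts[w], w))
    if order == "alphabetical":
        # Code-point order only approximates the Persian alphabet.
        return sorted(counts)
    raise ValueError("order must be 'frequency' or 'alphabetical'")
```

Calling `extract_words(text)` returns the most frequent words first, and `extract_words(text, order="alphabetical")` returns the same words in code-point order. A true Persian collation would need a locale-aware sort key.
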
## How to use:
1. Create a file named 'source.txt' in root directory and paste source text inside.
2. Run 'main.py'
3. Follow CLI instructions.
4. Results will be written to 'output.txt' in root directory.

**Feel free to tweak the code to suit your needs.**
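
For readers who want to tweak the workflow, the read-and-write steps above can be sketched roughly as follows. This is an illustrative assumption about the flow, not the actual contents of `main.py`: the function name `write_unique_words` is hypothetical, words are split on whitespace only, and the interactive CLI prompts are left out.

```python
from pathlib import Path


def write_unique_words(source_path="source.txt", output_path="output.txt"):
    """Read the source text, keep each word once, and write one word per line."""
    # Persian text should be read and written explicitly as UTF-8.
    text = Path(source_path).read_text(encoding="utf-8")
    # dict.fromkeys drops duplicates while preserving first-seen order.
    unique_words = list(dict.fromkeys(text.split()))
    Path(output_path).write_text("\n".join(unique_words) + "\n", encoding="utf-8")
    return unique_words
```

Passing explicit paths, for example `write_unique_words("corpus.txt", "words.txt")`, keeps the sketch usable outside the repository root.
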
## How did I use it?
I ran this script on a large body of Persian text to extract words for contribution to [Monkeytype](https://monkeytype.com). I added the "Persian 1k" & "Persian 5k" tests. My first open-source contribution!!