Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/nabidam/persian-names
Persian names dataset
- Host: GitHub
- URL: https://github.com/nabidam/persian-names
- Owner: nabidam
- License: mit
- Created: 2024-01-25T10:11:40.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-01-25T17:13:27.000Z (10 months ago)
- Last Synced: 2024-06-06T10:33:59.540Z (5 months ago)
- Topics: dataset, farsi, farsi-datasets, iran, json, persian
- Language: Python
- Homepage:
- Size: 267 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Persian Names
A JSON file containing 8,816 Persian names along with their genders (some of the names are not labeled).
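The README does not document the file name or its schema. As a minimal sketch, assuming a hypothetical `names.json` that holds a JSON array of objects with `name` and `gender` keys, where an unlabeled name has a `null`, empty, or missing `gender`, the file could be loaded and the unlabeled entries counted like this:

```python
import json
from collections import Counter
from pathlib import Path


def load_names(path):
    """Read the dataset; assumes (hypothetically) a JSON array of {"name", "gender"} objects."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def gender_counts(records):
    """Count records per gender label, grouping null, empty, or missing labels as "unlabeled"."""
    counts = Counter()
    for record in records:
        counts[record.get("gender") or "unlabeled"] += 1
    return counts


if __name__ == "__main__":
    path = Path("names.json")  # hypothetical file name; use the actual JSON file from the repository
    if path.exists():
        names = load_names(path)
        print(len(names), gender_counts(names))
```

The `unlabeled` count gives a quick measure of how many names still need a gender, which is the second item in the Todos list.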
# Iranian Names (اسامی ایرانی)
A JSON file containing 8,816 Iranian names along with their genders (some of the names have no gender label).
## Resources (منابع)
-
-
## Description
I downloaded two files from the internet, one XLS and one CSV, preprocessed them to make their data compatible, and finally merged them into a single JSON file.
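The preprocessing script itself is not reproduced in this README, so the following is only a minimal Python sketch of the described pipeline, not the author's code. It assumes both files have already been read into lists of dicts with hypothetical `name` and `gender` keys (reading the XLS file would need an extra library such as pandas, left out here). The README does not say whether `ك` and `ئ` were deleted or replaced, so the sketch assumes they are mapped to the Persian letters `ک` and `ی`. It also strips the whole Arabic harakat range U+064B to U+0652, a superset of the three marks shown in the preprocessing list. Duplicates are merged by normalized name, keeping the first record but filling a missing gender from a later one.

```python
import json

# Sketch only: the mapping, diacritic range, and record schema are assumptions, not the author's script.
# Assumed replacements: Arabic kaf (U+0643) -> Persian keheh (U+06A9),
# Arabic yeh with hamza (U+0626) -> Persian yeh (U+06CC).
ARABIC_TO_PERSIAN = str.maketrans({"\u0643": "\u06a9", "\u0626": "\u06cc"})
# Arabic harakat U+064B..U+0652 (tanwin, fatha, damma, kasra, shadda, sukun); mapping to None deletes them.
DIACRITICS = dict.fromkeys(range(0x064B, 0x0653))


def normalize(name):
    """Replace the assumed Arabic letters, delete diacritics, and trim whitespace."""
    return name.translate(ARABIC_TO_PERSIAN).translate(DIACRITICS).strip()


def merge(*sources):
    """Merge lists of {"name", "gender"} dicts (hypothetical schema), deduplicating by normalized name.

    The first record for a name is kept; a missing gender is filled in from a later duplicate.
    """
    merged = {}
    for source in sources:
        for record in source:
            key = normalize(record["name"])
            gender = record.get("gender") or None
            if key not in merged:
                merged[key] = {"name": key, "gender": gender}
            elif merged[key]["gender"] is None and gender:
                merged[key]["gender"] = gender
    return list(merged.values())


if __name__ == "__main__":
    # Hypothetical rows standing in for the two downloaded files.
    xls_rows = [{"name": "\u0645\u064e\u0631\u06cc\u064e\u0645", "gender": None}]  # "مریم" with fatha marks
    csv_rows = [
        {"name": "\u0645\u0631\u06cc\u0645", "gender": "female"},  # "مریم"
        {"name": "\u0643\u0648\u0631\u0648\u0634", "gender": "male"},  # "كوروش" with Arabic kaf
    ]
    print(json.dumps(merge(xls_rows, csv_rows), ensure_ascii=False, indent=2))
```

Keying the merge on the normalized name means spellings that differ only in Arabic letter forms or diacritics collapse into one entry, which is one way to make the two sources compatible before writing the JSON file.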
### Preprocessing
1. Removing standard Arabic characters that do not occur in Persian words, such as `ئ` or `ك`
2. Removing Arabic diacritics (i'rab) from words, such as `ـَـِـُ`
## Description (توضیحات)
Using some simple processing, I first made the two files I found on the internet (an Excel file and a CSV file) compatible with each other, and then merged the data of the two files.
### Preprocessing (پیش‌پردازش‌ها)
1. Removing standard Arabic letters such as `ئ` and `ك`
2. Removing diacritics (i'rab) from names, such as `ـَـِـُ`
## Todos
- [x] Parse and merge the two name files
- [ ] Add the missing genders to the rest of the names
- [ ] Add Latin-script versions of the names to the results
- [ ] Add the pronunciations of the names (maybe in audio format) to the results