https://github.com/schoobani/persian-generative-chatbot
A repo dedicated to different approaches in building a Persian Generative Chatbot.
chatbot persian text-generation
- Host: GitHub
- URL: https://github.com/schoobani/persian-generative-chatbot
- Owner: schoobani
- Created: 2022-07-16T10:05:24.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2022-09-07T18:58:23.000Z (over 2 years ago)
- Last Synced: 2024-10-26T23:53:50.696Z (3 months ago)
- Topics: chatbot, persian, text-generation
- Language: Jupyter Notebook
- Homepage:
- Size: 507 KB
- Stars: 9
- Watchers: 1
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
### Persian Generative Chatbot
A repo dedicated to different approaches in building a Persian Generative Chatbot.
### Data
We need a dataset of conversation pairs to build a chatbot.
### 1- Ninisite
[Ninisite](https://ninisite.com) is a Persian forum with millions of conversation pairs on a range of lifestyle topics. We wrote a [simple script](/data/ninisite/) to crawl the conversation pairs. Each crawled record looks like the following sample:
```
{
  "topic": "بیمارستان هدایت تهران برای زایمان چطوره؟",
  "question": [
    "سلام خانما",
    "من تو بیمارستان میلاد برای بارداری و زایمان پرونده دارم.اما بیمار کرونایی بستری داره و فعلا هم تعطیله.",
    "شنیدم بیمارستان هدایت فقط زنان و زایمانه؟کسی چکاپ های بارداری و زایمانش رو تو هدایت انجام داده؟راضی بودین؟",
    "لطفا راهنماییم کنید"
  ],
  "answer": [
    "آره من 26سال پیش اونجا بدنیا اومدم مامانم ک خیلی تعریف میکرد ."
  ]
}
```

The following table shows statistics on the crawling process so far (the table is updated regularly).
| Total Targeted Topics | Crawled | Crawled Conversation Pairs | Size |
| :-------------------: | :--------: | :------------------------: | :----: |
| 636921 | 48708 (7%) | 1221025 | 778 MB |
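
To turn records shaped like the sample above into training pairs, the message lines of each side can be joined into a single string. The following is a minimal sketch, not the repository's own code: the helper names `record_to_pair` and `iter_pairs` are hypothetical, and it assumes the crawled data is stored as one JSON record per line, which the README does not confirm.

```python
import json
from typing import Iterable, Iterator, Tuple


def record_to_pair(record: dict) -> Tuple[str, str]:
    """Join the message lines of one crawled record into a (question, answer) pair."""
    question = " ".join(part.strip() for part in record["question"])
    answer = " ".join(part.strip() for part in record["answer"])
    return question, answer


def iter_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (question, answer) pairs from JSON-encoded records.

    Assumption: each element of `lines` holds one JSON record with
    "question" and "answer" lists, as in the sample record.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Skip records where either side of the conversation is missing or empty.
        if record.get("question") and record.get("answer"):
            yield record_to_pair(record)
```

Because `iter_pairs` accepts any iterable of strings, it can be fed an open file handle directly, so a large crawl does not need to be loaded into memory at once.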