https://github.com/minsithu/burmese-microbiology-1k
Microbiology 1K QA pairs in Burmese Language
- Host: GitHub
- URL: https://github.com/minsithu/burmese-microbiology-1k
- Owner: MinSiThu
- Created: 2024-07-23T03:16:24.000Z
- Default Branch: main
- Last Pushed: 2024-07-31T09:26:32.000Z
- Last Synced: 2025-01-09T05:44:18.621Z
- Topics: artificial-intelligence, burmese-gpt, dataset, microbiology, myanmar-language, myanmargpt-movement, nlp
- Homepage:
- Size: 1000 KB
- Stars: 5
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Burmese-Microbiology-1K
### **Min Si Thu, [email protected]**
Microbiology 1K QA pairs in Burmese Language

### Purpose
Before the Burmese Clinical Microbiology 1K dataset, open-source resources for training Burmese large language models in the medical domain were rare. High-quality datasets therefore need to be curated to cover medical knowledge for developing LLMs in the Burmese language.
### Motivation
I found an old notebook in a box. It dated from 2019 and held the microbiology notes I had written as a third-year medical student. Because Burmese-language resources are needed in medical fields, I added more facts and notes and curated them into a microbiology dataset in Burmese.
### About
The dataset for microbiology in the Burmese language contains **1262 rows of instruction and output pairs in CSV format**.
The dataset focuses on foundational clinical microbiology knowledge, summarizing basic facts about culture media, microbes (bacteria, viruses, fungi, and parasites), and the diseases these microbes cause.
### Examples
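- ငှက်ဖျားရောဂါဆိုတာ ဘာလဲ?,ငှက်ဖျားရောဂါသည် Plasmodium ကပ်ပါးကောင်ကြောင့် ဖြစ်ပွားသော အသက်အန္တရာယ်ရှိနိုင်သည့် သွေးရောဂါတစ်မျိုးဖြစ်သည်။ ၎င်းသည် ငှက်ဖျားခြင်ကိုက်ခြင်းမှတဆင့် ကူးစက်ပျံ့နှံ့သည်။ (English: "What is malaria?" → "Malaria is a potentially life-threatening blood disease caused by Plasmodium parasites. It spreads through the bites of malaria-carrying mosquitoes.")
- Influenza virus အကြောင်း အကျဉ်းချုပ် ဖော်ပြပါ။,Influenza virus သည် တုပ်ကွေးရောဂါ ဖြစ်စေသော RNA ဗိုင်းရပ်စ် ဖြစ်သည်။ Orthomyxoviridae မိသားစုဝင် ဖြစ်ပြီး type A၊ B၊ C နှင့် D ဟူ၍ အမျိုးအစား လေးမျိုး ရှိသည်။ (English: "Briefly describe the influenza virus." → "Influenza virus is an RNA virus that causes influenza. It belongs to the Orthomyxoviridae family and has four types: A, B, C, and D.")
- Clostridium tetani ဆိုတာ ဘာလဲ,Clostridium tetani သည် မေးခိုင်ရောဂါ ဖြစ်စေသော gram-positive၊ anaerobic bacteria တစ်မျိုး ဖြစ်သည်။ မြေဆီလွှာတွင် တွေ့ရလေ့ရှိသည်။ (English: "What is Clostridium tetani" → "Clostridium tetani is a gram-positive, anaerobic bacterium that causes tetanus. It is commonly found in soil.")
- Onychomycosis ဆိုတာ ဘာလဲ?,Onychomycosis သည် လက်သည်း သို့မဟုတ် ခြေသည်းများတွင် ဖြစ်ပွားသော မှိုကူးစက်မှုဖြစ်သည်။ ၎င်းသည် လက်သည်း သို့မဟုတ် ခြေသည်းများကို ထူထဲစေပြီး အရောင်ပြောင်းလဲစေသည်။ (English: "What is onychomycosis?" → "Onychomycosis is a fungal infection of the fingernails or toenails. It makes the nails thicken and change color.")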
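Each row above is a plain `instruction,output` pair, so Python's standard `csv` module is enough to read it. The sketch below is a minimal example under stated assumptions: the two-column, headerless layout (instruction first) is inferred from the examples rather than documented, and `SAMPLE` and `parse_pairs` are hypothetical names. It also illustrates that the Burmese comma `၊` inside answers is a different character from the ASCII comma and therefore does not split fields.

```python
import csv
import io

# Hypothetical sample: two rows copied from the examples above.
# Assumption: rows are headerless and laid out as instruction,output.
SAMPLE = """\
ငှက်ဖျားရောဂါဆိုတာ ဘာလဲ?,ငှက်ဖျားရောဂါသည် Plasmodium ကပ်ပါးကောင်ကြောင့် ဖြစ်ပွားသော အသက်အန္တရာယ်ရှိနိုင်သည့် သွေးရောဂါတစ်မျိုးဖြစ်သည်။ ၎င်းသည် ငှက်ဖျားခြင်ကိုက်ခြင်းမှတဆင့် ကူးစက်ပျံ့နှံ့သည်။
Clostridium tetani ဆိုတာ ဘာလဲ,Clostridium tetani သည် မေးခိုင်ရောဂါ ဖြစ်စေသော gram-positive၊ anaerobic bacteria တစ်မျိုး ဖြစ်သည်။ မြေဆီလွှာတွင် တွေ့ရလေ့ရှိသည်။
"""


def parse_pairs(text: str) -> list[tuple[str, str]]:
    """Parse CSV text into (instruction, output) pairs.

    The Burmese comma '၊' (U+104A) is not the ASCII comma, so csv.reader
    keeps it inside the field. Quoted fields with ASCII commas also work.
    """
    pairs = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        instruction, output = (field.strip() for field in row)
        pairs.append((instruction, output))
    return pairs


pairs = parse_pairs(SAMPLE)
for instruction, output in pairs:
    print(instruction, "->", output[:30])
```

If the actual file begins with a header row, drop the first pair or switch to `csv.DictReader`.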
### Where to download the dataset
- GitHub - [https://github.com/MinSiThu/Burmese-Microbiology-1K/blob/main/data/Microbiology.csv](https://github.com/MinSiThu/Burmese-Microbiology-1K/blob/main/data/Microbiology.csv)
- Zenodo - [https://zenodo.org/records/12803638](https://zenodo.org/records/12803638)
- Hugging Face - [https://huggingface.co/datasets/jojo-ai-mst/Burmese-Microbiology-1K](https://huggingface.co/datasets/jojo-ai-mst/Burmese-Microbiology-1K)
- Kaggle - [https://www.kaggle.com/datasets/minsithu/burmese-microbiology-1k](https://www.kaggle.com/datasets/minsithu/burmese-microbiology-1k)
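The GitHub link above opens an HTML page; the file itself is normally served from `raw.githubusercontent.com` under the same owner, repository, branch, and path. A minimal download sketch, assuming the CSV is UTF-8 encoded and stays at that path on the `main` branch; `to_raw_url` and `fetch_rows` are hypothetical helpers, not part of the dataset release.

```python
import csv
import io
import urllib.request

# GitHub page URL from the list above (an HTML view, not the raw file).
BLOB_URL = "https://github.com/MinSiThu/Burmese-Microbiology-1K/blob/main/data/Microbiology.csv"


def to_raw_url(blob_url: str) -> str:
    """Convert a github.com/<owner>/<repo>/blob/<branch>/<path> URL to its raw-file URL."""
    prefix = "https://github.com/"
    if not blob_url.startswith(prefix) or "/blob/" not in blob_url:
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    owner_repo, branch_and_path = blob_url[len(prefix):].split("/blob/", 1)
    return f"https://raw.githubusercontent.com/{owner_repo}/{branch_and_path}"


def fetch_rows(url: str) -> list[list[str]]:
    """Download a CSV file and return its non-empty rows (assumes UTF-8)."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8-sig")  # also tolerates a leading BOM
    return [row for row in csv.reader(io.StringIO(text)) if row]


# Example (requires network access; the count includes a header row if present):
# rows = fetch_rows(to_raw_url(BLOB_URL))
# print(len(rows))
```

Alternatively, with the Hugging Face `datasets` library installed, `load_dataset("jojo-ai-mst/Burmese-Microbiology-1K")` should load the hosted copy, though its split and column names are not documented here.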
### Applications
The Burmese Microbiology 1K dataset can be used to build various medical NLP applications.
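As one illustration of such an application, the retrieval step of a retrieval-augmented generation (RAG) pipeline can be sketched in a few lines: score each QA pair against the user's question and place the best matches in the prompt as context. This is a minimal sketch, not the author's method. It swaps in a simple character-bigram cosine similarity (language-agnostic, so it needs no Burmese tokenizer) where a real system would more likely use a multilingual embedding model. `KB`, `retrieve`, and `build_prompt` are hypothetical names, and the answers in `KB` are shortened from the examples above.

```python
import math
from collections import Counter

# Hypothetical in-memory knowledge base of (instruction, output) pairs,
# shortened from the Examples section.
KB = [
    ("Influenza virus အကြောင်း အကျဉ်းချုပ် ဖော်ပြပါ။", "Influenza virus သည် တုပ်ကွေးရောဂါ ဖြစ်စေသော RNA ဗိုင်းရပ်စ် ဖြစ်သည်။"),
    ("Clostridium tetani ဆိုတာ ဘာလဲ", "Clostridium tetani သည် မေးခိုင်ရောဂါ ဖြစ်စေသော gram-positive၊ anaerobic bacteria တစ်မျိုး ဖြစ်သည်။"),
    ("Onychomycosis ဆိုတာ ဘာလဲ?", "Onychomycosis သည် လက်သည်း သို့မဟုတ် ခြေသည်းများတွင် ဖြစ်ပွားသော မှိုကူးစက်မှုဖြစ်သည်။"),
]


def bigrams(text: str) -> Counter:
    """Count overlapping character bigrams after lowercasing and removing whitespace."""
    chars = "".join(text.lower().split())
    return Counter(chars[i:i + 2] for i in range(len(chars) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bigram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, kb: list[tuple[str, str]], k: int = 1) -> list[tuple[str, str]]:
    """Return the k pairs whose instruction + output is most similar to the query."""
    q = bigrams(query)
    ranked = sorted(kb, key=lambda pair: cosine(q, bigrams(pair[0] + " " + pair[1])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, kb: list[tuple[str, str]], k: int = 1) -> str:
    """Place the retrieved pairs as context ahead of the user's question."""
    context = "\n".join(f"Q: {q}\nA: {a}" for q, a in retrieve(query, kb, k))
    return f"{context}\n\nQuestion: {query}\nAnswer:"


print(build_prompt("Clostridium tetani", KB))
```

The resulting prompt would then be passed to a Burmese-capable LLM; the generation step is outside the scope of this sketch.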
- The dataset can be used to pretrain or fine-tune Burmese large language models.
- The dataset is ready to use for building retrieval-augmented generation (RAG) applications.
### Acknowledgments
Special thanks to [magickospace.org](magickospace.org) for supporting the curation process of **Burmese Microbiology 1K Dataset**.
### References for this dataset
- [https://openstax.org/details/books/microbiology](https://openstax.org/details/books/microbiology) - For medical facts
- [https://moh.nugmyanmar.org/my/](https://moh.nugmyanmar.org/my/) - For Burmese disease names
- [https://myordbok.com/dictionary/english](https://myordbok.com/dictionary/english) - English-Myanmar translation dictionary
### License - **[CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/)**
### How to cite the dataset
```txt
Si Thu, M. (2024). Burmese MicroBiology 1K Dataset (1.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.12803638

Si Thu, Min, Burmese-Microbiology-1K (July 24, 2024). Available at SSRN: https://ssrn.com/abstract=4904320
```