https://github.com/sayamalt/symptoms-disease-text-classification
Successfully developed a fine-tuned BERT transformer model that classifies symptom descriptions into their corresponding diseases with an accuracy of up to 89%.
- Host: GitHub
- URL: https://github.com/sayamalt/symptoms-disease-text-classification
- Owner: SayamAlt
- License: apache-2.0
- Created: 2024-05-06T05:33:16.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-05-06T06:35:19.000Z (about 1 year ago)
- Last Synced: 2024-12-28T08:09:34.803Z (6 months ago)
- Topics: bert-fine-tuning, data-exploration-and-preprocessing, exploratory-data-analysis, fine-tune-bert-tensorflow, hugging-face-transformers, model-architecture-and-implementation, model-inference, model-training-and-evaluation, multiclass-classification, natural-language-processing, text-classification, text-preprocessing, text-tokenization
- Language: Jupyter Notebook
- Homepage:
- Size: 860 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## About Dataset
The dataset consists of 1,200 data points in two columns, "label" and "text":
- label: the disease label.
- text: the natural-language description of the symptoms.

The dataset includes 24 diseases with 50 symptom descriptions each, for a total of 24 × 50 = 1,200 data points.
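Because the dataset is perfectly balanced (24 classes × 50 rows = 1,200), a quick check before training can catch a truncated or mislabeled copy of the data. The sketch below assumes the data is a CSV file with a header row containing "label" and "text" columns; the helper names `load_rows` and `check_balance` are hypothetical and are not taken from the repository's notebook.

```python
import csv
from collections import Counter


def load_rows(lines):
    """Read (label, text) pairs from an iterable of CSV lines with a header row.

    Extra columns (for example an unnamed index column) are ignored.
    """
    reader = csv.DictReader(lines)
    return [(row["label"], row["text"]) for row in reader]


def check_balance(rows, n_classes=24, per_class=50):
    """Return per-label counts, or raise ValueError if the layout is not balanced."""
    counts = Counter(label for label, _ in rows)
    if len(counts) != n_classes:
        raise ValueError(f"expected {n_classes} classes, found {len(counts)}")
    # Any label whose row count differs from per_class is reported together.
    uneven = {label: n for label, n in counts.items() if n != per_class}
    if uneven:
        raise ValueError(f"classes without exactly {per_class} rows: {uneven}")
    return counts
```

With a real file, pass the open file object: `with open(path, newline="", encoding="utf-8") as f: rows = load_rows(f)`, where `path` is a placeholder for wherever the CSV is stored. Using `csv.DictReader` rather than splitting on commas matters here, because symptom descriptions can themselves contain commas inside quoted fields.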
The dataset covers the following 24 diseases:
Psoriasis, Varicose Veins, Typhoid, Chicken pox, Impetigo, Dengue, Fungal infection, Common Cold, Pneumonia, Dimorphic Hemorrhoids, Arthritis, Acne, Bronchial Asthma, Hypertension, Migraine, Cervical spondylosis, Jaundice, Malaria, urinary tract infection, allergy, gastroesophageal reflux disease, drug reaction, peptic ulcer disease, diabetes
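For multiclass classification, each disease name has to be mapped to an integer class id and back. A minimal sketch, assuming the label strings in the CSV match the spelling and capitalization of the list above; `DISEASES`, `build_label_maps`, and `encode` are hypothetical names used for illustration.

```python
# The 24 disease labels as listed above (assumed to match the CSV's label strings).
DISEASES = [
    "Psoriasis", "Varicose Veins", "Typhoid", "Chicken pox", "Impetigo", "Dengue",
    "Fungal infection", "Common Cold", "Pneumonia", "Dimorphic Hemorrhoids",
    "Arthritis", "Acne", "Bronchial Asthma", "Hypertension", "Migraine",
    "Cervical spondylosis", "Jaundice", "Malaria", "urinary tract infection",
    "allergy", "gastroesophageal reflux disease", "drug reaction",
    "peptic ulcer disease", "diabetes",
]


def build_label_maps(labels):
    """Assign each distinct label an integer id (sorted, so ids are reproducible)."""
    names = sorted(set(labels))
    label2id = {name: i for i, name in enumerate(names)}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label


def encode(labels, label2id):
    """Convert a sequence of disease names into integer class ids."""
    return [label2id[name] for name in labels]
```

Sorting the names makes the id assignment independent of row order in the file. Dictionaries of this shape can also be passed as `id2label` and `label2id` when loading a Hugging Face sequence-classification model, so that predictions can be reported as disease names rather than bare indices.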
## Task
The task is to develop a language model that accurately predicts the disease from a short description of the symptoms the user is experiencing.
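At inference time, a fine-tuned BERT classifier produces one raw score (logit) per disease, 24 in this case, and the prediction is derived from those scores. A minimal sketch of that last step, assuming the logits for one input are available as a plain list of floats (for example, converted from the `logits` field of a Hugging Face `TFBertForSequenceClassification` output) together with an `id2label` mapping; `softmax` and `predict_disease` are hypothetical helper names.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (max is subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_disease(logits, id2label, top_k=3):
    """Return the top_k (disease, probability) pairs, most likely first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:top_k]]
```

With `top_k=1` this reduces to the usual argmax prediction. Returning a few ranked candidates with their probabilities may be more useful for a screening scenario, since symptom descriptions of related diseases can overlap.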
Such models can help identify potential diseases early, allowing patients to seek medical attention and treatment sooner. In situations where in-person consultations are not possible or desirable, an app built on such a model could also provide remote diagnosis and treatment recommendations based on the user's symptoms.