Medical Rumor Explanation Pipeline

Host: GitHub
URL: https://github.com/sawyerbutton/medical-rumor-explanation
Owner: sawyerbutton
License: apache-2.0
Created: 2023-07-29T03:41:13.000Z
Default Branch: main
Last Pushed: 2023-07-31T01:28:34.000Z
Last Synced: 2025-02-07T04:40:46.312Z
Language: Jupyter Notebook
Size: 7.92 MB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Medical Rumor Detection Pipeline

## Background

1. The analyzed rumor file contains 125 usable data entries.
2. Each data entry has 6 fields: Time, Rumor Nature, Rumor Content Tag, Counter-rumor Explanation, Rumor Summary, and Original Rumor Text.
3. The Huatuo model, which is to be fine-tuned for this task, is a LLaMA model enriched with Chinese medical knowledge. Its authors built a Chinese medical instruction dataset from a medical knowledge graph and the GPT-3.5 API, then fine-tuned LLaMA on these instructions, improving its performance in medical Q&A scenarios.
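A minimal sketch of one entry as a Python record, mainly to pin down the six fields listed above. The attribute names (`time`, `rumor_nature`, and so on) and the helper `parse_entry` are hypothetical; the actual column names in the repository's rumor file are not stated here.

```python
from dataclasses import dataclass, fields


@dataclass
class RumorEntry:
    """One rumor record; attribute names are hypothetical stand-ins for the six fields."""
    time: str                 # Time
    rumor_nature: str         # Rumor Nature
    content_tag: str          # Rumor Content Tag
    counter_explanation: str  # Counter-rumor Explanation
    summary: str              # Rumor Summary
    original_text: str        # Original Rumor Text


# Field names derived from the dataclass, so this list stays in sync with it.
FIELD_NAMES = [f.name for f in fields(RumorEntry)]


def parse_entry(row: dict) -> RumorEntry:
    """Build a RumorEntry from a dict, failing loudly if any field is missing."""
    missing = [name for name in FIELD_NAMES if name not in row]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Coerce every value to str and ignore any extra keys in the row.
    return RumorEntry(**{name: str(row[name]) for name in FIELD_NAMES})
```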

### Data analysis

1. Rumor Nature and Rumor Content Tag are likely largely irrelevant to the task.
2. Time could be used for logical validation.
3. In most cases, the Rumor Summary is an accurate abstraction of the Original Rumor Text. To save tokens, the Rumor Summary can be used as the context in the subsequent fine-tuning steps.
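The points above suggest how a training sample could be assembled: drop Rumor Nature and Rumor Content Tag, and use the Rumor Summary in place of the Original Rumor Text. The sketch below assumes an Alpaca-style `instruction`/`input`/`output` record and a hypothetical instruction string; the exact format the Huatuo training scripts expect should be checked against that project.

```python
import json

# Hypothetical instruction wording; adjust to the prompt style actually used.
INSTRUCTION = "Judge whether the following medical claim is a rumor and explain why."


def to_training_sample(entry: dict) -> dict:
    """Map one rumor entry to an Alpaca-style record (assumed format).

    Rumor Nature and Rumor Content Tag are ignored, and the Rumor Summary
    stands in for the longer Original Rumor Text to save tokens.
    """
    return {
        "instruction": INSTRUCTION,
        "input": entry["summary"].strip(),
        "output": entry["counter_explanation"].strip(),
    }


def build_dataset(entries: list) -> str:
    """Serialize all samples as a JSON array; ensure_ascii=False keeps Chinese text readable."""
    samples = [to_training_sample(e) for e in entries]
    return json.dumps(samples, ensure_ascii=False, indent=2)
```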

## Purpose

1. The goal is to train a model suited to analyzing medical rumors. It should not only classify rumors but also report a confidence level for each judgment, compared against a chosen threshold.
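One way to read "confidence based on a threshold" is selective prediction: the model answers only when its confidence clears the threshold and abstains otherwise. The sketch below assumes the model yields a probability `p_rumor` that a claim is a rumor; the function names, the labels `"rumor"`/`"not_rumor"`, and the 0.8 default threshold are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def judge(p_rumor: float, threshold: float = 0.8) -> Optional[str]:
    """Return 'rumor' or 'not_rumor' when confident enough, else None (abstain)."""
    if not 0.0 <= p_rumor <= 1.0:
        raise ValueError("p_rumor must be in [0, 1]")
    # Confidence is the probability of whichever label the model prefers.
    confidence = max(p_rumor, 1.0 - p_rumor)
    if confidence < threshold:
        return None
    return "rumor" if p_rumor >= 0.5 else "not_rumor"


def selective_accuracy(
    probs: Sequence[float], labels: Sequence[str], threshold: float = 0.8
) -> Tuple[float, float]:
    """Return (coverage, accuracy on the claims the model chose to answer)."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    answered = correct = 0
    for p, y in zip(probs, labels):
        pred = judge(p, threshold)
        if pred is None:
            continue  # abstained: counts against coverage, not accuracy
        answered += 1
        correct += pred == y
    coverage = answered / len(probs) if probs else 0.0
    accuracy = correct / answered if answered else 0.0
    return coverage, accuracy
```

Reporting coverage alongside accuracy makes the trade-off visible: raising the threshold usually answers fewer claims but answers them more reliably.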

## Procedure

1. Construct a fine-tuning dataset in the format required by the Huatuo model from the existing data. The current dataset is limited, so additional data must be scraped or created.
2. Fine-tune the Huatuo model using LoRA.
3. Validate the fine-tuned Huatuo model along two dimensions: the accuracy of its Counter-rumor Explanations and of its Rumor Judgment Confidence.
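For step 2, LoRA freezes a pretrained weight matrix `W_0` and learns only a low-rank update `BA`, scaled by `alpha / r`. The standard formulation is shown below; the rank `r = 8` and the 4096 x 4096 projection (the hidden size of LLaMA-7B) are illustrative assumptions, not settings taken from this project.

```latex
\begin{aligned}
W &= W_0 + \Delta W = W_0 + \frac{\alpha}{r} B A,
  \qquad B \in \mathbb{R}^{d \times r},\ A \in \mathbb{R}^{r \times k},\ r \ll \min(d, k) \\
\text{trainable parameters} &= r(d + k) \quad \text{instead of} \quad dk \\
d = k = 4096,\ r = 8:\quad r(d + k) &= 8 \times 8192 = 65{,}536
  \quad \text{vs.} \quad dk = 16{,}777{,}216 \ (\approx 0.39\%)
\end{aligned}
```

Only `A` and `B` receive gradients, and after training `BA` can be merged into `W_0`, so inference costs the same as the base model.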

## Todos

- [ ] Scrape medical rumors.
- [x] Build a dataset for fine-tuning the Huatuo model.
- [ ] Construct several few-shot fine-tuning solutions for medical rumors.
- [x] Use the ChatGLM2-6B model to build a baseline of results.
- [ ] Set up the base Huatuo model.
- [ ] Verify the rumor analysis capabilities and accuracy of the Huatuo model.
- [ ] Fine-tune the Huatuo model using the dataset.
- [ ] Validate the capabilities of the fine-tuned Huatuo model.
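For the few-shot and ChatGLM2-6B baseline items, a prompt could be assembled from existing (Rumor Summary, Counter-rumor Explanation) pairs. The function name `build_few_shot_prompt`, the prompt wording, and the default `k = 3` are hypothetical; the call to the model itself is omitted.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    examples: Sequence[Tuple[str, str]], query: str, k: int = 3
) -> str:
    """Assemble a plain-text few-shot prompt from (claim, explanation) pairs.

    Only the first k examples are used; the prompt ends with an open
    'Answer:' slot for the model to complete.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    parts = ["Decide whether each medical claim is a rumor and explain why.\n"]
    for i, (claim, explanation) in enumerate(examples[:k], start=1):
        parts.append(f"Example {i}\nClaim: {claim}\nAnswer: {explanation}\n")
    parts.append(f"Claim: {query}\nAnswer:")
    return "\n".join(parts)
```

The same prompt can be sent to both the baseline and the fine-tuned model, which keeps the comparison between them like-for-like.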
