Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/slinusc/path-vqa-blip
Fine-tuning BLIP for pathological visual question answering.
- Host: GitHub
- URL: https://github.com/slinusc/path-vqa-blip
- Owner: slinusc
- License: mit
- Created: 2024-05-20T17:59:53.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-06-30T15:56:54.000Z (5 months ago)
- Last Synced: 2024-07-03T14:18:09.970Z (4 months ago)
- Topics: blip, multimodal-deep-learning, pathology
- Language: Jupyter Notebook
- Homepage: https://huggingface.co/slinusc/path-vqa-blip
- Size: 80.1 KB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
### Awesome Lists containing this project
### README
### Abstract
This project fine-tunes the BLIP (Bootstrapping Language-Image Pre-training) model for pathological Visual Question Answering (VQA), with the goal of improving accuracy on pathology yes/no questions. The model was trained on the PathVQA dataset, which contains 32,799 question-answer pairs drawn from 4,998 pathology images, using the AdamW optimizer, learning-rate scheduling, and mixed-precision training. Hyperparameter optimization with Optuna yielded substantial gains: accuracy rose from 0.5164 to 0.8554, precision from 0.5344 to 0.8560, recall from 0.8122 to 0.8805, and the F1 score from 0.6447 to 0.8681.
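The sketch below illustrates the kind of training loop the abstract describes: fine-tuning a BLIP VQA checkpoint with AdamW, a linear learning-rate schedule, and mixed-precision training. It is a minimal illustration, not the repository's actual code: the `Salesforce/blip-vqa-base` checkpoint, the placeholder hyperparameter values, and the `train_loader` DataLoader (assumed to yield PathVQA image/question/answer batches) are assumptions for the example.

```python
# Minimal fine-tuning sketch (assumptions: base checkpoint, hyperparameters,
# and a `train_loader` yielding PathVQA batches; evaluation is omitted).
import torch
from torch.optim import AdamW
from torch.cuda.amp import GradScaler, autocast
from transformers import (
    BlipProcessor,
    BlipForQuestionAnswering,
    get_linear_schedule_with_warmup,
)

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base").to(device)

num_epochs = 3                      # placeholder value
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)  # placeholder values
num_training_steps = len(train_loader) * num_epochs
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=num_training_steps
)
scaler = GradScaler()               # mixed-precision gradient scaling

model.train()
for epoch in range(num_epochs):
    for batch in train_loader:      # assumed: dict with "image", "question", "answer"
        # Encode images + questions for the encoder, answers as decoder labels
        inputs = processor(
            images=batch["image"], text=batch["question"],
            return_tensors="pt", padding=True,
        ).to(device)
        labels = processor(
            text=batch["answer"], return_tensors="pt", padding=True
        ).input_ids.to(device)

        optimizer.zero_grad()
        with autocast():            # forward pass in mixed precision
            loss = model(**inputs, labels=labels).loss
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        scheduler.step()
```

Hyperparameters such as the learning rate could then be tuned as the abstract describes by wrapping this loop in an Optuna objective and calling `optuna.create_study(direction="maximize").optimize(objective, n_trials=...)`, with the validation metric returned from each trial.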