Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/slinusc/path-vqa-blip
Fine-tuning BLIP for pathological visual question answering.
- Host: GitHub
- URL: https://github.com/slinusc/path-vqa-blip
- Owner: slinusc
- License: mit
- Created: 2024-05-20T17:59:53.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-06-30T15:56:54.000Z (5 months ago)
- Last Synced: 2024-07-03T14:18:09.970Z (4 months ago)
- Topics: blip, multimodal-deep-learning, pathology
- Language: Jupyter Notebook
- Homepage: https://huggingface.co/slinusc/path-vqa-blip
- Size: 80.1 KB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
### Awesome Lists containing this project
### README
### Abstract
This project fine-tunes the BLIP (Bootstrapping Language-Image Pre-training) model for pathological Visual Question Answering (VQA), with the goal of improving accuracy on pathology yes/no questions. The model was trained on the PathVQA dataset, which contains 32,799 question-answer pairs drawn from 4,998 pathology images, using the AdamW optimizer, learning-rate scheduling, and mixed-precision training. Hyperparameter optimization with Optuna yielded substantial gains: accuracy rose from 0.5164 to 0.8554, precision from 0.5344 to 0.8560, recall from 0.8122 to 0.8805, and the F1 score from 0.6447 to 0.8681.
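The sketch below illustrates the kind of training loop the abstract describes: fine-tuning a BLIP VQA checkpoint with AdamW, a linear learning-rate schedule, and mixed-precision training. It is a minimal illustration, not the repository's actual code: the `Salesforce/blip-vqa-base` checkpoint, the placeholder hyperparameter values, and the `train_loader` DataLoader (assumed to yield PathVQA image/question/answer batches) are assumptions for the example.

```python
# Minimal fine-tuning sketch (assumptions: base checkpoint, hyperparameters,
# and a `train_loader` yielding PathVQA batches; evaluation is omitted).
import torch
from torch.optim import AdamW
from torch.cuda.amp import GradScaler, autocast
from transformers import (
    BlipProcessor,
    BlipForQuestionAnswering,
    get_linear_schedule_with_warmup,
)

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base").to(device)

num_epochs = 3                      # placeholder value
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)  # placeholder values
num_training_steps = len(train_loader) * num_epochs
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=num_training_steps
)
scaler = GradScaler()               # mixed-precision gradient scaling

model.train()
for epoch in range(num_epochs):
    for batch in train_loader:      # assumed: dict with "image", "question", "answer"
        # Encode images + questions for the encoder, answers as decoder labels
        inputs = processor(
            images=batch["image"], text=batch["question"],
            return_tensors="pt", padding=True,
        ).to(device)
        labels = processor(
            text=batch["answer"], return_tensors="pt", padding=True
        ).input_ids.to(device)

        optimizer.zero_grad()
        with autocast():            # forward pass in mixed precision
            loss = model(**inputs, labels=labels).loss
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        scheduler.step()
```

Hyperparameters such as the learning rate could then be tuned as the abstract describes by wrapping this loop in an Optuna objective and calling `optuna.create_study(direction="maximize").optimize(objective, n_trials=...)`, with the validation metric returned from each trial.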