{"id":50937984,"url":"https://github.com/simula/kvasir-vqa-x1","last_synced_at":"2026-06-17T11:03:34.834Z","repository":{"id":311904961,"uuid":"1000335396","full_name":"simula/Kvasir-VQA-x1","owner":"simula","description":"Official repository for the Kvasir-VQA-x1 paper","archived":false,"fork":false,"pushed_at":"2025-11-03T12:55:12.000Z","size":74,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-03T14:26:51.192Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2506.09958","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/simula.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-11T16:07:48.000Z","updated_at":"2025-11-03T12:55:15.000Z","dependencies_parsed_at":"2025-08-27T19:48:13.635Z","dependency_job_id":"f138ab04-5185-488c-81db-20f4972dbb1b","html_url":"https://github.com/simula/Kvasir-VQA-x1","commit_stats":null,"previous_names":["simula/kvasir-vqa-x1"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/simula/Kvasir-VQA-x1","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simula%2FKvasir-VQA-x1","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simula%2FKvasir-VQA-x1/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simula%2FKvasir-VQA-x1/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simula%2FKvasir-VQA-x1/manifests","owner_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/simula","download_url":"https://codeload.github.com/simula/Kvasir-VQA-x1/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simula%2FKvasir-VQA-x1/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34445186,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-17T02:00:05.408Z","response_time":127,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-17T11:03:20.990Z","updated_at":"2026-06-17T11:03:34.821Z","avatar_url":"https://github.com/simula.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Kvasir-VQA-x1\n\n\nA Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy\n\n[Dataset on Hugging Face](https://huggingface.co/datasets/SimulaMet/Kvasir-VQA-x1)  \n[Original Image Download (Simula Datasets)](https://datasets.simula.no/kvasir-vqa/) or see below.\n\n\u003e 🔗 [MediaEval Medico 2025 Challenge](https://github.com/simula/MediaEval-Medico-2025) uses this dataset. We encourage you to check out and participate!\n\n---\n\n🚧 **Work in Progress**  \nThis repository is under active development. 
The training and evaluation code will be released soon.\n\nIf you urgently need access or have questions, please contact:  \n📧 **sushant@simula.no**\n\n---\n\n## 🧠 About\n\n**Kvasir-VQA-x1** is a multimodal dataset aimed at advancing medical visual question answering (MedVQA) in GI endoscopy. We build on the original [Kvasir-VQA](https://datasets.simula.no/kvasir-vqa/) by adding 159,549 new QA pairs with richer reasoning and complexity stratification.\n\nThis repo provides:\n\n- Augmentation scripts\n- Dataset generation code\n- JSON validators\n- Sample training/evaluation workflows\n- Metric visualizations (e.g., radar plots)\n\n## Models trained in this work:\n| Model | HF Repo | W\u0026B Logs |\n|:--|:--|:--|\n| Qwen2.5-VL-KvasirVQA-x1-ft | [HF](https://huggingface.co/SimulaMet/Qwen2.5-VL-KvasirVQA-x1-ft) | [W\u0026B 7mk4gz8s](https://wandb.ai/ubl/Kvasir-VQA-x1/runs/7mk4gz8s) |\n| Qwen2.5-VL-Transf-KvasirVQA-x1-ft | [HF](https://huggingface.co/SimulaMet/Qwen2.5-VL-Transf-KvasirVQA-x1-ft) | [W\u0026B megwnbz6](https://wandb.ai/ubl/Kvasir-VQA-x1/runs/megwnbz6) |\n| MedGemma-KvasirVQA-x1-ft | [HF](https://huggingface.co/SimulaMet/MedGemma-KvasirVQA-x1-ft) | [W\u0026B 7mk4gz8s](https://wandb.ai/ubl/Kvasir-VQA-x1/runs/7mk4gz8s) |\n\n## 🖼️ Usage Example\n\n```python\n!pip install ms-swift==3.8.0 bitsandbytes qwen_vl_utils==0.0.11\n\nimport torch\nfrom swift.llm import PtEngine, RequestConfig, InferRequest\nfrom transformers import BitsAndBytesConfig\n\nbnb_config = BitsAndBytesConfig(\n    load_in_4bit=True,\n    bnb_4bit_quant_type=\"nf4\",\n    bnb_4bit_use_double_quant=True,\n    bnb_4bit_compute_dtype=torch.float16\n)\n\nengine = PtEngine(\n    adapters=[\"SimulaMet/Qwen2.5-VL-KvasirVQA-x1-ft\"],  # or use other fine-tuned model IDs\n    model_id_or_path=\"Qwen/Qwen2.5-VL-7B-Instruct\",  # or use other base model IDs\n    quantization_config=bnb_config,\n    attn_impl=\"sdpa\",\n    use_hf=True,\n)\n\nreq_cfg = RequestConfig(max_tokens=512, temperature=0.3, 
top_k=20, top_p=0.7, repetition_penalty=1.05)\n\ninfer_requests = [\n    InferRequest(messages=[{\n        \"role\": \"user\",\n        \"content\": [\n            {\"type\": \"image\", \"image\": \"https://huggingface.co/datasets/SimulaMet/Kvasir-VQA-x1/resolve/main/images/clb0kvxvm90y4074yf50vf5nq.jpg\"},\n            {\"type\": \"text\", \"text\": \"What is shown in the image?\"}\n        ],\n    }])\n]\n\nresp = engine.infer(infer_requests, req_cfg)\nprint(resp[0].choices[0].message.content)\n```\n\n👉 See detailed examples in the [Colab usage notebook](https://colab.research.google.com/github/Simula/Kvasir-VQA-x1/blob/main/notebooks/usage.ipynb).\n\n\n## 🧾 Dataset Structure\n\nEach sample includes:\n\n| Field           | Description |\n|----------------|-------------|\n| `img_id`        | Image reference from Kvasir-VQA |\n| `complexity`    | Reasoning complexity (1–3) |\n| `question`      | Natural language QA |\n| `answer`        | Human-validated clinical answer |\n| `original`      | Source atomic QA pairs |\n| `question_class`| Clinical categories (e.g., polyp type) |\n\nSee full dataset: [Hugging Face page](https://huggingface.co/datasets/SimulaMet/Kvasir-VQA-x1)\n\n## 🧪 Evaluation Tracks\n\n- **Standard**: QA on original images  \n- **Transformed**: QA on visually perturbed images (augmented via scripts here)\n\n## 📥 Download \u0026 Prepare Images\n\nTo ensure reproducibility, you can download the **original images** and generate **augmented (perturbed) images** locally.\n\n---\n\n### 1️⃣ Download Original Images\n\n```python\nfrom datasets import load_dataset\nfrom pathlib import Path\nfrom tqdm import tqdm\nimport os, json\nimport numpy as np\n\n# Output folder\nd_path = \"./Kvasir-VQA-x1/\"\nimg_dir = Path(os.path.abspath(os.path.join(d_path, \"images\")))\nimg_dir.mkdir(exist_ok=True, parents=True)\n\n# Download original images once from SimulaMet-HOST/Kvasir-VQA\nds_host = load_dataset(\"SimulaMet-HOST/Kvasir-VQA\", split=\"raw\")\n_, idx = 
np.unique(ds_host[\"img_id\"], return_index=True)\nds = ds_host.select(sorted(idx))\nexisting = set(p.stem for p in img_dir.glob(\"*.jpg\"))\nfor row in tqdm(ds, desc=\"Saving unique images\"):\n    if row[\"img_id\"] in existing: \n        continue\n    row[\"image\"].save(img_dir / f\"{row['img_id']}.jpg\")\n\n# Save VLM-ready JSONLs (pointing to ORIGINAL images)\nfor split in [\"train\", \"test\"]:\n    with open(f\"{d_path}/Kvasir-VQA-x1-{split}.jsonl\", \"w\", encoding=\"utf-8\") as f:\n        for r in load_dataset(\"SimulaMet/Kvasir-VQA-x1\", split=split):\n            f.write(json.dumps({\n                \"messages\": [\n                    {\"role\": \"user\", \"content\": f\"\u003cimage\u003e{r['question']}\"},\n                    {\"role\": \"assistant\", \"content\": r[\"answer\"]}\n                ],\n                \"images\": [str(img_dir / f\"{r['img_id']}.jpg\")]\n            }, ensure_ascii=False) + \"\\n\")\n```\n\n---\n\n### 2️⃣ Generate Weakly-Augmented Images\n\nThis script saves **lightly perturbed versions** of each image and creates JSONLs pointing to them.\n\n```python\nfrom datasets import load_dataset, Image as HfImage\nfrom pathlib import Path\nfrom PIL import Image\nimport torchvision.transforms as T\nfrom torchvision.transforms import InterpolationMode as IM\nimport numpy as np, os, random, json, torch\n\nSEED = 42\nrandom.seed(SEED); np.random.seed(SEED); torch.manual_seed(SEED)\n\n# Paths\nd_path = Path(\"./Kvasir-VQA-x1\")\naug_dir = (d_path / \"image_weak_augmented\").resolve()\naug_dir.mkdir(parents=True, exist_ok=True)\n\n# Define weak augmentation\nweak = lambda img: T.Compose([\n    T.RandomResizedCrop(img.size[::-1], scale=(0.9,1.0), ratio=(img.size[0]/img.size[1]*0.95, img.size[0]/img.size[1]*1.05), interpolation=IM.BICUBIC),\n    T.RandomRotation((-10,10), interpolation=IM.BICUBIC, fill=0),\n    T.RandomAffine(0, translate=(0.1,0.1), interpolation=IM.BICUBIC, fill=0),\n    T.ColorJitter(0.2,0.2)\n])(img)\n\n# Work on unique 
images\nds_aug = {}\nfor split in [\"train\", \"test\"]:\n    ds = load_dataset(\"SimulaMet/Kvasir-VQA-x1\", split=split).cast_column(\"image\", HfImage())\n\n    # keep unique img_id\n    uniq_idx = sorted(np.unique(ds[\"img_id\"], return_index=True)[1])\n    ds_unique = ds.select(uniq_idx)\n\n    # augment and save\n    def save_img_batch(batch):\n        return {\"weak_image\":[\n            (weak(img.convert(\"RGB\")).save(p) or p) if not os.path.exists(p) else p\n            for img,p in zip(batch[\"image\"], [str(aug_dir / f\"{i}.jpg\") for i in batch[\"img_id\"]])\n        ]}\n    ds_unique = ds_unique.map(save_img_batch, batched=True, batch_size=10, num_proc=4)\n\n    # cast new column as HfImage\n    ds_aug[split] = ds_unique.cast_column(\"weak_image\", HfImage())\n\n# Now you can access the augmented dataset objects with:\nds_train_aug = ds_aug[\"train\"]\nds_test_aug  = ds_aug[\"test\"]\n```\n\n---\n\n### 3️⃣ Export JSONLs with Augmented Images\n\n```python\n# Save VLM-ready JSONLs pointing to AUGMENTED images\nfor split in [\"train\", \"test\"]:\n    out_path = f\"{d_path}/Kvasir-VQA-x1-{split}-aug.jsonl\"\n    with open(out_path, \"w\", encoding=\"utf-8\") as f:\n        for r in load_dataset(\"SimulaMet/Kvasir-VQA-x1\", split=split):\n            f.write(json.dumps({\n                \"messages\": [\n                    {\"role\": \"user\", \"content\": f\"\u003cimage\u003e{r['question']}\"},\n                    {\"role\": \"assistant\", \"content\": r[\"answer\"]}\n                ],\n                \"images\": [str(aug_dir / f\"{r['img_id']}.jpg\")]\n            }, ensure_ascii=False) + \"\\n\")\n```\n\n---\n\n✅ With this, you’ll have both:\n\n- `Kvasir-VQA-x1-{train,test}.jsonl` → pointing to **original** images  \n- `Kvasir-VQA-x1-{train,test}-aug.jsonl` → pointing to **weakly augmented** images  \n\n---\n\n## 📜 License\n\nThis dataset is released under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).\n\n## 📌 Citation\n\nPlease cite the 
associated dataset paper if you use Kvasir-VQA-x1 in your work:\n\n```bibtex\n\n@incollection{Gautam2025Oct,\n  author={Gautam, Sushant and Riegler, Michael and Halvorsen, P{\\aa}l},\n  title={Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy},\n  booktitle={Data Engineering in Medical Imaging},\n  year={2025},\n  publisher={Springer, Cham},\n  doi={10.1007/978-3-032-08009-7_6}\n}\n\n@article{Gautam2025Jun,\n\tauthor = {Gautam, Sushant and Riegler, Michael A. and Halvorsen, P{\\aa}l},\n\ttitle = {{Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy}},\n\tjournal = {arXiv},\n\tyear = {2025},\n\tmonth = jun,\n\teprint = {2506.09958},\n\tdoi = {10.48550/arXiv.2506.09958}\n}\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimula%2Fkvasir-vqa-x1","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsimula%2Fkvasir-vqa-x1","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimula%2Fkvasir-vqa-x1/lists"}