{"id":50938086,"url":"https://github.com/simula/pointdetectcount","last_synced_at":"2026-06-17T11:03:44.154Z","repository":{"id":299781880,"uuid":"981759335","full_name":"simula/PointDetectCount","owner":"simula","description":null,"archived":false,"fork":false,"pushed_at":"2025-06-18T08:26:49.000Z","size":8,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-18T09:28:40.628Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/simula.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-11T20:23:18.000Z","updated_at":"2025-06-18T08:26:53.000Z","dependencies_parsed_at":"2025-06-18T09:41:36.593Z","dependency_job_id":null,"html_url":"https://github.com/simula/PointDetectCount","commit_stats":null,"previous_names":["simula/pointdetectcount"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/simula/PointDetectCount","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simula%2FPointDetectCount","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simula%2FPointDetectCount/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simula%2FPointDetectCount/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simula%2FPointDetectCount/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/simula","download_url":"https://codeloa
d.github.com/simula/PointDetectCount/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simula%2FPointDetectCount/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34445186,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-17T02:00:05.408Z","response_time":127,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-17T11:03:42.808Z","updated_at":"2026-06-17T11:03:44.147Z","avatar_url":"https://github.com/simula.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# 🩺 PointDetectCount: Multi-Task Medical Image Understanding with Instruction-Tuned Vision-Language Models\n\nThis repository contains the code and data generation scripts used in the paper:\n\n**[Point, Detect, Count: Multi-Task Medical Image Understanding with Instruction-Tuned Vision-Language Models](https://arxiv.org/html/2505.16647v1)**  \n`Sushant Gautam, Michael A. 
Riegler, Pål Halvorsen`  \n*arXiv preprint, May 2025*\n\n---\n\n## 📌 Overview\n\nPointDetectCount is a unified multi-task framework for fine-tuning instruction-tuned vision-language models (VLMs) on three fundamental medical imaging tasks:\n\n- **Pointing (Localization)**\n- **Bounding Box Detection**\n- **Counting (Object Enumeration)**\n\nThe model is trained and evaluated on the [MedMultiPoints](https://huggingface.co/datasets/SimulaMet/MedMultiPoints) dataset, a multimodal dataset comprising diverse clinical annotations.\n\n---\n\n## 📦 Dataset\n\nDataset is available via Hugging Face:\n👉 [`SimulaMet/MedMultiPoints`](https://huggingface.co/datasets/SimulaMet/MedMultiPoints)\n\nAll raw images should be stored locally in the `MedMultiPoints-images/` directory.\n\n### Download Images Locally\n\nYou can download the image files directly from the Hugging Face dataset using the\n[`datasets`](https://github.com/huggingface/datasets) library:\n\n```python\nfrom datasets import load_dataset\n\n# Load the dataset\nds = load_dataset(\"SimulaMet/MedMultiPoints\")\n\n# Path to save images and a metadata file\noutput_dir = \"MedMultiPoints-images\"\n\nimport os\nos.makedirs(output_dir, exist_ok=True)\n\n# Save one image per unique hash\nfor sha, row in ds[\"train\"].to_pandas().groupby(\"image_sha256\").nth(0).iterrows():\n    row[\"image_data\"].save(os.path.join(output_dir, f\"{sha}.jpg\"))\n```\n\nThis snippet creates the `MedMultiPoints-images/` folder (if it doesn't already\nexist) and writes each image from the dataset to that directory using the image's\nSHA-256 hash as the filename.\n\n| Columns              | Type         | Description                                                       |\n|-------------------|--------------|-------------------------------------------------------------------|\n| `image`           | Image        | Raw medical image                                                 |\n| `image_sha256`    | string       | SHA-256 checksum for integrity  
                                  |\n| `img_size`        | `[int, int]` | Image dimensions: `[width, height]`                               |\n| `points`          | `[[x, y]]`   | List of point annotations                                         |\n| `bbox`            | `[[x1, y1, x2, y2]]` | List of bounding boxes                                   |\n| `count`           | int          | Number of annotated objects                                       |\n| `label`           | string       | Object class (e.g., polyp, sperm, cluster, etc.)                  |\n| `collection_method` | string     | Task relevance (e.g., detection, counting)                        |\n| `classification`  | string       | Free-form annotation description                                  |\n| `organ`           | string       | Organ or modality type (e.g., GI tract, sperm)                    |\n\n**Instruction-Fused JSONL Files**:\n\n- [`multi-task-train.jsonl`](https://huggingface.co/datasets/SimulaMet/MedMultiPoints/resolve/main/instruction_dataset/multi-task-train.jsonl)\n- [`multi-task-test.jsonl`](https://huggingface.co/datasets/SimulaMet/MedMultiPoints/resolve/main/instruction_dataset/multi-task-test.jsonl)\n\n---\n\n## 💾 Fine-Tuned Model\n\nModel weights are available via Hugging Face:\n👉 [`SimulaMet/PointDetectCount-Qwen2.5-VL-7B-LoRA`](https://huggingface.co/SimulaMet/PointDetectCount-Qwen2.5-VL-7B-LoRA)\n\n---\n\n## 🛠️ Repository Structure\n\n| File/Folder           | Description                                                              |\n|-----------------------|--------------------------------------------------------------------------|\n| `create_datasetJSON.py` | Generates instruction-formatted JSONL files for multi-task fine-tuning |\n| `evaluate_qwen.py`      | Evaluates VLM outputs against structured annotations (bbox, point, count) |\n| `MedMultiPoints-images/` | Directory to store dataset images locally |\n\n---\n\n## 🚀 Usage\n\n### Create Instruction Dataset\n\nRun 
the conversion script to produce an instruction-formatted dataset. Adjust the image directory or output path if needed:\n\n```bash\npython create_datasetJSON.py --image-dir MedMultiPoints-images --output kvasir_valid.jsonl\n```\n\n### Evaluate Predictions\n\nCompare your model's predictions with the provided ground truth using:\n\n```bash\npython evaluate_qwen.py --dataset kvasir_valid-qwen-6task-test.jsonl --results kvasir_valid-qwen-6task-test-result.jsonl\n```\n\n### Fine-Tune Qwen (LoRA)\n\nTraining uses the instruction-fused training file available at\n[`multi-task-train.jsonl`](https://huggingface.co/datasets/SimulaMet/MedMultiPoints/resolve/main/instruction_dataset/multi-task-train.jsonl):\n\n```bash\nswift sft --model Qwen/Qwen2.5-VL-7B-Instruct \\\n    --train_type lora \\\n    --dataset /home/sushant/D1/MIUA/kvasir-format/multi-task-train.jsonl \\\n    --output_dir /home/sushant/D1/MIUA/kvasir-format/training2 \\\n    --num_train_epochs 5 \\\n    --eval_steps 200 \\\n    --save_total_limit 3 \\\n    --report_to wandb \\\n    --per_device_train_batch_size 4\n```\n\n### Inference\n\nInfer using either the fine-tuned checkpoint or the original model:\n\n```bash\n# Finetuned model\nswift infer --model SimulaMet/PointDetectCount-Qwen2.5-VL-7B-LoRA \\\n    --val_dataset https://huggingface.co/datasets/SimulaMet/MedMultiPoints/resolve/main/instruction_dataset/multi-task-test.jsonl \\\n    --result_path qwen_outputs/qwen-finetuned-6task-test500-result.jsonl \\\n    --use_hf true\n\n# Public checkpoint\nswift infer --model Qwen/Qwen2.5-VL-7B-Instruct \\\n    --val_dataset https://huggingface.co/datasets/SimulaMet/MedMultiPoints/resolve/main/instruction_dataset/multi-task-test.jsonl \\\n    --result_path qwen_outputs/qwen-public-6task-test500-result.jsonl \\\n    --use_hf true\n```\n\n---\n\n## 🧠 Methodology Summary\n\nWe fine-tune [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) using [LoRA](https://arxiv.org/abs/2106.09685) for 
instruction-based multi-task image understanding.\n\n- Each image is associated with 5 instruction-response pairs.\n- Responses are expected to be JSON-formatted predictions.\n- Tasks are trained jointly using the standard language modeling loss.\n\nFor more details, see [Section IV of the paper](https://arxiv.org/html/2505.16647v1#S4).\n\n---\n\n## 📊 Evaluation Metrics\n\n| Task             | Metrics (Key)                                  |\n|------------------|------------------------------------------------|\n| **Counting**     | MAE, MSE                                       |\n| **Pointing**     | Point MAE, RMSE, Matching Accuracy, Zero-cases |\n| **Bounding Box** | mAP, mAP@50, mAP@75, IoU                       |\n\nThe evaluation script is provided in `evaluate_qwen.py`.\n\n---\n\n## 📝 Citation\n\nIf you use this work, please cite:\n\n```bibtex\n@incollection{Gautam,\n    author = {Gautam, Sushant and Riegler, Michael A. and Halvorsen, P{\\aa}l},\n    title = {{Point, Detect, Count: Multi-Task Medical Image Understanding with Instruction-Tuned Vision-Language Models}},\n    booktitle = {{2025 IEEE 38th International Symposium on Computer-Based Medical Systems (CBMS)}},\n    journal = {Published in: 2025 IEEE 38th International Symposium on Computer-Based Medical Systems (CBMS)},\n    pages = {18--20},\n    publisher = {IEEE},\n    doi = {10.1109/CBMS65348.2025.00090}\n}\n```\n\n---\n\n## 📬 Contact\n\nFor questions or collaboration inquiries, reach out to:\n\n📧 sushant@simula.no\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimula%2Fpointdetectcount","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsimula%2Fpointdetectcount","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimula%2Fpointdetectcount/lists"}