{"id":27963251,"url":"https://github.com/nyumedml/headct_foundation","last_synced_at":"2026-03-17T18:37:30.987Z","repository":{"id":276060436,"uuid":"898794816","full_name":"NYUMedML/headCT_foundation","owner":"NYUMedML","description":"Foundation 3D ViT model for volumetric head CT","archived":false,"fork":false,"pushed_at":"2025-04-16T02:40:25.000Z","size":54627,"stargazers_count":39,"open_issues_count":1,"forks_count":4,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-05-07T19:58:27.924Z","etag":null,"topics":["computer-vision","computerized-tomography","foundation-model","self-supervised-learning"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NYUMedML.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-12-05T03:40:35.000Z","updated_at":"2025-04-16T17:38:00.000Z","dependencies_parsed_at":null,"dependency_job_id":"88efbb03-622e-4364-9cec-327f8406f95e","html_url":"https://github.com/NYUMedML/headCT_foundation","commit_stats":null,"previous_names":["nyumedml/headct_foundation"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NYUMedML%2FheadCT_foundation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NYUMedML%2FheadCT_foundation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NYUMedML%2FheadCT_foundation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/NYUMedML%2FheadCT_foundation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NYUMedML","download_url":"https://codeload.github.com/NYUMedML/headCT_foundation/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252949272,"owners_count":21830150,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","computerized-tomography","foundation-model","self-supervised-learning"],"created_at":"2025-05-07T19:58:35.572Z","updated_at":"2026-03-17T18:37:30.981Z","avatar_url":"https://github.com/NYUMedML.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# HeadCT-Foundation\n[![Paper](https://img.shields.io/badge/Paper-arxiv.2502.02779-FF6B6B.svg)](https://arxiv.org/abs/2502.02779)\n\u003c/div\u003e\n\n**Abstract:** Head computed tomography (CT) is a widely-used imaging modality for assessing brain, skull, and cerebrovascular pathologies, particularly in neurologic emergencies due to its speed, safety, and accessibility. However, its limited sensitivity compared to MRI and the scarcity of annotated data hinder the development of robust diagnostic models. To address this, we propose a novel head CT foundation model using self-supervised learning on 361,663 non-contrast 3D head CT scans. Our approach leverages self-supervised learning to pre-train a model that learns generalizable features from unlabeled data, followed by fine-tuning on smaller annotated datasets for tasks like hemorrhage and tumor detection. 
Evaluated on internal and external datasets, the model demonstrates superior performance on downstream tasks and strong generalization across in- and out-of-distribution data. This work establishes a new benchmark for head CT analysis, highlighting the potential of scaling self-supervised learning in 3D medical imaging.\n\n\u003cimg src=\"./images/overview.png\" width=\"900px\"/\u003e\n\n## Installation\n1. Create an environment with conda:\n```\nconda create -n head_ct python=3.8.18 -y\nconda activate head_ct\n```\n2. Clone the repository:\n```\ngit clone https://github.com/NYUMedML/HeadCT-Foundation\ncd HeadCT-Foundation\n```\n3. Install dependencies:\n```\npip install --upgrade pip\npip install -r requirement.txt\npip install -e .\n```\n\n## Starter Notebook\nSee [**./notebooks/extract_feature_sample.ipynb**](./notebooks/extract_feature_sample.ipynb) to get started with loading data and model weights and extracting features from head CT scans. See [**./notebooks/visualization_sample.ipynb**](./notebooks/visualization_sample.ipynb) to learn how to load and visualize a volumetric head CT scan.\n\n## Train Model\nThe commands below show how to run the downstream training and pre-training methods in the root directory of this repository with the specified hyperparameters. For more details on the pre-defined hyperparameters and their usage, see [**./config.py**](./config.py) and the referenced ``yaml`` files (e.g.  
[**./configs/downstream/vit_HeadCT_cq500.yaml**](./configs/downstream/vit_HeadCT_cq500.yaml) for ``--cfg ./configs/downstream/vit_HeadCT_cq500.yaml``).\n\nExamples of submitting jobs to the Slurm Workload Manager are in [**./slurm_submit**](./slurm_submit).\n\n### Train Model for Downstream\n\n#### Fine-tuning\n```\ntorchrun --nnodes 1 --nproc_per_node 1 --master_port 12400 ./main_downstream.py --local_rank 0 \\\n    --model_name \"vit\" --batch_size 64 --num_workers 4 --max_epochs 10 --base_lr 1e-5 \\\n    --cfg ./configs/downstream/vit_HeadCT_cq500.yaml \\\n    --optimizer \"AdamW\" --scheduler \"cosine\" --weight_decay 0.01 --grad_clip 1.0 \\\n    --preds_save_name \"cq500_ICH_finetune\" --use_amp \\\n    --classifier \"linear\" --label_name \"ICH\" --dataset \"cq500\" --seed 42 \\\n    --filename \"cq500_ICH\"\n```\n\n#### Linear-probing\nAdding ``--freeze`` to the command arguments updates only the weights of the classification layer:\n```\ntorchrun --nnodes 1 --nproc_per_node 1 --master_port 12400 ./main_downstream.py --local_rank 0 \\\n    --model_name \"vit\" --batch_size 64 --num_workers 4 --max_epochs 10 --base_lr 1e-5 \\\n    --cfg ./configs/downstream/vit_HeadCT_cq500.yaml \\\n    --optimizer \"AdamW\" --scheduler \"cosine\" --weight_decay 0.01 --grad_clip 1.0 \\\n    --preds_save_name \"cq500_ICH_linear_prob\" --use_amp \\\n    --classifier \"linear\" --label_name \"ICH\" --dataset \"cq500\" --seed 42 \\\n    --filename \"cq500_ICH\" --freeze\n```\n\n#### Few-shots\nAdding ``--few_shots \u003cnum_shots\u003e`` to the command arguments runs few-shot training with ``\u003cnum_shots\u003e`` positive samples for the selected disease:\n```\ntorchrun --nnodes 1 --nproc_per_node 1 --master_port 12400 ./main_downstream.py --local_rank 0 \\\n    --model_name \"vit\" --batch_size 64 --num_workers 4 --max_epochs 10 --base_lr 1e-5 \\\n    --cfg ./configs/downstream/vit_HeadCT_cq500.yaml \\\n    --optimizer \"AdamW\" --scheduler \"cosine\" --weight_decay 0.01 --grad_clip 
1.0 \\\n    --preds_save_name \"cq500_ICH_fewshots_8\" --use_amp \\\n    --classifier \"linear\" --label_name \"ICH\" --dataset \"cq500\" --seed 42 \\\n    --filename \"cq500_ICH\" --few_shots 8\n```\n\n### Pre-train Model\n\n#### DINO Pre-training\n```\ntorchrun --nnodes 1 --nproc_per_node 1 --master_port 12400 ../main_pretrain_dino.py --local_rank 0 \\\n    --model_name \"dino\" --batch_size 64 --num_workers 4 --max_epochs 1000 --base_lr 1.5e-4 \\\n    --cfg ../configs/dino/dino_HeadCT.yaml \\\n    --use_amp --optimizer \"AdamW\" --scheduler \"cosine\" --weight_decay 5e-3 \\\n    --grad_clip 3.0\n```\n\n#### MAE Pre-training\n```\ntorchrun --nnodes 1 --nproc_per_node 1 --master_port 12400 ../main_pretrain_mae.py --local_rank 0 \\\n    --model_name \"mae\" --batch_size 256 --num_workers 4 --max_epochs 400 --base_lr 1.5e-4 \\\n    --cfg ../configs/mae/mae_HeadCT.yaml \\\n    --use_amp --optimizer \"AdamW\" --scheduler \"cosine\" --weight_decay 5e-3 \\\n    --grad_clip 3.0\n```\n\n## Comparisons and Benchmarks\nWe compare our model with [**Merlin**](https://arxiv.org/abs/2406.06512) (a 3D vision-language foundation model pre-trained on large-scale abdominal CT), [**Google CT Foundation**](https://github.com/Google-Health/imaging-research/tree/master/ct-foundation) (a 3D CT vision foundation model pre-trained on large-scale CT of different anatomies), [**CT-FM**](https://github.com/project-lighter/CT-FM) (another 3D CT vision foundation model pre-trained on CT of different anatomies), and a model trained from scratch. The plots below cover multi-label disease diagnosis and hemorrhage-subtype volume-to-volume retrieval tasks, and our model outperforms the others across the board. This highlights the success of our training pipeline and the importance of domain-specific large-scale pre-training. 
We additionally compare against alternative modeling methods: Attention-Based Multiple-Instance-Learning (AB-MIL) and Mean Pooling with a 2D foundation model (DINOv3), and direct use of a 3D Video Foundation Model (VJEPA2). Our model shows consistently improved performance.\n\n#### Fine-Tuning Comparison \u003cbr\u003e\n\u003cimg src=\"./images/performance1.png\" width=\"900px\"/\u003e\n\n#### Linear Probing Comparison ([**Google CT Foundation**](https://github.com/Google-Health/imaging-research/tree/master/ct-foundation) only allows API access) \u003cbr\u003e\n\u003cimg src=\"./images/performance2.png\" width=\"475px\" alt=\"Performance Image\" style=\"display: block; margin-left: 160px;\"\u003e\n\n#### Volume-to-Volume Retrieval Comparison \u003cbr\u003e\n\u003cdiv style=\"display: flex; justify-content: center; gap: 20px;\"\u003e\n  \u003cimg src=\"./images/mAP_RSNA_Retrieval.png\" width=\"240px\" alt=\"RSNA Retrieval\" /\u003e\n  \u003cimg src=\"./images/mAP_CQ500_Retrieval.png\" width=\"310px\" alt=\"CQ500 Retrieval\" /\u003e\n\u003c/div\u003e\n\n#### Alternative Modeling Methods Comparison \u003cbr\u003e\n\u003cimg src=\"./images/performance3.png\" width=\"650px\" alt=\"Alternative 1\" style=\"display:inline-block; margin: 0 10px;\" /\u003e\n\u003cimg src=\"./images/performance4.png\" width=\"650px\" alt=\"Alternative 2\" style=\"display:inline-block; margin: 0 10px;\" /\u003e\n\n## Attention Map Visualization\nWe visualize our model's attention maps across slices of a scan for different diseases; the model attends to the regions that are important for diagnosing each disease.\n\n\u003cimg src=\"./images/attention_map.png\" width=\"800px\"/\u003e\n\n## Scans Filtering Criterion\nThe filtering criteria, which combine Study Description, Kilovoltage Peak (kVp), and Convolution Type to select relevant high-quality CT scans for building our foundation model, are listed in 
[**./scans_filter_criterion/scans_filter_criterion.csv**](./scans_filter_criterion/scans_filter_criterion.csv).\n\n## Datasets\nDataset splits with labels for CQ500 and RSNA are organized in their respective directories under [**./datasets**](./datasets), with the root directory removed from the image paths to facilitate reproducing our experimental results.\n\n## Model Weights Sharing\nBecause private patient facial features could be inferred from head CT data, public release of the model weights is not permitted. The model weights are available only upon request, after signing an institutional agreement. Requests for model weights should be sent to the corresponding author and the NYU Langone Data Sharing Strategy Board (DSSB) Committee (DataSharing@nyulangone.org).\n\n## Citation\nIf you find this repository useful, please consider citing our preprint, which contains further details:\n```\n@article{zhu2025foundationctmodel, \n    title={3D Foundation AI Model for Generalizable Disease Detection in Head Computed Tomography}, \n    author={Zhu, Weicheng and Huang, Haoxu and Tang, Huanze and Musthyala, Rushabh and Yu, Boyang and Chen, Long and Vega, Emilio and O'Donnell, Thomas and Dehkharghani, Seena and Frontera, Jennifer A. and Masurkar, Arjun V. and Melmed, Kara and Razavian, Narges}, \n    year={2025}, \n    eprint={2502.02779}, \n    archivePrefix={arXiv}, \n    primaryClass={eess.IV}, \n    url={https://arxiv.org/abs/2502.02779},\n    note={Weicheng Zhu and Haoxu Huang contributed equally to this work}\n}\n```\n\u003cimg src=\"./images/logo.png\" width=\"700px\"/\u003e","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnyumedml%2Fheadct_foundation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnyumedml%2Fheadct_foundation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnyumedml%2Fheadct_foundation/lists"}