{"id":23801870,"url":"https://github.com/bowang-lab/ecg-fm","last_synced_at":"2025-03-29T19:02:02.848Z","repository":{"id":252289881,"uuid":"806734460","full_name":"bowang-lab/ECG-FM","owner":"bowang-lab","description":"An electrocardiogram analysis foundation model.","archived":false,"fork":false,"pushed_at":"2025-01-07T18:08:44.000Z","size":10120,"stargazers_count":136,"open_issues_count":1,"forks_count":8,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-22T18:09:59.149Z","etag":null,"topics":["electrocardiogram","foundation-models","healthcare","machine-learning","mimic-iv-ecg","physionet2021","transformer"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bowang-lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-27T19:39:00.000Z","updated_at":"2025-03-18T17:42:36.000Z","dependencies_parsed_at":"2024-08-08T21:25:26.222Z","dependency_job_id":"0d2db67a-0314-4a5a-92c9-16779b83ee03","html_url":"https://github.com/bowang-lab/ECG-FM","commit_stats":{"total_commits":12,"total_committers":1,"mean_commits":12.0,"dds":0.0,"last_synced_commit":"3f764c354297ce61b8123c800d664478e390fba9"},"previous_names":["bowang-lab/ecg-fm"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bowang-lab%2FECG-FM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bowang-lab%2FECG-FM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/bowang-lab%2FECG-FM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bowang-lab%2FECG-FM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bowang-lab","download_url":"https://codeload.github.com/bowang-lab/ECG-FM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246230523,"owners_count":20744347,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["electrocardiogram","foundation-models","healthcare","machine-learning","mimic-iv-ecg","physionet2021","transformer"],"created_at":"2025-01-01T22:15:36.822Z","updated_at":"2025-03-29T19:02:02.825Z","avatar_url":"https://github.com/bowang-lab.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"docs/ecg_fm_logo.png\" width=\"200\"\u003e\n  \u003cbr /\u003e\n  \u003cbr /\u003e\n  \u003ca href=\"https://github.com/bowang-lab/ECG-FM/blob/main/LICENSE/\"\u003e\u003cimg alt=\"MIT License\" src=\"https://img.shields.io/badge/license-MIT-blue.svg\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://arxiv.org/abs/2408.05178\"\u003e\u003cimg alt=\"arxiv\" src=\"https://img.shields.io/badge/cs.LG-2408.05178-b31b1b?logo=arxiv\u0026logoColor=red\"/\u003e\u003c/a\u003e\n  \u003c!-- https://academia.stackexchange.com/questions/27341/flair-badge-for-arxiv-paper --\u003e\n  \u003c!-- https://img.shields.io/badge/\u003cSUBJECT\u003e-\u003cIDENTIFIER\u003e-\u003cCOLOR\u003e?logo=\u003cSIMPLEICONS 
NAME\u003e\u0026logoColor=\u003cLOGO COLOR\u003e --\u003e\n\n\u003c/div\u003e\n\n--------------------------------------------------------------------------------\n\nECG-FM is a foundation model for electrocardiogram (ECG) analysis. Committed to open-source practices, ECG-FM was developed in collaboration with the [fairseq_signals](https://github.com/Jwoo5/fairseq-signals) framework, which implements a collection of deep learning methods for ECG analysis. This repository serves as a landing page and will host project-specific scripts as this work progresses.\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"docs/saliency.png\" width=\"500\"\u003e\n\u003c/div\u003e\n\n## News\n- 2024-08-12: ECG-FM arXiv preprint \u0026 GitHub repository released\n\n## Model Details\n\nECG-FM adopts the wav2vec 2.0 architecture and was pretrained using the W2V+CMSC+RLM (WCR) method. It has 311,940,352 parameters and was trained using 4 NVIDIA A100 80GB GPUs over 16.5 days. For our transformer encoder, we selected hyperparameters consistent with a BERT-Large encoder. Further details are available in our [paper](https://arxiv.org/abs/2408.05178).\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"docs/architecture.png\" width=\"750\"\u003e\n\u003c/div\u003e\n\n### Model Parameters\nWe are committed to open-weight practices. 
Model checkpoints have been made publicly available for [download on HuggingFace](https://huggingface.co/wanglab/ecg-fm-preprint).\n\nSpecifically, there are two checkpoints:\n\n`mimic_iv_ecg_physionet_pretrained.pt`\n- Was pretrained on [MIMIC-IV-ECG v1.0](https://physionet.org/content/mimic-iv-ecg/1.0/) and [PhysioNet 2021 v1.0.3](https://physionet.org/content/challenge-2021/1.0.3/).\n\n`physionet_finetuned.pt`\n- Was finetuned from `mimic_iv_ecg_physionet_pretrained.pt` on [PhysioNet 2021 v1.0.3](https://physionet.org/content/challenge-2021/1.0.3/).\n\n\n**Disclaimer: These models are different from those reported in our arXiv paper.** These BERT-Base sized models were trained purely on public data sources due to privacy concerns surrounding UHN-ECG data and patient identification. Validation for the final models will be available upon full publication.\n\n## Getting Started\n\n### Installation\nClone [fairseq_signals](https://github.com/Jwoo5/fairseq-signals) and refer to the requirements and installation section in the top-level README.\n\n### Data Preparation\nWe implemented a flexible, end-to-end, multi-source data preprocessing pipeline. Please refer to it [here](https://github.com/Jwoo5/fairseq-signals/tree/master/scripts/preprocess/ecg).\n\n### Inference \u0026 Model Loading\nSee our [inference tutorial notebook](inference_tutorial.ipynb)!\n\n### Training\nTraining is performed through the [fairseq_signals](https://github.com/Jwoo5/fairseq-signals) framework. To maximize reproducibility, we have provided [configuration files](https://huggingface.co/wanglab/ecg-fm-preprint).\n\nPretraining can be performed by downloading the `mimic_iv_ecg_physionet_pretrained.yaml` config (or modifying `fairseq-signals/examples/w2v_cmsc/config/pretraining/w2v_cmsc_rlm.yaml` as desired).\nOnce the relevant configuration file is ready, pretraining is launched through Hydra's command-line interface. 
This command highlights some popular config overrides:\n```\nFAIRSEQ_SIGNALS_ROOT=\"\u003cTODO\u003e\"\nMANIFEST_DIR=\"\u003cTODO\u003e/cmsc\"\nOUTPUT_DIR=\"\u003cTODO\u003e\"\n\nfairseq-hydra-train \\\n    task.data=$MANIFEST_DIR \\\n    dataset.valid_subset=valid \\\n    dataset.batch_size=64 \\\n    dataset.num_workers=10 \\\n    dataset.disable_validation=false \\\n    distributed_training.distributed_world_size=4 \\\n    optimization.update_freq=[2] \\\n    checkpoint.save_dir=$OUTPUT_DIR \\\n    checkpoint.save_interval=10 \\\n    checkpoint.keep_last_epochs=0 \\\n    common.log_format=csv \\\n    --config-dir $FAIRSEQ_SIGNALS_ROOT/examples/w2v_cmsc/config/pretraining \\\n    --config-name w2v_cmsc_rlm\n```\n\nClassification finetuning uses the `physionet_finetuned.yaml` or `fairseq-signals/examples/w2v_cmsc/config/finetuning/ecg_transformer/diagnosis.yaml` configs. This command highlights some popular config overrides:\n```\nFAIRSEQ_SIGNALS_ROOT=\"\u003cTODO\u003e\"\nPRETRAINED_MODEL=\"\u003cTODO\u003e\"\nMANIFEST_DIR=\"\u003cTODO\u003e\"\nLABEL_DIR=\"\u003cTODO\u003e\"\nOUTPUT_DIR=\"\u003cTODO\u003e\"\nNUM_LABELS=$(($(wc -l \u003c \"$LABEL_DIR/label_def.csv\") - 1))\nPOS_WEIGHT=$(cat $LABEL_DIR/pos_weight.txt)\n\nfairseq-hydra-train \\\n    task.data=$MANIFEST_DIR \\\n    model.model_path=$PRETRAINED_MODEL \\\n    model.num_labels=$NUM_LABELS \\\n    optimization.lr=[1e-06] \\\n    optimization.max_epoch=140 \\\n    dataset.batch_size=256 \\\n    dataset.num_workers=5 \\\n    dataset.disable_validation=true \\\n    distributed_training.distributed_world_size=1 \\\n    distributed_training.find_unused_parameters=True \\\n    checkpoint.save_dir=$OUTPUT_DIR \\\n    checkpoint.save_interval=1 \\\n    checkpoint.keep_last_epochs=0 \\\n    common.log_format=csv \\\n    +task.label_file=$LABEL_DIR/y.npy \\\n    +criterion.pos_weight=$POS_WEIGHT \\\n    --config-dir $FAIRSEQ_SIGNALS_ROOT/examples/w2v_cmsc/config/finetuning/ecg_transformer \\\n    --config-name 
diagnosis\n```\n\n*Notes:*\n- With CMSC pretraining, the batch size refers to pairs of adjacent segments. Therefore, the effective pretraining batch size is `64 pairs * 2 segments per pair * 4 GPUs * 2 gradient accumulations (update_freq) = 1024 segments`.\n- ECG-FM has 311,940,352 parameters, whereas the base model has 90,883,072 parameters. We do not recommend pretraining a large model using only the public data sources used in the paper (PhysioNet 2021 and MIMIC-IV-ECG).\n\n### Labeling Functionality\nFunctionality for our comprehensive free-text pattern matching and knowledge-graph-based label manipulation will be made available soon!\n\n## Questions\nInquiries may be directed to kaden.mckeen@mail.utoronto.ca.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbowang-lab%2Fecg-fm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbowang-lab%2Fecg-fm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbowang-lab%2Fecg-fm/lists"}