{"id":37702486,"url":"https://github.com/masilab/lcancer_baselines","last_synced_at":"2026-01-16T13:01:37.933Z","repository":{"id":262727034,"uuid":"885548166","full_name":"MASILab/lcancer_baselines","owner":"MASILab","description":null,"archived":false,"fork":false,"pushed_at":"2024-11-23T01:19:16.000Z","size":71,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-11-23T02:19:15.208Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MASILab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-08T20:00:26.000Z","updated_at":"2024-11-23T01:19:19.000Z","dependencies_parsed_at":"2024-11-14T00:24:59.384Z","dependency_job_id":"e57ec18e-3acf-422a-9fec-ff4126267bd0","html_url":"https://github.com/MASILab/lcancer_baselines","commit_stats":null,"previous_names":["masilab/lcancer_baselines"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/MASILab/lcancer_baselines","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MASILab%2Flcancer_baselines","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MASILab%2Flcancer_baselines/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MASILab%2Flcancer_baselines/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MASILab%2Flcancer_baselines/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owner
s/MASILab","download_url":"https://codeload.github.com/MASILab/lcancer_baselines/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MASILab%2Flcancer_baselines/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28478888,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T11:59:17.896Z","status":"ssl_error","status_checked_at":"2026-01-16T11:55:55.838Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-16T13:01:37.459Z","updated_at":"2026-01-16T13:01:37.928Z","avatar_url":"https://github.com/MASILab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Benchmarking lung cancer models and datasets\n\nT.Z. Li, K. Xu, A. Krishnan, R. Gao, M.N. Kammer, S. Antic, D. Xiao, M. Knight, Y. Martinez, R. Paez, R.J. Lentz, S. Deppen, E.L. Grogan, T.A. Lasko, K.L. Sandler, F. Maldonado, B.A. Landman, No winners: Performance of lung cancer prediction models depends on screening-detected, incidental, and biopsied pulmonary nodule use cases, (2024). https://arxiv.org/abs/2405.10993v1\n\nWe ran 8 predictive models for lung cancer diagnosis across 9 different cohorts to evaluate their performance in different clinical settings. 
This repo supports training and inference of these models on the public lung screening dataset [NLST](https://cdas.cancer.gov/nlst/); the other datasets from this study are private.\n\n# Usage\n## Install\n1. `pip install -r requirements.txt`\n2. Clone https://github.com/MASILab/DeepLungScreening\n3. Edit `definitions.py` to point to working directories\n\n## Datasets\nThis repo can be run on any lung CT dataset that follows the setup below. This example uses the [NLST](https://cdas.cancer.gov/nlst/). Make the corresponding name and path replacements in `cachedcohorts.py` like so:\n```\n# cachedcohorts.py\nNLST_CACHE = CachedCohort(\n    name=NAMES.nlst,\n    cohort=os.path.join(D.DATASET_DIR, 'nlst/nlst.csv'),\n    scan_cohort=os.path.join(D.DATASET_DIR, 'nlst/nlst_scan.csv'),\n    noduleft_data=os.path.join(D.DATASET_DIR, 'nlst/liao/feat128/'),\n    img_data=os.path.join(D.DATASET_DIR, 'nlst/DeepLungScreening/nifti'),\n    imgprep_data=os.path.join(D.DATASET_DIR, 'nlst/DeepLungScreening/prep'),\n    imgbbox_data=os.path.join(D.DATASET_DIR, 'nlst/DeepLungScreening/bbox'),\n    imgprep_list=os.path.join(D.DATASET_DIR, 'nlst/nlst_prep.csv'),\n    dlsft64_data=os.path.join(D.DATASET_DIR, 'nlst/DeepLungScreening/feat64'),\n    dlsft128_data=os.path.join(D.DATASET_DIR, 'nlst/DeepLungScreening/feat128'),\n)\n```\n### nlst.csv\n`NLST_CACHE.cohort` should point to a csv with the format\n| pid        | lung_cancer | nodule_count |\n| ----------- | ----- | ----- |\n| unique patient ID | 0 or 1 label | int (optional) |\n### nlst_scan.csv\n`NLST_CACHE.scan_cohort` should point to a csv with the format\n| pid        | scandate          | scanorder | fpath | lung_cancer | nodule_count |\n| ----------- | ------------ | ----- | ----- | --- | ----- |\n| unique patient ID | %Y%m%d | int with 0 being earliest scan | path to CT scan with suffix `.nii.gz` | 0 or 1 label | int (optional) |\n### test_set.csv (optional)\n`NLST_CACHE.test` should point to a csv that defines the test set.\nHere we use the test set given by [Ardila et al.](https://www.nature.com/articles/s41591-019-0447-x).\n\n### nifti/\n`NLST_CACHE.img_data` should point to a directory of CT scans in NIfTI format (`.nii.gz`)\n\n### Liao and DeepLungScreening pipelines\nSome models rely on the features from the Liao et al. model. The following pipeline will compute intermediate data and features in the locations specified in `imgprep_data`, `imgbbox_data`, `dlsft64_data`, and `dlsft128_data`.\n\n1. Preprocess CT scans and generate the list of scans that passed this step in `imgprep_list`:\n```\n#!/bin/bash\npython imgprep.py 1 nlst.test_scan\npython imgprep.py prep --prep_dst nlst_prep.csv\n```\n2. Compute bounding boxes using a pretrained nodule detection model from Liao et al.\n```\npython imgprep.py 2 nlst.test_scan\n```\n3. Compute feature vectors using a pretrained ResNet from Liao et al.\n```\npython imgprep.py 3 nlst.test_scan\n```\n4. Make predictions with a multimodal model from DeepLungScreening.\n```\npython imgprep.py 4 nlst.test_scan --predictions dls.csv\n```\n\n## Model Training and Inference\n```\n#!/bin/bash\npython cli.py train nlst.train_cohort\npython cli.py test nlst.test_scan\n```\nReplace `nlst.train_cohort` with `nlst.ft_train` and `nlst.test_scan` with `nlst.ft_test_scan` if you are running a model that uses the Liao or DLS pipelines. This change leaves out the subjects that the Liao pipeline could not process.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmasilab%2Flcancer_baselines","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmasilab%2Flcancer_baselines","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmasilab%2Flcancer_baselines/lists"}