{"id":28087792,"url":"https://github.com/opengene/lungtcr","last_synced_at":"2025-10-16T13:48:33.453Z","repository":{"id":289067693,"uuid":"970012592","full_name":"OpenGene/LungTCR","owner":"OpenGene","description":"TCR Repertoire Analysis and Lung Cancer Prediction","archived":false,"fork":false,"pushed_at":"2025-08-05T10:30:02.000Z","size":28842,"stargazers_count":4,"open_issues_count":0,"forks_count":2,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-08-05T12:30:41.955Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenGene.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-21T10:08:52.000Z","updated_at":"2025-08-05T10:30:07.000Z","dependencies_parsed_at":null,"dependency_job_id":"79cd56d7-cf75-40f1-b5e0-0ec3c83ae10e","html_url":"https://github.com/OpenGene/LungTCR","commit_stats":null,"previous_names":["opengene/lungtcr"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/OpenGene/LungTCR","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGene%2FLungTCR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGene%2FLungTCR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGene%2FLungTCR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGene%2FLungTCR/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenGene","download_url":"https:/
/codeload.github.com/OpenGene/LungTCR/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenGene%2FLungTCR/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279000821,"owners_count":26082950,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-09T02:00:07.460Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-05-13T11:43:28.562Z","updated_at":"2025-10-09T06:34:35.236Z","avatar_url":"https://github.com/OpenGene.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LungTCR - TCR Repertoire Analysis and Lung Cancer Prediction\n\n![GitHub](https://img.shields.io/badge/license-MIT-blue)\n![R](https://img.shields.io/badge/R-%3E%3D4.0-blue)\n![Python](https://img.shields.io/badge/Python-%3E%3D3.8-green)\n\n## Overview\n\nLungTCR (https://www.lungtcr.com/) is a comprehensive website for analyzing T-cell receptor (TCR) repertoire data with specialized functions for cancer risk assessment. 
This repository contains:\n\n- TCR repertoire feature calculation pipeline\n- Cancer-associated TCR enrichment scoring\n- Machine learning models for lung cancer/malignant pulmonary nodule risk prediction\n\n## Key Features\n\n| Feature | Description |\n|---------|-------------|\n| TCR Repertoire Analysis | Calculates 20+ diversity and clonality metrics |\n| Cancer TCR Enrichment | Quantifies tumor-associated TCR signatures |\n| Risk Prediction Models | Random Forest/GBM models for lung cancer risk assessment |\n| Visualization | Plots key TCR features and model results |\n\n\n## Input File Format\n\nInput files use the VDJtools table format.\nUse the VDJtools Convert routine (https://vdjtools-doc.readthedocs.io/en/master/input.html#vdjtools-format) to generate files in this format.\n\n### Clonotype Table Requirements\n\n| Column | Required | Description | Example |\n|--------|----------|-------------|---------|\n| count | Yes | Read count of the TCR clone | 161853 |\n| freq | Yes | Frequency of the TCR clone | 0.009385 |\n| cdr3nt | Yes | CDR3 nucleotide sequence | TGTGCCAGTTCGTCGTCTAGCTCCTACAATGAGCAGTTCTTC |\n| cdr3aa | Yes | CDR3 amino acid sequence | CASSSSSSYNEQFF |\n| v | Yes | V gene segment | TRBV6-4 |\n| d | Yes | D gene segment | . |\n| j | Yes | J gene segment | TRBJ2-7 |\n| VEnd | No | Position of the V gene end | 7 |\n| DStart | No | Position of the D gene start | . |\n| DEnd | No | Position of the D gene end | . 
|\n| JStart | No | Position of the J gene start | 18 |\n| sample_id | No | Sample identifier | Patient01_PBMC |\n\n\nExample file (tab-separated):\n```\ncount\tfreq\tcdr3nt\tcdr3aa\tv\td\tj\tVEnd\tDStart\tDEnd\tJStart\n161853\t0.009385105218213133\tTGTGCCAGTTCGTCGTCTAGCTCCTACAATGAGCAGTTCTTC\tCASSSSSSYNEQFF\tTRBV6-4\t.\tTRBJ2-1\t7\t-1\t-1\t18\n128851\t0.007471472215355789\tTGTGCCAGCTCACCATAGGACAGTGCTTCTCTGGAAACACCATATATTTT\tCASSP*DS_FSGNTIYF\tTRBV18\tTRBD1\tTRBJ1-3\t14\t17\t22\t28\n107730\t0.006246763329429179\tTGTGCCAGCAGTTACGGTCTAAGAGATACGCAGTATTTT\tCASSYGLRDTQYF\tTRBV6-5\t.\tTRBJ2-3\t14\t-1\t-1\t23\n...\n```\n\n## Basic Usage\n\n```bash\npython TCRfeatureCal.py -m /extdata/metadata.tsv -o output_features/\npython diversity_vdjtools_wrapper.py -m /extdata/metadata.txt -o output_diversity/ -x 10000000\nRscript LungTCR_ModelPrediction.R -i /extdata/test.csv -o output_model/ -m /model/models_list.rds -f /model/SelectedFeatures.csv\n```\n\nOutput includes:\n- Diversity indices (Shannon, Simpson, etc.)\n- Clonality metrics\n- V/J gene usage profiles\n- CDR3 length distributions\n- CDR3 amino acid compositions\n- TCR clone frequency distributions\n- TCR convergence index\n- Lung cancer enrichment score\n- Lung cancer prediction result\n\n## Citation\n\nThe code in this repository accompanies the paper \"Large-Scale TCR Repertoire Profiling Unveils Tumor-Specific Signals for Diagnosing Indeterminate Pulmonary Nodules\" by Chen et al.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopengene%2Flungtcr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopengene%2Flungtcr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopengene%2Flungtcr/lists"}