{"id":44574984,"url":"https://github.com/zhengxinchang/isopedia","last_synced_at":"2026-04-23T17:01:17.454Z","repository":{"id":338303517,"uuid":"841143525","full_name":"zhengxinchang/isopedia","owner":"zhengxinchang","description":"Simultaneous exploration of thousands of long-read transcriptomes by read-level indexing","archived":false,"fork":false,"pushed_at":"2026-03-30T01:51:35.000Z","size":125532,"stargazers_count":18,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-03-30T05:12:04.047Z","etag":null,"topics":["alternative-splice","fusion-gene","isoforms","long-read-sequencing","transcriptome"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zhengxinchang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-08-11T19:07:14.000Z","updated_at":"2026-03-30T01:51:38.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/zhengxinchang/isopedia","commit_stats":null,"previous_names":["zhengxinchang/isopedia"],"tags_count":22,"template":false,"template_full_name":null,"purl":"pkg:github/zhengxinchang/isopedia","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zhengxinchang%2Fisopedia","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zhengxinchang%2Fisopedia/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zhen
gxinchang%2Fisopedia/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zhengxinchang%2Fisopedia/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zhengxinchang","download_url":"https://codeload.github.com/zhengxinchang/isopedia/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zhengxinchang%2Fisopedia/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32189660,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-23T15:28:30.493Z","status":"ssl_error","status_checked_at":"2026-04-23T15:28:29.972Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alternative-splice","fusion-gene","isoforms","long-read-sequencing","transcriptome"],"created_at":"2026-02-14T04:21:08.076Z","updated_at":"2026-04-23T17:01:17.447Z","avatar_url":"https://github.com/zhengxinchang.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# About Isopedia \u003cimg src=\"./img/logo3.png\" align=\"right\" alt=\"\" width=120 /\u003e\n\n**Isopedia** is a scalable tool for analyzing hundreds to thousands of long-read transcriptome datasets using a read-level indexing approach.\n\nIt provides two key capabilities:\n\n- Population-level transcript quantification and frequency profiling.\n- Isoform diversity exploration, 
including fusion and splice-junction analysis.\n\nPreprint: [https://www.biorxiv.org/content/10.64898/2026.03.23.713667v1.full](https://www.biorxiv.org/content/10.64898/2026.03.23.713667v1.full)\n\n\n## Table of Contents\n\n- [Quick Start](#quick-start)\n- [How It Works](#how-it-works)\n- [Download Pre-built Index](#download-pre-built-index)\n- [Build Your Own Index](#build-your-own-index)\n- [How to Use Isopedia](#how-to-use-isopedia)\n  - [Isoform Query](#isoform-query)\n  - [Fusion Query and Discovery](#fusion-query-and-discovery)\n  - [Splice Query and Visualization](#splice-query-and-visualization)\n- [Command Parameters](#command-parameters)\n- [How to Install Isopedia](#how-to-install-isopedia)\n\u003c!-- - [Annotate ORF with ORFannotate](#annotate-orf-with-orfannotate) --\u003e\n- [Computational Resource Usage](#computational-resource-usage)\n- [Contact](#contact)\n- [FAQ](#faq)\n\n## Quick Start\n\nIsopedia has two binaries:\n\n- `isopedia`: main workflows (query + index building)\n- `isopedia-tools`: helper utilities\n\n```bash\n# download the latest release from GitHub\ncurl -L https://github.com/zhengxinchang/isopedia/releases/download/v1.6.6/isopedia-v1.6.6-linux-x86_64.tar.gz | tar -xzvf -\n./isopedia-v1.6.6-linux-x86_64/isopedia\n\n# clone the repo and enter the toy example directory\ngit clone https://github.com/zhengxinchang/isopedia \u0026\u0026 cd isopedia/toy_ex/\n\n# query transcripts\nisopedia isoform -i index/ -g gencode.v47.basic.chr22.gtf -o out.profile.tsv.gz # 3364 records returned\n\n# query one splice junction\nisopedia splice -i index/ -s 22:17744013,22:17750104 -o out.splice.tsv.gz # 13 records returned\n\n# visualize splice query output\nisopedia-splice-viz.py -i out.splice.tsv.gz -g gencode.v47.basic.chr22.gtf -o isopedia-splice-view\n\n# query one fusion event (two breakpoints)\nisopedia fusion -i index/ -p chr1:181130,chr1:201853853 -o ./out.fusion.tsv.gz # 0 records returned, as hg002 is a healthy individual\n\n# query 
multiple fusion events\nisopedia fusion -i index/ -P fusion_query.tsv -o ./out.fusion.tsv.gz # 0 records returned, as hg002 is a healthy individual\n\n```\n\n\nOverview of Isopedia's query framework and supported output types. All query modes run on a standard laptop.\n\n![What you can get from isopedia](./img/workflow.png)\n\n\n\u003c!-- For GTF indexing details, see [Indexing GTF Files](doc/indexing_gtf.md). --\u003e\n\n## How It Works\n\n![how-it-works](./img/how-it-works.png)\n\nTypical workflow:\n\n1. `isopedia profile`: extract isoform signals per sample.\n2. `isopedia merge`: aggregate all profiles into one merged archive.\n3. `isopedia index`: build the tree index for fast queries.\n4. Query with `isopedia isoform`, `isopedia fusion`, or `isopedia splice`.\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eHow Isopedia determines a positive hit for a query transcript\u003c/strong\u003e\u003c/summary\u003e\n\n\u003cimg src=\"./img/how-it-works2.png\" align=\"center\" alt=\"\" /\u003e\n\n\u003c/details\u003e\n\n## Download Pre-built Index\n\nIsopedia provides pre-built indexes from public long-read RNA-seq datasets.\n\n| Name | Organism | Reference | Link | Sample Size | Index Size (Compressed) | Minimum Memory for Query | Description |\n|---|---|---|---|---:|---|---|---|\n| `isopedia_index_hs_v1.0` | *Homo sapiens* | GRCh38 | `ftp://hgsc-sftp1.hgsc.bcm.tmc.edu/rt38520/isopedia_index_hs_v1.0.tar.xz` | 1,007 | 305G (107G) | 16 GB | 107 ENCODE samples + 900 SRA samples |\n\nChecksum file:\n\n- `ftp://hgsc-sftp1.hgsc.bcm.tmc.edu//rt38520/isopedia_index_hs_v1.0.tar.xz.md5sum`\n\u003c!-- \n### Download with `isopedia download`\n\n```bash\n# list indexes from the default remote manifest\nisopedia download --list\n\n# download a named index from the default manifest\nisopedia download --name isopedia_index_hs_v1.0.tar.xz --outdir ./downloads\n\n# download from custom URL\nisopedia download --url 
ftp://hgsc-sftp1.hgsc.bcm.tmc.edu//rt38520/isopedia_index_hs_v1.0.tar.xz --outdir ./downloads\n\n# use custom manifest TOML\nisopedia download --manifest /path/to/manifest.toml --name isopedia_index_hs_v1.0.tar.xz --outdir ./downloads\n\n```\n\nExample custom manifest entry:\n\n```toml\n[[index]]\nname = \"isopedia_index_hs_v1.0.tar.xz\"\nurl = \"ftp://hgsc-sftp1.hgsc.bcm.tmc.edu//rt38520/isopedia_index_hs_v1.0.tar.xz\"\nsize = 114596438692\nmd5 = \"eb922ef27257d363969a835d4175da26\"\nsource = \"ftp\"\nprotocol = \"ftp\"\ndescription = \"Human pre-built index (GRCh38)\"\n``` --\u003e\n\nUnpack:\n\n```bash\ntar -xvf isopedia_index_hs_v1.0.tar.xz\n```\n\n## Build Your Own Index\n\nPrerequisites:\n\n\u003c!-- 1. Long-read alignments (`.bam`/`.cram`) or transcript annotations (`.gtf`). --\u003e\n1. Long-read alignments (`.bam`/`.cram`)\n2. A manifest TSV with at least two columns: sample name and profile path.\n\nExample manifest:\n\n```tsv\nname\tpath\tplatform\nHG002_pb_chr22\t/path/to/hg002_pb_chr22.isoform.gz\tPacBio\nHG002_ont_chr22\t/path/to/hg002_ont_chr22.isoform.gz\tONT\n```\n\nWorkflow:\n\n```bash\n# 1) profile each sample\nisopedia profile -i ./chr22.pb.grch38.bam -o ./hg002_pb_chr22.isoform.gz\nisopedia profile -i ./chr22.ont.grch38.bam -o ./hg002_ont_chr22.isoform.gz\n\n# 2) merge profiles\nulimit -n 65535 # increase the maximum number of open file descriptors, in case of merging many samples.\nisopedia merge -i manifest.tsv -o index/\n\n# 3) build tree index\nisopedia index -i index/ -m manifest.tsv\n\n# 4) test query\nisopedia isoform -i index/ -g gencode.v47.basic.chr22.gtf -o out.isoform.tsv.gz\n```\n\n## How to Use Isopedia\n\n### Isoform Query\n\nPurpose:\n\n- Annotate transcripts from an input GTF against the index.\n\nExample:\n\n```bash\n# input GTF must be sorted\ngffread -T -o- origin.gtf | sort -k1,1 -k4,4n | gffread - -o query.sorted.gtf\n\nisopedia isoform -i index/ -g query.sorted.gtf -o isoform.out.tsv.gz\n```\n\n#### Output columns (latest 
code)\n\n| Column | Description |\n|---|---|\n| `chrom` | Chromosome |\n| `start` | Transcript start |\n| `end` | Transcript end |\n| `length` | Transcript length |\n| `exon_count` | Exon count |\n| `trans_id` | Transcript ID |\n| `gene_id` | Gene ID |\n| `ranking_score` | Ranking score across cohort |\n| `detected(total:fsm:em)` | Detection status for total/FSM/EM; The detected status column follows the format \"Overall detected:FSM detected:EM detected\". FSM detected indicates whether the sample has evidence from a full-splice match, and EM detected indicates whether the sample has evidence from EM estimation. Overall detected is set to \"yes\" if either FSM detected or EM detected is \"yes\". Entries with \"no:no:no\" can be excluded to quickly filter for transcripts with detection support.|\n| `min_read` | Minimum read threshold used |\n| `n_pos_samples(total:fsm:em/sample_size)` | Positive sample counts and sample size |\n| `attributes` | Original GTF attributes |\n| `FORMAT` | Per-sample field format. Current `FORMAT` string: `CPM:COUNT:FSM_CPM:FSM_COUNT:EM_CPM:EM_COUNT:INFO`. Per-sample values include abundance/count components for total, FSM, and EM estimates.|\n| `sample_*` | Per-sample values |\n\n\n\n**Please note that CPM values are calculated by normalizing across all input transcript queries, as Isopedia expects the input GTF to represent a complete transcriptome. If you query only a subset of transcripts, the CPM values will not be meaningful.**\n\n\n\n### Fusion Query and Discovery\n\n`fusion` supports three modes:\n\n1. Breakpoint query (`--pos` or `--pos-bed`)\n2. Gene-region discovery (`--gene-gtf`)\n3. 
Region-pair detailed read output (`--region`)\n\nExamples:\n\n```bash\n# single breakpoint pair\nisopedia fusion -i index/ -p chr1:1000,chr2:2000 -o fusion.single.tsv.gz\n\n# batch breakpoint pairs from bed-like file\nisopedia fusion -i index/ -P fusion_breakpoints.bed -o fusion.batch.tsv.gz\n\n# discover candidate fusions from gene GTF\nisopedia fusion -i index/ -G gene.gtf -o fusion.discovery.tsv.gz\n\n# detailed read-level output for two regions\nisopedia fusion -i index/ -r chr8:92017266-92017466,chr21:34859374-34859574 -o fusion.region.tsv.gz\n```\n\nBreakpoint-query output columns:\n\n- `chr1`, `pos1`, `chr2`, `pos2`, `id`, `min_read`, `positive/sample_size`, `left_isoforms`, `right_isoforms`, then per-sample counts.\n\nGene-discovery output columns:\n\n- `geneA_name`, `geneB_name`, `total_evidences`, `total_samples`, `is_two_strand`,\n  `AtoB_primary_start`, `AtoB_primary_end`, `AtoB_supp_start`, `AtoB_supp_end`,\n  `BtoA_primary_start`, `BtoA_primary_end`, `BtoA_supp_start`, `BtoA_supp_end`, then per-sample counts.\n\nRegion-pair detailed output columns:\n\n- `chr1`, `start1`, `end1`, `chr2`, `start2`, `end2`, `main_exon_count1`,\n  `supp_segment_count2`, `query_part`, `main_isoforms`, `supp_aln_regions`, `sample_name`.\n\n### Splice Query and Visualization\n\nPurpose:\n\n- Query isoforms overlapping a splice junction and optionally visualize them.\n\nExamples:\n\n```bash\n# single splice query\nisopedia splice -i index/ -s chr22:41100500,chr22:41101500 -o splice.out.tsv.gz\n\n# batch splice query\nisopedia splice -i index/ -S splice_queries.bed -o splice.batch.tsv.gz\n\n# visualize\npython script/isopedia-splice-viz.py \\\n  -i splice.out.tsv.gz \\\n  -g gencode.v47.basic.annotation.gtf \\\n  -t script/isopedia-splice-viz-temp.html \\\n  -o isopedia-splice-view\n```\n\n`splice` output columns:\n\n- `id`, `chr1`, `pos1`, `chr2`, `pos2`, `total_evidence`, `cpm`, `matched_sj_idx`,\n  `dist_to_matched_sj`, `n_exons`, `start_pos_left`, `start_pos_right`,\n  
`end_pos_left`, `end_pos_right`, `splice_junctions`, then per-sample values.\n\n`splice` per-sample `FORMAT`:\n\n- `COUNT:CPM:START|END|STRAND`\n\n## Command Parameters\n\nParameter lists below are based on the current CLI in source.\n\n### `isopedia isoform`\n\nCore options:\n\n- `-i, --idxdir \u003cIDXDIR\u003e`\n- `-g, --gtf \u003cGTF\u003e`\n- `-o, --output \u003cOUTPUT\u003e`\n- `-f, --flank \u003cFLANK\u003e` (default: `10`)\n- `-m, --min-read \u003cMIN_READ\u003e` (default: `1`)\n- `--info`\n- `-n, --num-threads \u003cNUM_THREADS\u003e` (default: `4`)\n- EM options: `--em-max-iter`, `--em-conv-min-diff`, `--em-chunk-size`, `--em-effective-len-coef`, `--em-damping-factor`, `--min-em-abundance`\n- TSS/TES options: `--no-check-tss-tes`, `--tss-degrad-bp`, `--tes-degrad-bp`, `--terminal-tolerance-bp`\n- Cache options: `-c, --cached-nodes`, `--cached-chunk-num`, `--cached-chunk-size-mb`\n- `--output-tmp-shard-counts`\n- `--verbose`\n\n### `isopedia fusion`\n\nCore options:\n\n- `-i, --idxdir \u003cIDXDIR\u003e`\n- Query selectors: `-p, --pos`, `-P, --pos-bed`, `-r, --region`, `-G, --gene-gtf`\n- `-o, --output \u003cOUTPUT\u003e`\n- `-f, --flank \u003cFLANK\u003e` (default: `10`)\n- `-m, --min-read \u003cMIN_READ\u003e` (default: `1`)\n- Cache options: `-c, --cached_nodes`, `--cached-chunk-number`, `--cached-chunk-size-mb`\n- `--verbose`\n\n### `isopedia splice`\n\nCore options:\n\n- `-i, --idxdir \u003cIDXDIR\u003e`\n- Query selectors: `-s, --splice` or `-S, --splice-bed`\n- `-o, --output \u003cOUTPUT\u003e`\n- `-f, --flank \u003cFLANK\u003e` (default: `10`)\n- `-m, --min-read \u003cMIN_READ\u003e` (default: `1`)\n- Cache options: `-c, --cached_nodes`, `--cached-chunk-number`, `--cached-chunk-size-mb`\n- `--verbose`\n\n### `isopedia profile`\n\nCore options:\n\n- Input selectors: `-i, --bam` or `-g, --gtf`\n- `-r, --reference \u003cREFERENCE\u003e` for CRAM\n- `-o, --output \u003cOUTPUT\u003e`\n- `--mapq \u003cMAPQ\u003e` (default: `5`)\n- `--use-secondary`, 
`--rname`, `--tid`, `--gid`, `--verbose`\n\n### `isopedia merge`\n\nCore options:\n\n- `-i, --input \u003cINPUT\u003e`\n- `-o, --outdir \u003cOUTDIR\u003e`\n- `-c, --chunk-size \u003cCHUNK_SIZE\u003e` (default: `1000000`)\n\n### `isopedia index`\n\nCore options:\n\n- `-i, --idxdir \u003cIDXDIR\u003e`\n- `-m, --manifest \u003cMANIFEST\u003e`\n- `-t, --threads \u003cTHREADS\u003e` (default: `4`)\n\n### `isopedia download` (beta)\n\nCore options:\n\n- `-l, --list`\n- `-n, --name \u003cNAME\u003e`\n- `-u, --url \u003cURL\u003e`\n- `-m, --manifest \u003cMANIFEST\u003e`\n- `-o, --outdir \u003cOUTDIR\u003e` (default: `.`)\n\n## How to Install Isopedia\n\nWe offer two ways to install isopedia:\n\n### 1. Download latest release (recommended)\n\nDownload the pre-built binary from the [GitHub Releases page](https://github.com/zhengxinchang/isopedia/releases/latest):\n\n```bash\ncurl -L https://github.com/zhengxinchang/isopedia/releases/download/v1.6.6/isopedia-v1.6.6-linux-x86_64.tar.gz | tar -xzvf -\n./isopedia-v1.6.6-linux-x86_64/isopedia\n```\n\n### 2. 
Build from source (for users who want to build from source)\n\nRust and Cargo are required.\n\n```bash\ngit clone https://github.com/zhengxinchang/isopedia.git\ncd isopedia\ncargo build --release\n```\n\n\u003c!-- ## Annotate ORF with ORFannotate\n\nIsopedia interoperates with [ORFannotate](https://github.com/egustavsson/ORFannotate), which can consume Isopedia outputs to predict ORFs/UTRs and annotate CDS features.\n\n```text\n\u003cplaceholder\u003e\n``` --\u003e\n\n## Computational Resource Usage\n\nIsopedia 1.4.0 benchmark on 1,007 long-read transcriptome datasets from SRA and ENCODE.\n\n- Hardware: AMD Ryzen 9 7940HX, 64 GB RAM.\n\n| Step | Peak Mem (GB) | Time (H:MM:SS) |\n|---|---:|---|\n| `isopedia merge` | 54.77 | 28:26:08 |\n| `isopedia index` | 45.85 | 5:48:55 |\n| `isopedia isoform` (280K GENCODE v49 basic) | 9.19 | 4:52:18 |\n\n\n\n## FAQ\n\n**Q:** Can Isopedia be used in a \"transcript discovery\" mode, for example to retrieve all transcripts indexed for a given gene or within a genomic region, without providing a GTF file in advance?\n\n**A:** Isopedia is designed as a genotyper rather than a caller. It always requires a GTF file as input, meaning users need to specify the transcript structures they want to query. Gene-level or coordinate-range discovery queries (e.g., \"retrieve all transcripts for gene N\" or \"all transcripts between coordinates A–B\") fall under the scope of a transcript caller, which is outside Isopedia's current functionality.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzhengxinchang%2Fisopedia","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzhengxinchang%2Fisopedia","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzhengxinchang%2Fisopedia/lists"}