{"id":13756341,"url":"https://github.com/pangenome/impg","last_synced_at":"2025-12-30T09:24:20.097Z","repository":{"id":222232188,"uuid":"755332339","full_name":"pangenome/impg","owner":"pangenome","description":"implicit pangenome graph","archived":false,"fork":false,"pushed_at":"2025-05-06T23:14:49.000Z","size":534,"stargazers_count":58,"open_issues_count":5,"forks_count":5,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-05-07T00:15:16.692Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pangenome.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-02-09T22:50:23.000Z","updated_at":"2025-05-06T23:14:32.000Z","dependencies_parsed_at":"2024-02-13T04:29:00.852Z","dependency_job_id":"77e439f9-58ac-44d7-9ccd-88bd79ee4617","html_url":"https://github.com/pangenome/impg","commit_stats":null,"previous_names":["ekg/impg"],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pangenome%2Fimpg","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pangenome%2Fimpg/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pangenome%2Fimpg/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pangenome%2Fimpg/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pangenome","download_url":"https://codeload.github.com/pangenome/impg/tar.gz/refs/heads/main
","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253358069,"owners_count":21895967,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T11:00:42.468Z","updated_at":"2025-12-30T09:24:20.091Z","avatar_url":"https://github.com/pangenome.png","language":"Rust","funding_links":[],"categories":["A list of software capable of analyzing **eukaryotic** genomes for pangenomics"],"sub_categories":[],"readme":"# impg: implicit pangenome graph\n\n[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/impg/README.html)\n\n## Why impg?\n\nStudying genomic variation at specific loci—disease genes, regulatory elements, structural variants—across populations or species traditionally requires either building expensive whole-genome graphs or using reference-based methods that miss variation. `impg` solves this by treating all-vs-all pairwise alignments as an *implicit pangenome graph*, rapidly projecting target ranges through the alignment network to extract only the homologous sequences you need. Query regions across many genomes in seconds. Perform transitive searches to discover connected sequences. Partition genomes into comparable loci. Refine regions to maximize sample coverage—all without constructing explicit graph structure. 
This makes pangenome-scale comparative genomics fast and practical.\n\n## Usage\n\nHere's a basic example:\n\n```bash\nimpg query -a cerevisiae.pan.paf.gz -r S288C#1#chrI:50000-100000 -x\n```\n\n- `-a` specifies the path to the alignment file in PAF or .1aln format. PAF files must use CIGAR strings with `=` for matches and `X` for mismatches (e.g., from `wfmash` or `minimap2 --eqx`).\n- `-r` defines the target range in the format of `seq_name:start-end`\n- `-x` requests a *transitive closure* of the matches. That is, for each collected range, we then find what sequence ranges are aligned onto it. This is done progressively until we've closed the set of alignments connected to the initial target range.\n\nDepending on your alignments, this might result in the following BED file:\n\n```txt\nS288C#1#chrI        50000  100000\nDBVPG6044#1#chrI    35335  85288\nY12#1#chrI          36263  86288\nDBVPG6765#1#chrI    36166  86150\nYPS128#1#chrI       47080  97062\nUWOPS034614#1#chrI  36826  86817\nSK1#1#chrI          52740  102721\n```\n\n## Installation\n\nYou need Rust (`cargo`) installed. Then:\n\n```bash\ngit clone https://github.com/pangenome/impg.git\ncd impg\ncargo install --force --path .\n```\n\nTo install without AGC support (using only FASTA files):\n\n```bash\ngit clone https://github.com/pangenome/impg.git\ncd impg\ncargo install --force --path . --no-default-features\n```\n\n### Troubleshooting\n\nIf you encounter issues related to `libclang` during the build process, you may need to set specific environment variables to point to your LLVM installation. 
\n\n```shell\nenv -i HOME=\"$HOME\" PATH=\"/usr/local/bin:/usr/bin:/bin:$HOME/.cargo/bin\" LIBCLANG_PATH=\"/usr/lib/llvm-7/lib\" BINDGEN_EXTRA_CLANG_ARGS=\"-I/usr/lib/llvm-7/lib/clang/7.0.1/include\" bash -c 'cargo build --release'\n```\n\nAlternatively, install from Bioconda:\n\n```bash\nconda install -c bioconda impg\n```\n\n## Building with GNU Guix\n\n**NOTE**: All paths are relative to the repository root. If you are not working\nin the repository root, please update the paths to fit your work scenario.\n\nTo build `impg` with Guix without making it available, run the following\ncommand:\n\n```sh\nguix build -L .guix/modules --file=guix.scm\n```\n\nThe `-L` option adds the `.guix/modules` directory to the front of the Guile\nload path. The `--file` option points to the `guix.scm` at the root of the\nrepository.\n\nTo build and \"install\" `impg` with Guix, run:\n\n```sh\nguix install -L .guix/modules --file=guix.scm\n```\n\n## Commands\n\n### Query\n\nQuery overlaps in the alignment:\n\n```bash\n# Query a single region\nimpg query -a alignments.paf -r chr1:1000-2000\n\n# Query a whole sequence by name\nimpg query -a alignments.paf -r chr1\n\n# Query multiple regions from a BED file (mix PAF and .1aln)\nimpg query -a file1.paf file2.1aln -b regions.bed\n\n# Enable transitive overlap search\nimpg query -a alignments.paf -r chr1:1000-2000 -x\n\n# Set maximum transitive depth (default: 0 = unlimited)\nimpg query -a alignments.1aln -r chr1:1000-2000 -x -m 3\n\n# Filter by minimum gap-compressed identity\nimpg query -a alignments.paf alignments.1aln -r chr1:1000-2000 --min-identity 0.9\n\n# Output formats (auto/bed/bedpe/paf/gfa/maf/fasta/fasta+paf/fasta-aln)\nimpg query -a alignments.paf -r chr1:1000-2000 -o bed\nimpg query -a alignments.1aln -r chr1:1000-2000 -o bedpe\nimpg query -a file1.paf file2.1aln -r chr1:1000-2000 -o paf\n\n# Write output to file instead of stdout (using -O / --output-prefix)\nimpg query -a alignments.paf -r chr1:1000-2000 -o bed -O results 
      # creates results.bed\n\n# gfa/maf/fasta output requires sequence files (--sequence-files or --sequence-list)\nimpg query -a alignments.paf -r chr1:1000-2000 -o gfa --sequence-files ref.fa genomes.fa\nimpg query -a alignments.1aln -r chr1:1000-2000 -o maf --sequence-list fastas.txt\nimpg query -a file1.paf file2.1aln -r chr1:1000-2000 -o fasta --sequence-files *.fa\n\n# fasta+paf combines FASTA and PAF output\nimpg query -a alignments.paf -r chr1:1000-2000 -o fasta+paf --sequence-files *.fa\n\n# fasta-aln outputs POA-based FASTA alignment\nimpg query -a alignments.1aln -r chr1:1000-2000 -o fasta-aln --sequence-files *.fa\n\n# Works with AGC archives too\nimpg query -a alignments.paf -r chr1:1000-2000 -o gfa --sequence-files genomes.agc\n\n# fasta output with reverse complement for reverse strand sequences\nimpg query -a alignments.1aln -r chr1:1000-2000 -o fasta --sequence-files *.fa --reverse-complement\n\n# Merge nearby regions (default: 0)\nimpg query -a file1.paf file2.1aln -r chr1:1000-2000 -d 1000\n\n# Filter results by minimum length\nimpg query -a alignments.paf -r chr1:1000-2000 -l 5000\n\n# Use DFS instead of BFS for transitive search (slower but fewer overlapping results)\nimpg query -a alignments.paf -r chr1:1000-2000 -x --transitive-dfs\n\n# Fast approximate mode for .1aln files (bed/bedpe only)\nimpg query -a alignments.1aln -r chr1:1000-2000 --approximate\n\n# Fast approximate mode with transitive queries (requires --min-transitive-len \u003e trace_spacing)\nimpg query -a alignments.1aln -r chr1:1000-2000 --approximate -x --min-transitive-len 101\n\n# Restrict results to sequences listed in a file (one name per line)\n# Also filters intermediate steps during transitive queries\nimpg query -a alignments.paf -r chr1:1000-2000 -x --subset-sequence-list sequences.txt\n\n# Transform coordinates back to original sequences when using subsequence inputs (seq_name:start-end)\nimpg query -a alignments.paf -r chr1:1000-2000 -o paf 
--original-sequence-coordinates\n\n# Disable merging entirely\nimpg query -a alignments.paf -r chr1:1000-2000 --no-merge\n\n# Force processing large regions (\u003e10kbp) with gfa/maf output\nimpg query -a alignments.paf -r chr1:1000-50000 -o gfa --sequence-files *.fa --force-large-region\n```\n\n#### Alignment visualizations\n\nThe `scripts/faln2html.py` tool converts FASTA alignments into interactive HTML visualizations that can be viewed in any web browser. It supports [react-msa](https://github.com/GMOD/JBrowseMSA) and [ProSeqViewer](https://github.com/BioComputingUP/ProSeqViewer) as MSA viewers.\n\n```bash\n# Visualize FASTA alignments in the browser (pipe directly to visualization script)\nimpg query -a alignments.paf -r chr1:1000-2000 -o fasta-aln --sequence-files *.fa | \\\n  python scripts/faln2html.py -i - -o alignment.html\n\n# Choose visualization tool (reactmsa or proseqviewer)\nimpg query -a alignments.paf -r chr1:1000-2000 -o fasta-aln --sequence-files *.fa | \\\n  python scripts/faln2html.py -i - -o alignment.html --tool proseqviewer\n```\n\n### Partition\n\nPartition the alignment into smaller pieces:\n\n```bash\n# Basic partitioning with 1Mb windows (outputs single partitions.bed file with partition number in 4th column)\nimpg partition -a alignments.paf -w 1000000\n\n# Output separate files for each partition\nimpg partition -a alignments.1aln -w 1000000 --separate-files\n\n# Specify output folder for partition files (directory will be created if it doesn't exist)\nimpg partition -a file1.paf file2.1aln -w 1000000 --output-folder results\n\n# Start from specific sequences (one per line)\nimpg partition -a alignments.paf -w 1000000 --starting-sequences-file seqs.txt\n\n# Merge nearby intervals within partitions\nimpg partition -a file1.paf file2.1aln -w 1000000 -d 10000\n\n# Selection strategies for next sequence\nimpg partition -a alignments.paf -w 1000000 --selection-mode longest        # longest missing region\nimpg partition -a alignments.1aln 
-w 1000000 --selection-mode total          # most total missing\nimpg partition -a alignments.paf -w 1000000 --selection-mode sample         # by sample (PanSN)\nimpg partition -a alignments.1aln -w 1000000 --selection-mode haplotype      # by haplotype (PanSN)\n\n# Control transitive search depth and minimum region size\nimpg partition -a file1.paf file2.1aln -w 1000000 -m 2 --min-transitive-len 10000\n\n# Approximate mode with partition (requires --min-transitive-len \u003e trace_spacing)\nimpg partition -a alignments.1aln -w 1000000 --approximate --min-transitive-len 101 -o bed\n# Output as GFA, MAF or FASTA requires sequence files and --separate-files flag\nimpg partition -a alignments.paf -w 1000000 -o gfa --sequence-files *.fa --separate-files --output-folder gfa_partitions\nimpg partition -a alignments.1aln -w 1000000 -o maf --sequence-list fastas.txt --separate-files --output-folder maf_partitions\nimpg partition -a file1.paf file2.1aln -w 1000000 -o fasta --sequence-files *.fa --separate-files --output-folder fasta_partitions\n# Works with AGC archives too\nimpg partition -a alignments.paf -w 1000000 -o gfa --sequence-files genomes.agc --separate-files --output-folder gfa_partitions\n```\n\n### Similarity\n\nCompute pairwise similarity between sequences in a region:\n\n```bash\n# Basic similarity computation\nimpg similarity -a alignments.paf -r chr1:1000-2000 --sequence-files ref.fa genomes.fa\n\n# Query multiple regions from a BED file (it produces multiple similarity matrices)\nimpg similarity -a alignments.1aln -b regions.bed --sequence-files *.fa\n\n# Output distances instead of similarities\nimpg similarity -a file1.paf file2.1aln -r chr1:1000-2000 --sequence-files *.fa --distances\n\n# Include all pairs (even those with zero similarity)\nimpg similarity -a alignments.paf -r chr1:1000-2000 --sequence-files *.fa -a\n\n# Restrict analysis to sequences listed in a file (one name per line)\n# Entries may be full contig names or sample identifiers (e.g., 
HG00097 or HG00097_hap1)\nimpg similarity -a alignments.1aln -r chr1:1000-2000 --sequence-files *.fa --subset-sequence-list sequences.txt\n\n# Group sequences by delimiter (e.g., for PanSN naming, \"sample#haplotype#chr\" -\u003e \"sample\")\nimpg similarity -a alignments.paf -r chr1:1000-2000 --sequence-files *.fa --delim '#'\n\n# Use 2nd occurrence of delimiter for grouping (e.g., for PanSN naming, \"sample#haplotype#chr\" -\u003e \"sample#haplotype\")\nimpg similarity -a alignments.1aln -r chr1:1000-2000 --sequence-files *.fa --delim '#' --delim-pos 2\n\n# Perform PCA/MDS dimensionality reduction\nimpg similarity -a file1.paf file2.1aln -r chr1:1000-2000 --sequence-files *.fa --pca\n\n# Specify number of PCA components (default: 2)\nimpg similarity -a alignments.paf -r chr1:1000-2000 --sequence-files *.fa --pca --pca-components 3\n\n# Choose similarity measure for PCA distance matrix (jaccard/cosine/dice, default: jaccard)\nimpg similarity -a alignments.1aln -r chr1:1000-2000 --sequence-files *.fa --pca --pca-measure cosine\n\n# PCA with adaptive polarization using previous regions\nimpg similarity -a file1.paf file2.1aln -b regions.bed --sequence-files *.fa --pca --polarize-n-prev 3\n\n# PCA with sample-guided polarization\nimpg similarity -a alignments.paf -b regions.bed --sequence-files *.fa --pca --polarize-guide-samples sample1,sample2\n```\n\n### Refine\n\nRefine loci to maximize sample support:\n\n```bash\n# Refine a single region to maximize the number of sequences spanning both ends\nimpg refine -a alignments.paf -r chr1:1000-2000\n\n# Refine many regions from a BED file\nimpg refine -a alignments.paf -b loci.bed\n\n# Allow merging within 200 kb and require at least 2 kb coverage near each end\nimpg refine -a alignments.paf -r chr1:1000-2000 -d 200000 --span-bp 2000\n\n# Expand up to 90% of the locus length on each side (default: 0.5)\nimpg refine -a alignments.paf -r chr1:1000-2000 --max-extension 0.90\n\n# Or cap the search to an absolute flank 
size\nimpg refine -a alignments.paf -r chr1:1000-2000 --max-extension 50000\n\n# Maximize PanSN sample or haplotype counts instead of sequence counts\nimpg refine -a alignments.paf -r chr1:1000-2000 --pansn-mode sample\nimpg refine -a alignments.paf -r chr1:1000-2000 --pansn-mode haplotype\n\n# Capture the supporting entities in a separate BED file\nimpg refine -a alignments.paf -r chr1:1000-2000 --support-output refine_support.bed\n\n# Control extension step size (default: 1000 bp)\nimpg refine -a alignments.paf -r chr1:1000-2000 --extension-step 500\n\n# Exclude regions from entity counting via blacklist\nimpg refine -a alignments.paf -r chr1:1000-2000 --blacklist-bed excluded.bed\n\n# Restrict to specific sequences (filters transitive steps too)\nimpg refine -a alignments.paf -r chr1:1000-2000 --subset-sequence-list samples.txt\n\n# Works with .1aln files too (requires --sequence-files)\nimpg refine -a alignments.1aln --sequence-files sequences.fa -r chr1:1000-2000\n\n# Fast approximate mode for .1aln files (requires --min-transitive-len \u003e trace_spacing)\nimpg refine -a alignments.1aln -r chr1:1000-2000 --approximate --min-transitive-len 101\n```\n\nWhen `--support-output` is provided, the tool emits a BED file listing every sequence/sample/haplotype that spans the refined region: `sequence\tstart\tend\tregion-name`.\n\n`impg refine` explores asymmetric left/right expansions around each target region to find the smallest window that maximizes the number of sequences, samples, or haplotypes. 
Keeping start/end alignment anchors outside structural variants helps avoid selecting loci that terminate inside large insertions or deletions.\n\n### Stats\n\nPrint alignment statistics:\n\n```bash\n# Get statistics for PAF file\nimpg stats -a alignments.paf\n\n# Get statistics for .1aln file\nimpg stats -a alignments.1aln\n\n# Get statistics for combined PAF and .1aln files\nimpg stats -a file1.paf file2.1aln\n```\n\n### Lace\n\nCombine multiple GFA or VCF files:\n\n```bash\n# Combine multiple GFA files (auto-detects format)\nimpg lace -f file1.gfa file2.gfa file3.gfa -o combined.gfa\n\n# Combine multiple VCF files\nimpg lace -f file1.vcf file2.vcf file3.vcf -o combined.vcf\n\n# Use a list file containing file paths\nimpg lace -l files.txt -o combined.gfa\n\n# Explicitly specify input format (gfa, vcf, auto)\nimpg lace -f *.gfa -o combined.gfa --format gfa\n\n# Fill gaps between contiguous path segments (GFA only)\nimpg lace -f *.gfa -o combined.gfa --fill-gaps 1 # Fill with N's\nimpg lace -f *.gfa -o combined.gfa --fill-gaps 1 --sequence-files sequence.fa # Fill with sequences\n\n# Fill all gaps, including start and end gaps (GFA only, requires sequence files)\nimpg lace -f *.gfa -o combined.gfa --fill-gaps 2 --sequence-files sequence.fa\n\n# Control output compression\nimpg lace -f *.gfa -o combined.gfa.gz --compress gzip\nimpg lace -f *.gfa -o combined.gfa.bgz --compress bgzip\nimpg lace -f *.gfa -o combined.gfa.zst --compress zstd\n\n# Use reference for VCF contig validation\nimpg lace -f *.vcf -o combined.vcf --reference reference.fa\n\n# Use custom temporary directory\nimpg lace -f *.gfa -o combined.gfa --temp-dir /tmp/lace_work\n```\n\n#### Path Name Format\n\nThe command expects path names in the format:\n\n```\nNAME:START-END\n```\n\nExample: `HG002#1#chr20:1000-2000`\n\nThe command uses these coordinates to:\n1. Identify which sequences belong together\n2. Order the sequences correctly\n3. 
Detect and handle overlaps or gaps\n\nNote: `NAME` can contain ':' characters. When parsing coordinates, the command uses the last occurrence of ':' to separate the name from the coordinate range.\n\n#### Post-processing recommendations\n\nAfter combining the GFA files, the resulting graph will already have compacted node IDs ranging from `1` to the total number of nodes. However, it is strongly recommended to perform post-processing steps using **[ODGI](https://github.com/pangenome/odgi)** to unchop and sort the graph.\n\n```bash\nodgi unchop -i combined.gfa -o - -t 16 | \\\n    odgi sort -i - -o - -p gYs -t 16 | \\\n    odgi view -i - -g \u003e combined.final.gfa\n```\n\nIf overlaps were present, and then trimmed during the merging process, it's advisable to run **[GFAffix](https://github.com/marschall-lab/GFAffix)** before the ODGI pipeline to remove redundant nodes introduced by the overlap trimming.\n\n```bash\ngfaffix combined.gfa -o combined.fix.gfa \u0026\u003e /dev/null\n\nodgi unchop -i combined.fix.gfa -o - -t 16 | \\\n    odgi sort -i - -o - -p gYs -t 16 | \\\n    odgi view -i - -g \u003e combined.final.gfa\n```\n\n### Index\n\nCreate an IMPG index from alignment files:\n\n```bash\n# Index a single PAF file\nimpg index -a alignments.paf\n\n# Index a single .1aln file\nimpg index -a alignments.1aln\n\n# Index multiple alignment files (PAF and .1aln mixed)\nimpg index -a file1.paf file2.1aln file3.paf\n\n# Create index with custom name\nimpg index -a alignments.paf -i custom.impg\n\n# Index from a list of alignment files (can mix formats)\nimpg index --alignment-list alignment_files.txt\n```\n\n#### Indexing Modes\n\n**Combined index** (default): Creates a single `.impg` file for all alignments.\n```bash\nimpg index -a file1.paf file2.1aln -i combined.impg\nimpg query -i combined.impg -r chr1:0-1000\n```\n\n**Per-file index** (`--per-file-index`): Creates one `.impg` per alignment file (e.g., `data.paf.impg`).\n```bash\nimpg index --alignment-list 
files.txt --per-file-index -t 32\nimpg query --alignment-list files.txt --per-file-index -r chr1:0-1000\n```\n\nBoth modes work with PAF and .1aln files (can be mixed in `--alignment-list`).\n\n**When to use per-file indexing:**\n- Incremental updates (only rebuild changed alignment files)\n- Many alignment files\n\n**Stale index detection:** impg warns if alignment files are modified after index creation. Use `-f/--force-reindex` to rebuild.\n\n**Note on compressed files**: `impg` works directly with bgzip-compressed PAF files (`.paf.gz`, `.paf.bgz`). For large files, creating a GZI index can speed up initial index creation:\n\n```bash\nbgzip -r alignments.paf.gz  # Creates alignments.paf.gz.gzi (optional)\n```\n\n### Common options\n\nAll commands support these options:\n- `-a, --alignment-files`: One or more paths to alignment files in PAF or .1aln format (can be mixed). Files can be gzipped or uncompressed.\n- `--alignment-list`: Path to a plain-text file listing one alignment path per line (PAF or .1aln files can be mixed).\n- `-i, --index`: Path to an existing IMPG index file.\n- `-f, --force-reindex`: Always regenerate the IMPG index even if it already exists.\n- `-t, --threads`: Number of threads (default: 4)\n- `-v, --verbose`: Verbosity level (0=error/silent, 1=info with progress bar, 2=debug)\n\n### Sequence file options\n\nFor GFA/MAF/FASTA output and similarity computation:\n\n- `--sequence-files`: List of sequence files (FASTA or AGC*)\n- `--sequence-list`: Text file listing sequence files (FASTA or AGC*) (one per line)\n- `--poa-scoring`: POA scoring parameters as `match,mismatch,gap_open1,gap_extend1,gap_open2,gap_extend2` (default: `1,4,6,2,26,1`)\n- `--reverse-complement`: Reverse complement sequences on the reverse strand (for FASTA output)\n\n*AGC files are only supported in the full installation (default features). 
For FASTA-only support, install with `--no-default-features`.\n\n### Merging behaviour\n\n- `-d, --merge-distance \u003cINT\u003e`: Merge nearby hits within this distance (bp).\n- `--no-merge`: Disable merging entirely for all output formats.\n- `--consider-strandness`: Keep forward and reverse strands separate when merging. By default, strands are merged for BED/GFA/MAF outputs and kept separate for FASTA/FASTA-ALN.\n\n## What does `impg` do?\n\nAt its core, `impg` lifts over ranges from a target sequence (used as the reference) onto the queries (the other sequences aligned to that target), as described by the alignments.\nIn effect, it lets us pick up homologous loci from all genomes mapped onto our specific target region.\nThis is particularly useful when you're interested in comparing a specific genomic region across different individuals, strains, or species in a pangenomic or comparative genomic setting.\nThe output is available in BED, BEDPE, and PAF formats (and in GFA, MAF, or FASTA when sequence files are supplied), making it straightforward to extract FASTA sequences for downstream multiple sequence alignment (e.g., `mafft`) or pangenome graph building (e.g., `pggb` or `minigraph-cactus`).\n\n## How does it work?\n\n`impg` uses [`coitrees`](https://github.com/dcjones/coitrees) (Cache Oblivious Interval Trees) to provide efficient range lookup over the input alignments.\nCIGAR strings are converted to a compact delta encoding.\nThis approach allows for fast and memory-efficient projection of sequence ranges through alignments.\n\n## Authors\n\nAndrea Guarracino \u003caguarra1@uthsc.edu\u003e \\\nBryce Kille \u003cbrycekille@gmail.com\u003e \\\nErik Garrison \u003cerik.garrison@gmail.com\u003e\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpangenome%2Fimpg","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpangenome%2Fimpg","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpangenome%2Fimpg/lists"}