{"id":25597080,"url":"https://github.com/hasindu2008/squigulator","last_synced_at":"2025-09-08T20:39:18.666Z","repository":{"id":85420953,"uuid":"518153674","full_name":"hasindu2008/squigulator","owner":"hasindu2008","description":"a tool for simulating nanopore raw signal data","archived":false,"fork":false,"pushed_at":"2025-08-25T11:00:46.000Z","size":31709,"stargazers_count":68,"open_issues_count":7,"forks_count":4,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-08-31T09:39:54.311Z","etag":null,"topics":["blow5","nanopore","signal","simulator","slow5"],"latest_commit_sha":null,"homepage":"https://hasindu2008.github.io/squigulator","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hasindu2008.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-07-26T17:20:41.000Z","updated_at":"2025-08-05T11:31:20.000Z","dependencies_parsed_at":"2023-06-19T17:08:45.605Z","dependency_job_id":"b4324b25-d23f-43aa-8f6b-92dd74568419","html_url":"https://github.com/hasindu2008/squigulator","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/hasindu2008/squigulator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hasindu2008%2Fsquigulator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hasindu2008%2Fsquigulator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hasindu2008%2Fsquigulator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/hasindu2008%2Fsquigulator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hasindu2008","download_url":"https://codeload.github.com/hasindu2008/squigulator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hasindu2008%2Fsquigulator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274231504,"owners_count":25245601,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-08T02:00:09.813Z","response_time":121,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["blow5","nanopore","signal","simulator","slow5"],"created_at":"2025-02-21T12:47:41.686Z","updated_at":"2025-09-08T20:39:18.613Z","avatar_url":"https://github.com/hasindu2008.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# squigulator\n\n*squigulator* is a tool for simulating nanopore raw signal data. It is under development, so the interface and default parameters may change. Do not hesitate to open an [issue](https://github.com/hasindu2008/squigulator) if you find a bug, if something is unclear, or to request a feature.\n\n*squigulator* uses traditional pore models and Gaussian noise for simulation. Because of this simplicity, the simulation is not perfect, but it takes minuscule effort to set up and run. 
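\n\nA simplified sketch of this idea (an illustrative assumption, not necessarily *squigulator*'s exact model): suppose the pore model lists a mean current level $\\mu_m$ and a standard deviation $\\sigma_m$ for every k-mer $m$, as Nanopolish-style pore models do. Sliding a window of length k along the read gives a sequence of k-mers, and each k-mer occupies the pore for several consecutive signal samples. If $m_i$ denotes the k-mer in the pore when sample $i$ is recorded, each raw signal value is drawn independently as\n\n$$\ns_i \\sim \\mathcal{N}\\left(\\mu_{m_i}, \\sigma_{m_i}^2\\right)\n$$\n\n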
Generating 100,000 reads (~1 Gbases) from the human genome with *squigulator* takes ~5 minutes and ~3 GB of RAM using 8 CPU threads. Simulating ~30X coverage of the human genome (~9M reads, ~90 Gbases) with 32 CPU threads takes ~1 hour.\n\nReads are extracted directly from the reference genome, so they contain no mutations or variants. If you want variants in your simulated data, first apply a set of variants to the reference using [bcftools](http://www.htslib.org/download/) and use the resulting sequence as the input to *squigulator*.\n\nPublication: [https://doi.org/10.1101/gr.278730.123](https://genome.cshlp.org/content/34/5/778.full?sid=cd2c8aec-be46-4c9e-885c-8452ac069f64) \u003cbr/\u003e\nPreprint: [https://www.biorxiv.org/content/10.1101/2023.05.09.539953v1](https://www.biorxiv.org/content/10.1101/2023.05.09.539953v1)\u003cbr/\u003e\nSLOW5 ecosystem: [https://hasindu2008.github.io/slow5](https://hasindu2008.github.io/slow5)\u003cbr/\u003e\n\n![squigulator](docs/img/example.svg)\n\n[![GitHub Downloads](https://img.shields.io/github/downloads/hasindu2008/squigulator/total?logo=GitHub)](https://github.com/hasindu2008/squigulator/releases)\n[![BioConda Install](https://img.shields.io/conda/dn/bioconda/squigulator?label=BioConda)](https://anaconda.org/bioconda/squigulator)\n[![x86_64](https://github.com/hasindu2008/squigulator/actions/workflows/c-cpp.yml/badge.svg)](https://github.com/hasindu2008/squigulator/actions/workflows/c-cpp.yml)\n\nPlease cite the following in your publications when using *squigulator*:\n\n\u003e Gamaarachchi, H., Ferguson, J. M., Samarakoon, H., Liyanage, K., \u0026 Deveson, I. W. (2024). Simulation of nanopore sequencing signal data with tunable parameters. 
Genome Research, 34(5), 778-783.\n\n```\n@article{gamaarachchi2023squigulator,\n  title={Squigulator: simulation of nanopore sequencing signal data with tunable noise parameters},\n  author={Gamaarachchi, Hasindu and Ferguson, James M and Samarakoon, Hiruna and Liyanage, Kisaru and Deveson, Ira W},\n  journal={bioRxiv},\n  pages={2023--05},\n  year={2023},\n  publisher={Cold Spring Harbor Laboratory}\n}\n```\n\n## Background story\n\n*squigulator* started as *ssssim* (Stupidly Simple Signal Simulator). For an experiment, [kisarur](https://github.com/kisarur) wanted some simulated data. After [hiruna72](https://github.com/hiruna72) spent ~3 days trying to get an existing simulator installed (dependency and compatibility issues), I thought that writing a simple tool from scratch would be easier. Indeed, that is true when writing BLOW5 files. Writing overly complicated formats like FAST5 or POD5 would have taken months, and I would not have considered writing a simulator in the first place.\n\nAfter implementing the basic *ssssim* in ~8 hours and successfully basecalling its output using [buttery-eel](https://github.com/Psy-Fer/buttery-eel), I realised that it worked much better than anticipated. So I decided to extend it with different features and options. 
The result was *sigsim*, which was eventually renamed *squigulator*, a cool name suggested by [IraDeveson](https://github.com/IraDeveson).\n\n## Installation\n\nFor x86-64 Linux, you can use the precompiled binaries under [releases](https://github.com/hasindu2008/squigulator/releases):\n\n```\nVERSION=0.4.0\nwget https://github.com/hasindu2008/squigulator/releases/download/v${VERSION}/squigulator-v${VERSION}-x86_64-linux-binaries.tar.gz\ntar xf squigulator-v${VERSION}-x86_64-linux-binaries.tar.gz \u0026\u0026 cd squigulator-v${VERSION}\n./squigulator --help\n```\n\nTo build *squigulator* from source, you need a C compiler that supports the C99 standard (with X/Open 7 POSIX 2008 extensions):\n\n```\nsudo apt-get install zlib1g-dev   # install zlib development libraries (Debian/Ubuntu)\ngit clone https://github.com/hasindu2008/squigulator # alternatively, download a release tarball from https://github.com/hasindu2008/squigulator/releases/ and extract it\ncd squigulator\nmake\n```\n\nThe commands to install the zlib __development libraries__ on some popular distributions:\n```sh\n# On Debian/Ubuntu\nsudo apt-get install zlib1g-dev\n# On Fedora/CentOS (use yum instead of dnf on older releases)\nsudo dnf install zlib-devel\n# On macOS\nbrew install zlib\n```\n\n## Usage\n\nThe simplest command to generate reads:\n```\nsquigulator [OPTIONS] ref_genome.fa -o out_signal.blow5 -n NUM_READS\n```\n\nBy default, DNA PromethION reads (R9.4.1) are simulated. 
Use the `-x STR` option to select a different profile from the following presets (see [docs/profile.md](docs/profile.md) for details):\n\n- `dna-r9-min`: genomic DNA on MinION R9.4.1 flowcells\n- `dna-r9-prom`: genomic DNA on PromethION R9.4.1 flowcells\n- `rna-r9-min`: direct RNA on MinION R9.4.1 flowcells\n- `rna-r9-prom`: direct RNA on PromethION R9.4.1 flowcells\n- `dna-r10-min`: genomic DNA on MinION R10.4.1 flowcells\n- `dna-r10-prom`: genomic DNA on PromethION R10.4.1 flowcells\n- `rna004-min`: direct RNA on MinION RNA004 flowcells\n- `rna004-prom`: direct RNA on PromethION RNA004 flowcells\n\nIf a genomic DNA profile is selected, the input reference must be the **reference genome in *FASTA* format**. *squigulator* samples reads uniformly at random from the genome and draws their lengths from a gamma distribution whose mean is set by `-r`. If a direct RNA profile is selected, the input reference must be the **transcriptome in *FASTA* format**. For RNA, *squigulator* picks transcripts uniformly at random and simulates each picked transcript over its full length.\n\nYou can basecall the generated raw signal directly from the [BLOW5 format](https://www.nature.com/articles/s41587-021-01147-4) using the SLOW5 Guppy wrapper [buttery-eel](https://github.com/Psy-Fer/buttery-eel) or our fork of the [dorado basecaller](https://github.com/hiruna72/dorado/releases/tag/v0.0.1). Alternatively, if you love FAST5 that much, use [slow5tools](https://github.com/hasindu2008/slow5tools) to convert the BLOW5 to FAST5 and then use the original Guppy basecaller.\n\nGenerated read IDs encode the true mapping positions in a format like `S1_33!chr1!225258409!225267761!-`, which is compatible with the [*mapeval* command of *paftools.js* in the Minimap2 repository](https://github.com/lh3/minimap2/blob/master/misc/README.md#evaluation). 
Mapping positions are 0-based (BED-like) coordinates.\n\nVisit the [manual page](docs/man.md) for details on every option.\n\n## Examples\n\nDNA examples:\n\n```\n# generate 150,000 PromethION DNA reads from a reference genome\nsquigulator hg38noAlt.fa -x dna-r9-prom -o reads.blow5 -n 150000\n\n# generate 30,000 MinION ultra-long DNA reads with a mean read length of around 50,000 bases\nsquigulator hg38noAlt.fa -x dna-r9-min -o reads.blow5 -n 30000 -r 50000\n\n# generate 1000 PromethION DNA reads with ideal, noise-free signals\nsquigulator hg38noAlt.fa -x dna-r9-prom -o reads.blow5 -n 1000 --ideal\n\n# simulate signals for basecalled reads (each complete read is simulated; not yet memory-optimised: the whole basecalled.fq is loaded into memory first)\nsquigulator basecalled.fq -x dna-r9-prom -o reads.blow5 --full-contigs\n\n# simulate R10 chemistry PromethION DNA reads at 30X coverage\nsquigulator hg38noAlt.fa -x dna-r10-prom -o reads.blow5 -f 30\n```\n\nRNA examples:\n```\n# generate 4000 PromethION direct RNA reads from a transcriptome, including the adaptor and polyA tail\nsquigulator gencode.v40.transcripts.fa -x rna-r9-prom -o reads.blow5 -n 4000 --prefix\n\n# simulate signals for basecalled direct RNA reads (each complete read is simulated; not yet memory-optimised: the whole basecalled.fq is loaded into memory first)\nsquigulator basecalled.fq -x rna-r9-prom -o reads.blow5 --full-contigs\n```\n\nDNA examples with variants (these require [bcftools](http://www.htslib.org/download/)):\n\n```\n# ploidy 1; coronavirus (reference ~30,000 bases) at ~500X depth with a mean read length of around 300 bases (approximately 30,000*500/300=50,000 reads); apply some variants\nbcftools consensus -f nCoV-2019.reference.fasta alpha.vcf -o alpha.fa\n# squigulator alpha.fa -x dna-r9-prom -o reads.blow5 -f 500 # before squigulator v0.2: squigulator alpha.fa -x dna-r9-prom -o reads.blow5 -n 50000 -r 300\n\n# ploidy 2; chr22 (reference ~50,000,000 bases) at ~30X depth 
with a mean read length of around 10,000 bases (approximately 50,000,000*30/10,000=150,000 reads); apply the NA12878 truth set from the Genome in a Bottle consortium\nbcftools consensus -H 1 -f hg38noAlt_chr22.fa na12878_chr22.vcf.gz -o na12878_chr22_1.fa\nbcftools consensus -H 2 -f hg38noAlt_chr22.fa na12878_chr22.vcf.gz -o na12878_chr22_2.fa\ncat na12878_chr22_1.fa na12878_chr22_2.fa \u003e na12878_chr22.fa\nsquigulator na12878_chr22.fa -x dna-r9-prom -o reads.blow5 -f 30 # before squigulator v0.2: squigulator na12878_chr22.fa -x dna-r9-prom -o reads.blow5 -n 150000 -r 10000\n```\n\n## Acknowledgement\n\nR9 pore models are from [Nanopolish](https://github.com/jts/nanopolish), and R10 pore models are derived from [nanoporetech/kmer_models](https://github.com/nanoporetech/kmer_models).\nSome code snippets have been taken from [Minimap2](https://github.com/lh3/minimap2) and [Samtools](http://samtools.sourceforge.net/).\n*kseq* from [klib](https://github.com/attractivechaos/klib) is used.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhasindu2008%2Fsquigulator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhasindu2008%2Fsquigulator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhasindu2008%2Fsquigulator/lists"}