{"id":13704143,"url":"https://github.com/lh3/wgsim","last_synced_at":"2025-04-09T16:13:26.072Z","repository":{"id":1337301,"uuid":"1283203","full_name":"lh3/wgsim","owner":"lh3","description":"Reads simulator","archived":false,"fork":false,"pushed_at":"2021-09-03T14:58:22.000Z","size":131,"stargazers_count":273,"open_issues_count":21,"forks_count":90,"subscribers_count":16,"default_branch":"master","last_synced_at":"2025-04-09T16:13:19.082Z","etag":null,"topics":["bioinformatics","genomics"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lh3.png","metadata":{"files":{"readme":"README","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2011-01-22T22:57:48.000Z","updated_at":"2025-04-09T00:19:37.000Z","dependencies_parsed_at":"2022-08-06T10:16:11.042Z","dependency_job_id":null,"html_url":"https://github.com/lh3/wgsim","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lh3%2Fwgsim","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lh3%2Fwgsim/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lh3%2Fwgsim/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lh3%2Fwgsim/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lh3","download_url":"https://codeload.github.com/lh3/wgsim/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248065282,"owners_count":21041872,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","genomics"],"created_at":"2024-08-02T21:01:04.701Z","updated_at":"2025-04-09T16:13:26.048Z","avatar_url":"https://github.com/lh3.png","language":"C","funding_links":[],"categories":["Next Generation Sequencing","Ranked by starred repositories"],"sub_categories":["Variant Simulation"],"readme":"Introduction\n============\n\nWgsim is a small tool for simulating sequence reads from a reference genome.\nIt can simulate diploid genomes with SNPs and insertion/deletion (INDEL)\npolymorphisms, and simulate reads with uniform substitution sequencing errors.\nIt does not generate INDEL sequencing errors, but this can be partly\ncompensated for by simulating INDEL polymorphisms.\n\nWgsim outputs the simulated polymorphisms and encodes the true read\ncoordinates, as well as the number of polymorphisms and sequencing errors,\nin the read names. The accuracy of a mapper or a SNP caller can then be\nevaluated with wgsim_eval.pl, which comes with the package.\n\n\nCompilation\n===========\n\ngcc -g -O2 -Wall -o wgsim wgsim.c -lz -lm\n\n\nHistory\n=======\n\nWgsim was modified from MAQ's read simulator by dropping dependencies on other\nsource files in the MAQ package and incorporating patches from Colin Hercus\nthat allow it to simulate INDELs longer than 1bp. Wgsim was originally released\nin the SAMtools software package. I forked it out in 2011 as a standalone\nproject and made a few improvements in the process.\n\n\nEvaluation\n==========\n\nSimulation and evaluation\n-------------------------\n\nThe command line for simulation:\n\n  wgsim -Nxxx -1yyy -d0 -S11 -e0 -rzzz hs37m.fa yyy-zzz.fq /dev/null\n\nwhere yyy is the read length, zzz is the error rate and xxx * yyy = 10000000\n(i.e. 10Mbp of reads in total; e.g. -N100000 for 100bp reads). Base sequencing\nerrors are turned off with -e0, so errors are introduced as polymorphisms at\nrate zzz (-r). By default, 15% of polymorphisms are INDELs and their lengths l\nare drawn from a geometric distribution with P(l) = 0.7*0.3^{l-1}, l >= 1.\n\nThe command line for evaluation:\n\n  wgsim_eval.pl unique aln.sam | wgsim_eval.pl alneval -g 20\n\nThe value of '-g' may need to be adjusted for different mappers.\n\n\nSystem\n------\n\nGCC: 4.1.2\nCPU: AMD Opteron 8350 @ 2.0GHz\nMem: 128GB\n\n\nResults\n-------\n\nThe column groups are read lengths (yyy) and the sub-columns are error rates\n(zzz).\n\n==================================================================================================================\n                          100bp              200bp              500bp              1000bp            10000bp\n                   ------------------  -----------------  -----------------  -----------------  -----------------\n Program  Metrics     2%    5%   10%     2%    5%   10%     2%    5%   10%     2%    5%   10%     2%    5%   10%\n------------------------------------------------------------------------------------------------------------------\n            CPU      249   198   136    325   262   163    332   243   232    320   235   215    235   197   189\n BWA-SW     Q20%    85.1  63.6  21.4   93.7  88.9  53.5   96.4  95.7  89.2   96.6  96.2  95.1   97.7  98.3  97.7\n            err%    0.01  0.06  0.20   0.00  0.01  0.14   0.00  0.01  0.01   0.00  0.00  0.01   0.00  0.00  0.00\n            one%    94.6  77.4  35.7   97.5  95.1  67.6   98.6  98.5  93.4   99.0  98.9  98.3   99.7  99.8  99.7\n------------------------------------------------------------------------------------------------------------------\n            CPU                                            302   484  1060    330   352   607    381   480   919\n AGILE      Q20%                                          98.6  98.4  98.4   98.4  98.4  98.6   98.2  98.6  99.3\n            err%                                          0.66  0.69  2.31   0.34  0.40  0.70   0.10  0.00  0.20\n            one%                                           100  99.4     0    100   100   100    100   100   100\n==================================================================================================================\n\n1) AGILE aborts with \"Floating point exception\" partway through the 100bp and\n   200bp runs, so no results are reported for these lengths. Its default output\n   is supposed to be PSL, but actually has an additional \"score\" column. AGILE\n   is reportedly faster than BWA-SW for 1000bp reads; it may be slower here\n   because of suboptimal command line options.\n\n2) Gassst uses over 27GB of memory within 20 minutes. Memory usage then quickly\n   grows to over 40GB and the process gets killed.\n\n3) Lastz fails with: \"FAILURE: bad fasta character in hs37m.fa ...\".\n\n4) Pash only reports unique mappings. Its unique mappings are better than\n   BWA-SW's Q1 mappings. It is very slow, though, possibly because of\n   suboptimal options.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flh3%2Fwgsim","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flh3%2Fwgsim","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flh3%2Fwgsim/lists"}