{"id":44836044,"url":"https://github.com/grenaud/gargammel","last_synced_at":"2026-02-17T01:36:10.718Z","repository":{"id":145084338,"uuid":"60350055","full_name":"grenaud/gargammel","owner":"grenaud","description":"gargammel is an ancient DNA simulator","archived":false,"fork":false,"pushed_at":"2024-07-22T10:58:47.000Z","size":2274,"stargazers_count":24,"open_issues_count":6,"forks_count":14,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-07-22T13:08:32.190Z","etag":null,"topics":["ancient-dna-fragments","ancient-dna-sequences","metagenomics","sequence-simulator","sequencing"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/grenaud.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-06-03T13:33:06.000Z","updated_at":"2024-07-22T10:58:52.000Z","dependencies_parsed_at":"2023-05-19T09:30:28.767Z","dependency_job_id":"d3345409-48fd-43c9-8812-13c0590008f3","html_url":"https://github.com/grenaud/gargammel","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/grenaud/gargammel","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grenaud%2Fgargammel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grenaud%2Fgargammel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grenaud%2Fgargammel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grenaud%2Fgargammel/manifests","owner_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/grenaud","download_url":"https://codeload.github.com/grenaud/gargammel/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grenaud%2Fgargammel/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29529513,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-17T00:57:22.232Z","status":"ssl_error","status_checked_at":"2026-02-17T00:54:25.811Z","response_time":115,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ancient-dna-fragments","ancient-dna-sequences","metagenomics","sequence-simulator","sequencing"],"created_at":"2026-02-17T01:36:09.141Z","updated_at":"2026-02-17T01:36:10.701Z","avatar_url":"https://github.com/grenaud.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n  gargammel: simulations of ancient DNA datasets\n=====================================================================================\n\n[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/gargammel/README.html)\n\n\ngargammel is a set of programs aimed at simulating ancient DNA fragments. 
For ancient hominin samples\nour program can also simulate various levels of present-day human contamination and microbial contamination.\n\nThe website for gargammel can be found here: https://grenaud.github.io/gargammel/\n\n\nQuestions/bug reports/feature requests:\n-------------------------------------------------------------------------------------\n\nIf you have a GitHub account, consider creating an issue; you will help others who might have the same problem.\n\n\tcontact: Gabriel Renaud   \n\temail:\t gabriel [dot] reno [ at sign ] gmail.com\n\nI accept pull requests for novel features.\n\nDownloading:\n-------------------------------------------------------------------------------------\nRun:\n\n    git clone --recursive  --depth 1 https://github.com/grenaud/gargammel.git\n\nor install via (bio)conda:\n\n```bash\nconda install -c bioconda gargammel\n```\n\n\u003e Installing with conda will only provide the main gargammel program. For the additional scripts in the repository, please run `git clone` as above, and create the conda environment described below.\n\n\nRequirements:\n-------------------------------------------------------------------------------------\n* git\n* C++ compiler supporting C++11\n* cmake, you can install on Ubuntu by typing: sudo apt install cmake\n* zlib\n* lib gsl, you can install on Ubuntu by typing: sudo apt-get install libgsl0-dev\n\nIf you plan on using ms2chromosomes.py to simulate chromosomes based on ms, you also need: \n * Hudson's ms (see: http://home.uchicago.edu/rhudson1/source/mksamples.html)\n * seq-gen, you can install on Ubuntu by typing:   sudo apt install seq-gen\n\nBoth should be installed in your path.\n\nAlternatively, you can use the supplied [conda](https://conda.io/) `environment.yml` file to download and set up all dependencies described in this README for you.\n\n\n    conda env create -f 
environment.yml\n\n\nInstallation:\n-------------------------------------------------------------------------------------\n\n\u003e If you are using the conda environment, you can skip this step and just load the environment with `conda activate gargammel`. In all subsequent steps, you can replace `gargammel.pl` with just `gargammel`.\n\nIn the main directory, simply type\n\n  make \n\nThis should install bamtools (C++ library to read/write BAM files) and ART (Illumina read simulator).\n\nOverview:\n-------------------------------------------------------------------------------------\n\nThe main driver script, gargammel.pl, calls the following programs in order to \nsimulate the in vivo process by which ancient DNA fragments are retrieved:\n\n* fragSim: simulation of ancient DNA fragments being retrieved at random from the genome\n* deamSim: simulation of damage to the fragments selected by fragSim\n* adptSim: adding of adapters to create raw Illumina reads (without errors and quality scores)\n\nFinally, the simulated raw Illumina reads are sent to ART to add sequencing errors and corresponding quality scores.\n\nInput description:\n-------------------------------------------------------------------------------------\n\nThe basic input is a directory with 3 subfolders named:\n * endo/\n * cont/\n * bact/\n\nThese represent the endogenous ancient human, the present-day human contaminant and the microbial contamination respectively. Each file inside represents a genome (not simply a chromosome or scaffold). The endogenous ancient human can contain no more than 2 genomes since it is a diploid individual. For the microbial contamination, please add a representative set of microbes for your sample (see the section about the examples of microbial databases).\n\n\n\nExample of usage:\n-------------------------------------------------------------------------------------\n\nThis is an example of usage to simulate a slightly contaminated (8%) dataset. 
First, we will simulate chromosomes using ms and seq-gen:\n\n    mkdir data\n  \nNext, we will create 1000 simulations of 2 lineages that are allowed to coalesce after 0.2 units of coalescence. The first one will represent our endogenous ancient human while the other, the present-day human contaminant. It will also generate an additional chromosome from the same population as the contaminant to be used as reference for alignment. We generate sequences for those using the following script:\n\n    cd data/\n    python ../ms2chromosomes.py  -s 0.2 -f . -n 1000 \n    rm -rfv simul_* seedms #cleanup\n  \nThis will create the following files:\n\n    cont/cont.0.fa\n    cont/cont.1.fa\n    endo/endo.1.fa\n    endo/endo.2.fa\n    endo/segsites\n    ref.fa\n\nThe segsites files correspond to heterozygous sites between both endogenous genomes.\n\n\nThen we will create the aDNA fragments:\n\n    cd ..\n    ./gargammel.pl -c 3  --comp 0,0.08,0.92 -f src/sizefreq.size.gz  -matfile src/matrices/single-  -o data/simulation data/\n\nThis will simulate a dataset with 8% human contamination. The rate of misincorporation due to deamination that will be used will follow a single-strand deamination using the empirical rates measured from the Loschbour individual from:\n\n    Lazaridis, Iosif, et al. \"Ancient human genomes suggest three ancestral populations for present-day Europeans.\" Nature 513.7518 (2014): 409-413.\n\n\nThe size distribution of the aDNA fragments is a subset of:\n\n    Fu, Qiaomei, et al. \"Genome sequence of a 45,000-year-old modern human from western Siberia.\" Nature 514.7523 (2014): 445-449. \n\nThe read size will be 2x75bp and the Illumina platform being simulated is the HiSeq 2500. 
The final reads will be found in:\n\n    data/out_s1.fq.gz\n    data/out_s2.fq.gz\n\n\nHere are further examples of usage:\n\n* Low coverage (0.5X) with fragments of length 40:\n\n`gargammel.pl -c 0.5  --comp 0,0,1 -l 40    -o data/simulation data/`\n\n* Generating exactly 1M fragments with lengths following a log-normal distribution of location 4.106487474 and scale 0.358874723:\n\n`gargammel.pl -n 1000000  --comp 0,0,1 --loc  4.106487474 --scale  0.358874723   -o data/simulation data/`\n\n* High coverage (20X) with a high amount of present-day contamination (40%) with fragments of length 45:\n\n`gargammel.pl -c 20  --comp 0,0.4,0.6 -l 45 -o data/simulation data/`\n\n* Evaluating the impact of mapping 1M fragments with length 40 without double-stranded deamination:\n\n`gargammel.pl -n 1000000  --comp 0,0,1 -l 40    -o data/simulation data/`\n\n* Evaluating the impact of mapping 1M fragments with length 40 with double-stranded deamination:\n\n`gargammel.pl -n 1000000  --comp 0,0,1 -l 40 -damage 0.03,0.4,0.01,0.3   -o data/simulation data/`\n\n* Generate a single-end run of 96 cycles on a HiSeq 2500 Illumina run with 1M fragments of 40bp:\n\n`gargammel.pl -n 1000000  --comp 0,0,1 -l 40 -rl 96  -se -ss HS25 -o data/simulation data/`\n\n* Generate a paired-end run of 96 cycles on a HiSeq 2500 Illumina run with 1M fragments of 40bp:\n\n`gargammel.pl -n 1000000  --comp 0,0,1 -l 40 -rl 96      -ss HS25 -o data/simulation data/`\n\n\n\n\n\nSpecifying damage/deamination:\n-------------------------------------------------------------------------------------\n\nIf you use gargammel.pl or deamSim, you can specify deamination/damage in one of the following ways:\n\n1. Use Briggs model parameters (see Briggs, Adrian W., et al. \"Patterns of damage in genomic DNA sequences from a Neandertal.\" Proceedings of the National Academy of Sciences 104.37 (2007): 14616-14621.)\n\n2. Use a misincorporation matrix computed by mapDamage (https://ginolhac.github.io/mapDamage). 
This matrix is in the results directory created by mapDamage and is called \"misincorporation.txt\". There are 2 examples of such files:\n\n    examplesMapDamage/results_LaBrana/misincorporation.txt\n    examplesMapDamage/results_Ust_Ishim/misincorporation.txt\n\nThe first is from a  double-stranded library and the second a single-stranded one. To use either, you can use the wrapper script or deamSim as such:\n\n    -mapdamage examplesMapDamage/results_LaBrana/misincorporation.txt double\n    -mapdamage examplesMapDamage/results_Ust_Ishim/misincorporation.txt single\n\nWe suggest that you run mapDamage on the empirical data that you are trying to emulate and use the resulting misincorporation.txt file.\n\n3. Specify a matrix of deamination rates, we use the following format, the first line is the header:\n\n    \tA-\u003eC\tA-\u003eG\tA-\u003eT\tC-\u003eA\tC-\u003eG\tC-\u003eT\tG-\u003eA\tG-\u003eC\tG-\u003eT\tT-\u003eA\tT-\u003eC\tT-\u003eG\n    \tpos\trate_{A-\u003eC}\trate_{A-\u003eG}\trate_{A-\u003eT}\trate_{C-\u003eA}\trate_{C-\u003eG}\trate_{C-\u003eT}\trate_{G-\u003eA}\trate_{G-\u003eC}\trate_{G-\u003eT}\trate_{T-\u003eA}\trate_{T-\u003eC}\trate_{T-\u003eG}\n\nThe pos. is the position 0,1... after the fragment beginning/end. The rate is specified using the following: estimate  [estimate_low estimate_high]. 
For example, 0.3 [0.2 0.4] means that the rate of deamination is 0.3 or 30%.\n\nexample of a format:\n\n\tA-\u003eC\tA-\u003eG\tA-\u003eT\tC-\u003eA\tC-\u003eG\tC-\u003eT\tG-\u003eA\tG-\u003eC\tG-\u003eT\tT-\u003eA\tT-\u003eC\tT-\u003eG\n\t0\t1.853e-3 [1.726e-3..1.989e-3]\t4.064e-3 [3.875e-3..4.263e-3]\t3.269e-3 [3.099e-3..3.448e-3]\t6.661e-3 [6.254e-3..7.094e-3] 3.057e-3 [2.785e-3..3.355e-3] 8.004e-2 [7.865e-2..8.145e-2] 1.236e-2 [    1.183e-2..1.292e-2] 4.131e-3 [3.828e-3..4.459e-3] 6.703e-3 [6.314e-3..7.116e-3] 3.845e-3 [3.624e-3..4.079e-3] 4.581e-3 [4.339e-3..4.836e-3] 2.169e-3 [2.005e-3..2.347e-3]\n\t1\t1.986e-3 [1.849e-3..2.134e-3]\t4.273e-3 [4.070e-3..4.487e-3]\t3.030e-3 [2.859e-3..3.211e-3]\t5.357e-3 [5.001e-3..5.738e-3] 3.188e-3 [2.916e-3..3.485e-3] 1.427e-2 [1.369e-2..1.488e-2] 9.514e-3 [    9.075e-3..9.974e-3]\t3.316e-3 [3.061e-3..3.593e-3] 5.061e-3 [4.743e-3..5.400e-3] 3.421e-3 [3.216e-3..3.639e-3] 4.865e-3 [4.620e-3..5.124e-3]\t2.201e-3 [2.038e-3..2.377e-3]\n\nThis follows the output of https://bitbucket.org/ustenzel/damage-patterns.git\n\n4. You can use one of the precalculated rates of deamination in src/matrices/. There is a damage from single-strand and a double-strand libraries from the following study: \n\n    Lazaridis, Iosif, et al. \"Ancient human genomes suggest three ancestral populations for present-day Europeans.\" Nature 513.7518 (2014): 409-413.\n\nSee the methylation question for adding different rates of deamination for methylated/unmethylated cytosine.\n\n\nCan I specify different rates of misincorporation due to deamination for the endogenous/bacterial/human contaminant sources?\n-------------------------------------------------------------------------------------\n\nYes, please refer to the options of the wrapper script gargammel.pl\n\n\nIs it possible to specify different rates of deamination for methylated and unmethylated bases?\n-------------------------------------------------------------------------------------\n\nYes. 
In the endogenous genome, specify methylated cytosine as 'c' (lowercase c) and unmethylated cytosine as 'C' (uppercase C). You can specify multiple cells using the following directory structure:\n\n    input/\n    input/endo\n    input/endo/C0\n    input/endo/C0/chr20_0_split1.fa\n    input/endo/C0/chr20_0_split1.fa.fai\n    input/endo/C0/chr20_0_split2.fa\n    input/endo/C0/chr20_0_split2.fa.fai\n    input/endo/C1\n    input/endo/C1/chr20_1_split1.fa\n    input/endo/C1/chr20_1_split1.fa.fai\n    input/endo/C1/chr20_1_split2.fa\n    input/endo/C1/chr20_1_split2.fa.fai\n    input/endo/C2\n    input/endo/C2/chr20_2_split1.fa\n    input/endo/C2/chr20_2_split1.fa.fai\n    input/endo/C2/chr20_2_split2.fa\n    input/endo/C2/chr20_2_split2.fa.fai\n\nHere C0 represents the first cell, C1 the second and so forth. A lowercase 'c' is a methylated C and an uppercase 'C' is an unmethylated C. To create these files from a reference and a methylation map, please see the script src/addMethyl.pl, which needs to be modified (hardcoded paths).\n\nMethylated and unmethylated cytosines on the - strand can be specified using 'g' and 'G'. Once this is done, you can specify the option:  --methyl for gargammel.pl.  When using --methyl, you can then specify different matrix files for rates of deamination for nonmethylated and methylated cytosines:\n\n    -matfilenonmeth    [matrix file prefix] Read the matrix file of substitutions for non-methylated Cs\n    -matfilemeth       [matrix file prefix] Read the matrix file of substitutions for methylated Cs\n\n\nHow can I get an ancient DNA composition profile for gargammel?\n-------------------------------------------------------------------------------------\n\nBy composition we mean the base frequency at the breaks. 
You could generate it manually, the format is as follows:\n\n    # comment\n    Chr\tEnd\tStd\tPos\tA\tC\tG\tT\tTotal\n    [chr]\t['5p' or '3p']\t['+' or '-']\t[pos wrt the 5p/3p end]\t[count A]\t[count C]\t[count G]\t[count T]\t[sum of counts]\n \nFor instance:\n\n\t# table produced by mapDamage version 2.0.5-1-ge06bd84\n\t# using mapped file Ust_Ishim.hg19_1000g.bam and human_g1k_v37.fasta as reference file\n\t# Chr: reference from sam/bam header, End: from which termini of DNA sequences, Std: strand of reads\n\tChr\tEnd\tStd\tPos\tA\tC\tG\tT\tTotal\n\t21\t3p\t+\t-4\t177086\t83624\t114115\t150943\t525768\n\t21\t3p\t+\t-3\t191241\t80099\t104155\t150269\t525764\n\t21\t3p\t+\t-2\t197747\t63995\t127660\t136360\t525762\n\t21\t3p\t+\t-1\t180637\t49770\t79519\t215833\t525759\n\t21\t3p\t+\t1\t188505\t79678\t204246\t53417\t525846\n\t21\t3p\t+\t2\t156848\t74009\t128222\t166767\t525846\n\t21\t3p\t+\t3\t188608\t75382\t113613\t148243\t525846\n\t21\t3p\t+\t4\t173245\t84205\t117226\t151170\t525846\n\nThe lines above specify the base count close +/- 4 bases to the 3p end for fragments mapping to the + strand. An example of this type of file is found here: src/dnacomp.txt\n\nSuch a file can be generated using mapDamage2.0: \n\n\tJonsson, Hakon, et al. \"mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters.\" Bioinformatics (2013): btt193.\n\nIt is normally called \"dnacomp.txt\" in the output directory, you can filter a single chromosome (in this case 21) using this command:\n\n\tgrep \"^21\\|^#\\|^Chr\"  /path to mapDamage output/results_[sample name]/dnacomp.txt \u003e  dnacomp.txt\n\n\nHow can I specify the size distribution?\n-------------------------------------------------------------------------------------\n\nAncient DNA molecules tend to be fragmented and can be very short but tend to have a specific shape. 
Both for the wrapper script (gargammel.pl) and the fragment simulation program (fragSim), there are 4 ways to specify the size distribution:\n\n1) Specify a fixed length using -l \n------------\n\n2) Open a file containing the size distribution using -s, one empirical fragment length per line e.g.:\n------------\n~~~~\n82\n95\n66\n144\n87\n68\n74\n48\n77\n43\n~~~~\n------------\n3) Open a file containing the size frequencies using -f in the format \"size[TAB]freq\" e.g.:\n------------\n~~~~\n40\t0.017096\n41\t0.01832\n42\t0.0201954\n43\t0.018399\n44\t0.0195637\n45\t0.0198993\n46\t0.0196822\n47\t0.0209456\n48\t0.0203929\n49\t0.0199783\n50\t0.0204323\n~~~~\n\n------------\n4) Specify the size distribution using parameters from a log-normal distribution, using options --loc and --scale.\n------------\n\n\nHow can I get parameters for the size distribution?\n-------------------------------------------------------------------------------------\n\nIf you wish to specify the aDNA fragment size distribution as a log-normal, you can use the following script to infer the location and scale parameters:\n\n\t#!/usr/bin/env Rscript-3.2.0\n\tlibrary(fitdistrplus)\n\tlibrary(MASS)\n\t\t\n\targs=(commandArgs(TRUE))\n\t\n\tdata \u003c- read.table(args[1]);\n\t\t\n\tdf\u003c-fitdistr(data$V1, \"lognormal\")\n\t\n\tprint(df);\n\nYou can change the header to suit the version of R that you have.\n\nBacterial databases:\n-------------------------------------------------------------------------------------\n\nFor the input/bact/ directory, which represents the microbial contamination, gargammel needs a set of fasta files that represent the different microbes. **Each file corresponds to exactly one microbial species.** Each fasta file must contain the genome of the microbial species; multiple scaffolds and plasmids are allowed. Each fasta file must also be faidx indexed. This directory must also contain a file called \"list\".  
This file contains the list of every fasta file in that directory along with its relative abundance in the desired bacterial contamination. For example:\n\n    bacteria1.fa\t0.5\n    bacteria2.fa\t0.3\n    bacteria3.fa\t0.2\n\nThe abundance will be printed on the console when the program is launched. Some users have reported discrepancies between the original bacterial abundance and the printed one. Make sure that they are equal and that the bacterial abundance file uses UNIX line endings (use dos2unix or mac2unix to transform from DOS/MAC to Unix format).\n\nExamples of bacterial databases:\n-------------------------------------------------------------------------------------\n\nIf you wish to download an example of a suitable bacterial database, you can simply type:\n   \n     make bacterialex\n\nThis will create a directory called bactDBexample/ which contains clovis/ and k14/, the profiled microbial communities from Rasmussen et al. \"The genome of a Late Pleistocene human from a Clovis burial site in western Montana.\" Nature 506.7487 (2014): 225-229. and Seguin-Orlando et al. \"Genomic structure in Europeans dating back at least 36,200 years.\" Science 346.6213 (2014): 1113-1118, respectively.\n\nYou can copy the files from the fasta/ directory into the input's bact/ directory as such\n    \n    cp -v bactDBexample/clovis/fasta/* [path to input]/bact/\n\nCreating bacterial databases from a metaBIT:\n-------------------------------------------------------------------------------------\n\nmetaBIT [https://bitbucket.org/Glouvel/metabit] is a metagenomic profiler from high-throughput sequencing shotgun data. 
To download the fasta files based on a profile obtained using metaBIT's output, simply supply the \"all_taxa.tsv\" file, which details the different species and their abundances. Make sure you are connected to the internet and use the retrieveFromMetabit script as such:\n\n    mkdir exampleBacteriaDB\n    cd exampleBacteriaDB\n    [copy the all_taxa.tsv in the current directory]\n    src/microbial_fetcher/retrieveFromMetabit all_taxa.tsv\n\nIf you wish, you can enter your email for the ftp from NCBI (to avoid getting banned from the FTP):\n\n   src/microbial_fetcher/retrieveFromMetabit all_taxa.tsv anonymous@server.net\n\n\nThis will download the necessary files from NCBI to create a database suitable for gargammel to simulate microbial species in the exampleBacteriaDB/fasta directory and run samtools faidx on each file. You need standard UNIX utilities such as awk/sed/python/curl/wget/gzip to be installed as well as samtools. Please move the fasta/ directory produced (exampleBacteriaDB/fasta in the example above) to the input/bact/. The file named \"exampleBacteriaDB/fasta/list\" is the list of bacterial species along with their abundance. Another file, \"exampleBacteriaDB/Microbial_ID.log\", details the strain/ID and ftp link used. retrieveFromMetabit uses GNU parallel (see O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login: The USENIX Magazine, February 2011:42-47.), please make sure that it is installed.\n\nIf you want to use a uniform probability instead of a weighted list, go to \"input/bact\" and type (if fasta files end with .fa):\n\n    total=`ls -1  input/bact/*fa |wc -l ` \u0026\u0026 ls -1 input/bact/*fa  | awk -v total=\"$total\" ' {print $1\"\\t\"(1/total)}' \u003e input/bact/list\n\n\nmetaBIT ref: Louvel et al. 
\"metaBIT, an integrative and automated metagenomic pipeline for analyzing microbial profiles from high-throughput sequencing shotgun data.\" Molecular ecology resources (2016).\n\n\n\n\nTutorial using empirical sequences for simulations:\n-------------------------------------------------------------------------------------\n\nTo provide an example of using empirical VCF files to create sequences for the simulation, there is a folder exampleSeq/ with a Makefile. This makefile provides a simple example of creating 2 chromosomes (2 endogenous sequences + 2 contaminant sequences for a diploid genome) from VCF files. This makefile needs the following commands to be installed in the path:\n\n* bedtools\n* bgzip\n* tabix\n* samtools\n* bcftools, must support \"consensus\" command\n\nMake sure that you are connected to the internet and type:\n\n    cd  exampleSeq/ \n    make\n\nThis will download the VCF files from the Altai Neanderthal (endogenous) and a present-day human of European descent (contaminant) create 4 files:\n         \n    inputfolder/endo/endo.2.fa\n    inputfolder/endo/endo.1.fa\n    inputfolder/cont/cont.1.fa\n    inputfolder/cont/cont.2.fa\n\nalong with their respective fasta index. 
If you wish to add bacterial sequences to the mix, please see the section above about \"Examples of bacterial databases\" and you can copy some files to the bact/ directory: cp -v  ../bactDBexample/k14/fasta/* inputfolder/bact/ \n\nTo create a sample with say 10% present-day human contamination with fragment length of 40bp, run:\n      \n./gargammel.pl -c 0.5  --comp 0,0.1,0.9 -l 40    -o exampleSeq/simulationc10 exampleSeq/inputfolder/\n\nIf you have some microbial sequences, to create a sample with say 70% bacterial content, 5% present-day human contamination and 25% endogenous, run:\n      \n./gargammel.pl -c 0.5  --comp 0.7,0.05,0.25 -l 40    -o exampleSeq/simulationb70c5 exampleSeq/inputfolder/\n\nFAQ and issues \n-------------------------------------------------------------------------------------\n\n* I am getting:\n\n    ./art_illumina: error while loading shared libraries: libgsl.so.0: cannot open shared object file: No such file or directory\n\nMake sure you have libgsl installed and create a symbolic link:\n\n     sudo ln -s  /usr/lib/x86_64-linux-gnu/libgsl.so.23.0.0   /usr/lib/libgsl.so.0 \n\n* Can I know where on the genome of origin the fragment was sampled?\n\n    Yes! fragSim, which generates the original fragments, uses the following format: [CHROMOSOME NAME]:[STRAND]:[START]:[END]:[LENGTH]\n    The overall wrapper (gargammel.pl) will add e1_ to endogenous fragments from the first reference and e2_ to endogenous fragments from the second reference. It will add c_X to the fragments from present-day human contaminants where X is the # of the genome and b_X to the fragments from bacterial contaminants where X is the # of the genome.\n\n* Can gargammel simulate indels?\n\n   Yes and no. gargammel does not currently insert indels as a result of sequencing errors. However, if you add indels in your input genome, it will handle them without any problems. 
\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrenaud%2Fgargammel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgrenaud%2Fgargammel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrenaud%2Fgargammel/lists"}