{"id":46366730,"url":"https://github.com/mpieva/quicksand","last_synced_at":"2026-03-05T02:31:08.017Z","repository":{"id":39568498,"uuid":"380087566","full_name":"mpieva/quicksand","owner":"mpieva","description":"A pipeline for the analysis of sedimentary ancient mtDNA.","archived":false,"fork":false,"pushed_at":"2026-02-09T15:57:15.000Z","size":139433,"stargazers_count":14,"open_issues_count":4,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-02-09T19:50:43.141Z","etag":null,"topics":["ancient-dna","bioinformatics","docker","nextflow","pipeline","sedadna","singularity"],"latest_commit_sha":null,"homepage":"https://quicksand.readthedocs.io","language":"Nextflow","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mpieva.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2021-06-25T01:02:06.000Z","updated_at":"2026-02-09T15:57:19.000Z","dependencies_parsed_at":"2024-11-14T14:29:54.137Z","dependency_job_id":"243fcb96-1b61-4990-8b38-c899293d956d","html_url":"https://github.com/mpieva/quicksand","commit_stats":null,"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"purl":"pkg:github/mpieva/quicksand","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpieva%2Fquicksand","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpieva%2Fquicksand/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpie
va%2Fquicksand/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpieva%2Fquicksand/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mpieva","download_url":"https://codeload.github.com/mpieva/quicksand/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpieva%2Fquicksand/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30107202,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-05T01:39:18.192Z","status":"online","status_checked_at":"2026-03-05T02:00:06.710Z","response_time":93,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ancient-dna","bioinformatics","docker","nextflow","pipeline","sedadna","singularity"],"created_at":"2026-03-05T02:31:06.468Z","updated_at":"2026-03-05T02:31:07.938Z","avatar_url":"https://github.com/mpieva.png","language":"Nextflow","funding_links":[],"categories":[],"sub_categories":[],"readme":"![MIT License](https://img.shields.io/github/license/mpieva/quicksand?style=for-the-badge)\n[![DOI](https://img.shields.io/badge/DOI-10.1093%2Fmolbev%2Fmsaf305-teal?style=for-the-badge)](https://doi.org/10.1093/molbev/msaf305)\n\n# quicksand\n\nSee [readthedocs](https://quicksand.readthedocs.io/en/latest/in_and_out.html) for the full documentation of the pipeline.\n\n## Description\n\nquicksand (**quick** analysis of **s**edimentary **an**cient **D**NA) is 
an open-source [Nextflow](https://doi.org/10.1038/nbt.3820) pipeline designed for rapid and accurate taxonomic classification of mammalian mitochondrial DNA (mtDNA) in aDNA samples. quicksand combines fast alignment-free classification using [KrakenUniq](https://doi.org/10.1186/s13059-018-1568-0) with downstream mapping ([BWA](https://github.com/mpieva/network-aware-bwa)), post-classification filtering, and ancient DNA authentication. quicksand is optimized for speed and portability and requires either [Singularity](https://doi.org/10.1371/journal.pone.0177459) or [Docker](https://www.docker.com/).\n\n## Workflow\n\n\u003cp align=center\u003e\n    \u003cimg src=\"assets/docs/workflow_v2.3.jpg\" alt=\"Graphical representation of the pipeline workflow\" width='800px'\u003e\n\u003c/p\u003e\n\n## Quickstart\n\n### Requirements\n\nTo run Nextflow, you need a POSIX-compatible system (e.g., Linux or macOS). quicksand was developed and tested on Linux (x86_64 architecture).\n\nTo run quicksand, please install:\n\n- [Nextflow](https://www.nextflow.io/docs/latest/getstarted.html) v22.10 or later\n- [Singularity](https://sylabs.io/singularity/) or [Docker](https://www.docker.com/)\n\n**Note:** To run quicksand in Singularity, your kernel needs to support user-namespaces (see [here](https://github.com/apptainer/singularity/issues/5240#issuecomment-618405898) or [here](https://github.com/apptainer/singularity/issues/6341)).\n\n\n### Prepare Input\n\nThe input for quicksand is a directory with user-supplied files in BAM or FASTQ format. Adapter-trimming, overlap-merging and sequence demultiplexing need to be performed by the user prior to running quicksand. Provide the directory with the `--split` flag.\n\n\u003e [!CAUTION]\n\u003e Each input-file should correspond to a single sequence-library. 
The processing of merged libraries with quicksand can lead to sequence loss because of the PCR-deduplication step with bam-rmdup.\n\n#### Download Test-file\n\nAs a test file, download a mammalian mtDNA capture library from Denisova Cave Layer 20 (published in Zavala et al. 2021).\n\n```bash\nwget -P split \\\nftp://ftp.sra.ebi.ac.uk/vol1/run/ERR564/ERR5640810/A20896.bam\n```\n\n### Create Reference Database\n\nThe required KrakenUniq database, the reference genomes for mapping and the bed-files for low-complexity filtering are available on the MPI EVA FTP Servers. Custom versions of the reference material can be created with the [quicksand-build pipeline](https://github.com/mpieva/quicksand-build).\n\n#### Create Test Database\n\nFor the quickstart of quicksand, create a small test-database containing only the Hominidae, Bovidae and Hyaenidae mtDNA reference genomes (~150 genomes, runtime: ~3-5 minutes, size ~5GB).\n\n(To reduce database size and runtime, use only `--include Hominidae`.)\n\n```bash\nnextflow run mpieva/quicksand-build -r v3.1 \\\n  --include  Hominidae,Bovidae,Hyaenidae \\\n  --outdir   refseq \\\n  -profile   singularity\n```\n\n#### Download Full Database\n\nTo download the full reference database (~60GB), use this command:\n\n```bash\nlatest=$(curl http://ftp.eva.mpg.de/quicksand/LATEST)\nwget -r -np -nc -nH --cut-dirs=3 --reject=\"*index.html*\" -q --show-progress -P refseq http://ftp.eva.mpg.de/quicksand/build/$latest\n```\nThis can take several hours! For testing quicksand, it is recommended to build a small database (see above).\n\n### Run quicksand\n\nquicksand is executed directly from GitHub. 
With the test-database created and the test-file downloaded, run the pipeline as follows:\n\n```bash\n# set this if you encounter a heap-space error to increase the memory that is used by nextflow\nexport NXF_OPTS=\"-Xms10g -Xmx15g\" # increase or decrease the numbers as required\n\nnextflow run mpieva/quicksand -r v2.5 \\\n  --db        refseq/kraken/Mito_db_kmer22/ \\\n  --genomes   refseq/genomes/ \\\n  --bedfiles  refseq/masked/ \\\n  --split     split/ \\\n  -profile    singularity #mind the single dash!\n```\n\n### Output\n\nPlease see the [documentation](https://quicksand.readthedocs.io/en/latest/in_and_out.html) for a comprehensive description of the output files and structure!\n\nThe main summary table (`final_report.tsv`) contains one line per input file and detected family (passing the `--krakenuniq_min_kmers` and `--krakenuniq_min_reads` cutoffs). The following columns are reported:\n\n- **RG:** Name of the file analyzed\n- **ReadsRaw:** Raw number of sequences in the file (paired reads only counted once)\n- **ReadsFiltered:** Number of sequences after filtering for `--bamfilterflag`\n- **ReadsLengthfiltered:** Number of sequences after additional filtering for sequence length (`--bamfilter_length_cutoff`)\n- **Kmers:** KrakenUniq: Number of unique kmers used for classification (format: \"best\" and \"(family)\") \n- **KmerCoverage:** KrakenUniq: Kmer coverage for that classification (format: \"best\" and \"(family)\")\n- **KmerDupRate:** KrakenUniq: Kmer duplication rate for that classification (format: \"best\" and \"(family)\")\n- **ExtractLVL:** \"f\" (family) or \"o\" (order), set by `--taxlvl`\n- **ReadsExtracted:** Number of sequences assigned by KrakenUniq\n- **Order:** Detected Order\n- **Family:** Detected Family\n- **Species:** The reference-genome used for the mapping of 'ReadsExtracted'\n- **Reference:** \"best\" or \"fixed\" (`--fixed`)\n- **ReadsMapped:** Number of sequences mapped, passing the mapping 
quality-cutoff(`--mapbwa_quality_cutoff`)\n- **ProportionMapped:** ReadsMapped / ReadsExtracted \n- **ReadsDeduped:** Number of unique sequences (removed PCR duplicates) \n- **DuplicationRate:** ReadsMapped / ReadsDeduped\n- **CoveredBP:** `covbases` stat of the `samtools coverage` command. The number of bases covered in the reference genome by mapped sequences (max: ~17000)\n- **ReadsBedfiltered:** Number of unique sequences after applying low-complexity bed-filtering\n- **PostBedCoveredBP:** Number of bases covered in the reference genome by bedfiltered sequences\n- **FamPercentage:** Percentage of unique sequences in the alignment from the total number of unique sequences in the sample (For PSF-filter)\n- **Ancientness:** \"-\", \"+\" or \"++\". Significance level of the C-to-T or G-to-A deamination rate in the alignment (for G-to-A, use `--doublestranded`)   \n- **ReadsDeam(1term):** Number of unique sequences with a C-to-T or G-to-A substitution in the _terminal_ base positions (for G-to-A, use `--doublestranded`)\n- **ReadsDeam(3term):** Number of unique sequences with a C-to-T or G-to-A substitution in the _terminal three_ positions (for G-to-A, use `--doublestranded`)\n- **Deam5(95ci):** The _terminal_ C-to-T substitution-rate (and the 95% confidence interval) for the 5' end\n- **Deam3(95ci):** The _terminal_ C-to-T or G-to-A substitution-rate (and the 95% confidence interval) for the 3' end (for G-to-A, use `--doublestranded`)\n- **Deam5Cond(95ci):** The _terminal_ C-to-T substitution-rate (and the 95% confidence interval) for the 5' end, conditioned on a substitution at the opposite end \n- **Deam3Cond(95ci):** The _terminal_ C-to-T (or G-to-A) substitution-rate (and the 95% confidence interval) for the 3' end, conditioned on a substitution at the opposite end\n- **MeanFragmentLength:** Mean fragment length of all unique sequences\n- **MeanFragmentLength(3term):** Mean fragment length of all 'ancient' sequences\n- **Coverage:** `meandepth` of the 
`samtools coverage` command. Corresponds to the depth of coverage in the alignment.\n- **Breadth:** `coverage` of the `samtools coverage` command / 100: The proportion of bases covered by mapped sequences in the reference genome\n- **ExpectedBreadth:** Expected breadth based on 'Coverage'. Calculated using the formula: ExpectedBreadth = 1 - e^(-0.833 × Coverage)\n- **ProportionExpectedBreadth:** Breadth / ExpectedBreadth (For PEB-filter)\n\n### Common Errors\nA collection of common Nextflow errors and how to resolve them.\n\n#### Heap Space\n```\n -- Check '.nextflow.log' file for details\nERROR ~ Java heap space\n```\nHeap space errors can occur if Nextflow itself requires more memory than provided by default (e.g., when screening too many samples in parallel). You can increase the heap space as needed (e.g., to 5 GB) with:\n```\nexport NXF_OPTS=\"-Xms5g -Xmx5g\"\n``` \n\n## Citation\n\nIf you use quicksand in your research, please cite the quicksand publication as follows:\n\n\u003e Szymanski, Merlin, Johann Visagie, Frederic Romagne, Matthias Meyer, and Janet Kelso. \\\n\u003e “**quick** analysis of **s**edimentary **an**cient **D**NA using _quicksand_”, Molecular Biology and Evolution, 2025. \\\n\u003e [https://doi.org/10.1093/molbev/msaf305](https://doi.org/10.1093/molbev/msaf305).\n\n## Honorable Mentions\n\nThis pipeline uses code inspired by the [nf-core](https://nf-co.re) initiative, reused here under the [MIT license](https://github.com/nf-core/tools/blob/master/LICENSE).\n\n\u003e Ewels, Philip A., Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso, and Sven Nahnsen. 2020.\n\u003e “The Nf-core Framework for Community-curated Bioinformatics Pipelines”.\n\u003e Nature Biotechnology 38 (3): 276–78. 
[https://doi.org/10.1038/s41587-020-0439-x](https://doi.org/10.1038/s41587-020-0439-x).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmpieva%2Fquicksand","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmpieva%2Fquicksand","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmpieva%2Fquicksand/lists"}