{"id":38651771,"url":"https://github.com/sanger-tol/curationpretext","last_synced_at":"2026-04-02T12:01:31.461Z","repository":{"id":180136500,"uuid":"662196291","full_name":"sanger-tol/curationpretext","owner":"sanger-tol","description":"A Nextflow DSL2 pipeline for pretext generation in curation","archived":false,"fork":false,"pushed_at":"2026-03-30T14:46:17.000Z","size":7404,"stargazers_count":13,"open_issues_count":10,"forks_count":5,"subscribers_count":7,"default_branch":"main","last_synced_at":"2026-03-30T16:34:43.273Z","etag":null,"topics":["genomics","hic","nextflow","pipeline"],"latest_commit_sha":null,"homepage":"https://pipelines.tol.sanger.ac.uk/curationpretext","language":"Nextflow","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sanger-tol.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-07-04T15:03:01.000Z","updated_at":"2026-03-23T17:45:27.000Z","dependencies_parsed_at":"2023-10-02T19:19:03.776Z","dependency_job_id":"cc8643a0-d35d-4fe0-b190-0e9103ccd8ad","html_url":"https://github.com/sanger-tol/curationpretext","commit_stats":null,"previous_names":["dlbpointon/curationpretext"],"tags_count":16,"template":false,"template_full_name":null,"purl":"pkg:github/sanger-tol/curationpretext","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sanger-tol%2Fcurationpretext","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sanger-tol%2Fcu
rationpretext/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sanger-tol%2Fcurationpretext/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sanger-tol%2Fcurationpretext/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sanger-tol","download_url":"https://codeload.github.com/sanger-tol/curationpretext/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sanger-tol%2Fcurationpretext/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31305971,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-02T09:48:21.550Z","status":"ssl_error","status_checked_at":"2026-04-02T09:48:19.196Z","response_time":89,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["genomics","hic","nextflow","pipeline"],"created_at":"2026-01-17T09:22:09.109Z","updated_at":"2026-04-02T12:01:31.411Z","avatar_url":"https://github.com/sanger-tol.png","language":"Nextflow","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ![sanger-tol/curationpretext](docs/images/curationpretext-light.png#gh-light-mode-only) ![sanger-tol/curationpretext](docs/images/curationpretext-dark.png#gh-dark-mode-only)\n\n[![GitHub Actions CI 
Status](https://github.com/sanger-tol/curationpretext/actions/workflows/nf-test.yml/badge.svg)](https://github.com/sanger-tol/curationpretext/actions/workflows/nf-test.yml)\n[![GitHub Actions Linting Status](https://github.com/sanger-tol/curationpretext/actions/workflows/linting.yml/badge.svg)](https://github.com/sanger-tol/curationpretext/actions/workflows/linting.yml)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.12773958-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.12773958)\n[![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)\n\n[![Nextflow](https://img.shields.io/badge/version-%E2%89%A525.04.0-green?style=flat\u0026logo=nextflow\u0026logoColor=white\u0026color=%230DC09D\u0026link=https%3A%2F%2Fnextflow.io)](https://www.nextflow.io/)\n[![nf-core template version](https://img.shields.io/badge/nf--core_template-3.5.2-green?style=flat\u0026logo=nfcore\u0026logoColor=white\u0026color=%2324B064\u0026link=https%3A%2F%2Fnf-co.re)](https://github.com/nf-core/tools/releases/tag/3.5.2)\n[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000\u0026logo=anaconda)](https://docs.conda.io/en/latest/)\n[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000\u0026logo=docker)](https://www.docker.com/)\n[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)\n[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://cloud.seqera.io/launch?pipeline=https://github.com/sanger-tol/curationpretext)\n\n## Introduction\n\n**sanger-tol/curationpretext** is a bioinformatics pipeline typically used in conjunction with [TreeVal](https://github.com/sanger-tol/treeval) to generate pretext maps (and optionally telomeric, gap, coverage, and repeat density plots which can be ingested into pretext) for the manual 
curation of high-quality genomes.\n\nThis is intended as a supplementary pipeline for the [treeval](https://github.com/sanger-tol/treeval) project. This pipeline can be used simply to generate pretext maps. Information on how to run it can be found in the [usage documentation](https://pipelines.tol.sanger.ac.uk/curationpretext/usage).\n\n![Workflow Diagram](./docs/images/CurationPretext-1.6.0.jpeg)\n\nThe image above shows where this pipeline is used within the manual curation process. The pipeline follows these major steps:\n\n1. CRAM_MAP_ILLUMINA_HIC (ALIGN_CRAM) + PAIRS_CREATE_CONTACT_MAPS (CREATE_MAPS) - Generates pretext maps as well as a static image.\n\n2. ACCESSORY_FILES - Generates the repeat density, gap, telomere, and coverage tracks.\n\n3. PRETEXT_INGEST - Imports the generated tracks into pretext for visualisation.\n\n## Usage\n\n\u003e [!NOTE]\n\u003e If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.\n\nCurrently, the pipeline uses the following flags:\n\n- `--input`\n  - The absolute path to the assembled genome, e.g., `/path/to/assembly.fa`\n\n- `--sample`\n  - The naming prefix of the output files, e.g., 
iyTipFemo\n\n- `--reads`\n  - The directory of the fasta files generated from long reads, e.g., `/path/to/fasta/`\n  - This folder _must_ contain files in `.fasta.gz` format, or they will be skipped by the internal file search function.\n\n- `--read_type`\n  - The type of long-read data you are using, e.g., ont, illumina, hifi.\n\n- `--aligner`\n  - The aligner to use for coverage generation. Defaults to `AUTO`; other options include `bwamem2` and `minimap2`.\n\n- `--cram`\n  - The directory of the cram _and_ cram.crai files, e.g., `/path/to/cram/`\n\n- `--map_order`\n  - The Hi-C map scaffold order, either `length` or `unsorted`.\n\n- `--teloseq`\n  - A telomeric sequence, e.g., `TTAGGG`\n\n- `--multi_mapping`\n  - Level of multi-mapping read filtering to perform whilst building the pretext map.\n\n- `--all_output`\n  - An option to output all maps + accessory files. By default, only the pretext maps where ingestion has occurred are output.\n\n- `--skip_tracks`\n  - A csv list of accessory tracks to skip. Options are: `ALL`, `gap`, `coverage`, `telo`, `repeats`, `NONE`. Default is `NONE`. Please note that capitalization matters.\n\n- `--split_telomere`\n  - A boolean to also generate the telomere track in 5Prime and 3Prime styles. The original telomere track is still included.\n\n- `--pre_mapped_bam`\n  - A boolean option to use `--cram` as input for _A_ pre-mapped bam file.\n\n- `--cram_chunk_size`\n  - The number of records per chunk when a cram file is split. Defaults to 10000.\n\n- `--run_hires`\n  - A boolean to run the pipeline in hires mode, i.e., generate hires resolution maps. Default is `true`.\n\n- `--run_ultra`\n  - A string argument to run the pipeline in ultra resolution mode, i.e., generate ultra resolution maps. Options are: `yes`, `force`, `no`. 
Default is `yes`, which generates ultra resolution maps if the genome file is \u003e 4 Gb.\n\n- `--snapshot_order`\n  - A path to a `genome`, `sizes` or `fai` file containing the scaffolds in the order required for the output snapshot png file.\n\nNow, you can run the pipeline using:\n\n```bash\nnextflow run sanger-tol/curationpretext \\\n  --input { input.fasta } \\\n  --cram { path/to/cram/ } \\\n  --reads { path/to/longread/fasta/ } \\\n  --read_type { default is \"hifi\" } \\\n  --sample { default is \"pretext_rerun\" } \\\n  --teloseq { default is \"TTAGGG\" } \\\n  --map_order { default is \"unsorted\" } \\\n  --multi_mapping { default is \"0\" (for no filtering of multi-mapping reads)} \\\n  --all_output \u003ctrue/false\u003e \\\n  --outdir { OUTDIR } \\\n  -profile \u003cdocker/singularity/{institute}\u003e\n\n```\n\n\u003e **Warning:**\n\u003e Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files, including those\n\u003e provided by the `-c` Nextflow option, can be used to provide any configuration _**except for parameters**_.\n\nFor more details, please refer to the [usage documentation](https://pipelines.tol.sanger.ac.uk/curationpretext/usage) and the [parameter documentation](https://pipelines.tol.sanger.ac.uk/curationpretext/parameters).\n\n## Pipeline output\n\nTo see the results of a test run with a full-size dataset, refer to the [results](https://pipelines.tol.sanger.ac.uk/curationpretext/results) tab on the sanger-tol/curationpretext website pipeline page.\nFor more details about the output files and reports, please refer to the\n[output documentation](https://pipelines.tol.sanger.ac.uk/curationpretext/output).\n\n## Credits\n\nsanger-tol/curationpretext was originally written by Damon-Lee B Pointon (@DLBPointon).\n\nWe thank the following people for their extensive assistance in the development of this pipeline:\n\n- @muffato - For reviews.\n\n- @yumisims - TreeVal and Software.\n\n- @weaglesBio - TreeVal 
and Software.\n\n- @josieparis - Help with better docs and testing.\n\n- @mahesh-panchal - Large support with 1.2.0 in making the pipeline more robust with other HPC environments.\n\n- @GRIT - For feedback and feature requests.\n\n- @prototaxites - Support with 1.3.0 and showing me the power of GAWK.\n\n## Contributions and Support\n\nIf you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).\n\n## Citations\n\nIf you use sanger-tol/curationpretext for your analysis, please cite it using the following doi: [10.5281/zenodo.12773958](https://doi.org/10.5281/zenodo.12773958)\n\nAn extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.\n\nThis pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [MIT license](https://github.com/nf-core/tools/blob/main/LICENSE).\n\n\u003e **The nf-core framework for community-curated bioinformatics pipelines.**\n\u003e\n\u003e Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso \u0026 Sven Nahnsen.\n\u003e\n\u003e _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsanger-tol%2Fcurationpretext","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsanger-tol%2Fcurationpretext","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsanger-tol%2Fcurationpretext/lists"}