{"id":42084926,"url":"https://github.com/semenko/serpent-methylation-pipeline","last_synced_at":"2026-01-26T10:16:11.093Z","repository":{"id":65725256,"uuid":"587057238","full_name":"semenko/serpent-methylation-pipeline","owner":"semenko","description":"An efficient, documented, reproducible Snakemake methylation analysis pipeline for BS-seq and EM-seq samples, including cfDNA.","archived":false,"fork":false,"pushed_at":"2025-05-23T15:06:20.000Z","size":12628,"stargazers_count":8,"open_issues_count":1,"forks_count":4,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-05-23T16:13:01.019Z","etag":null,"topics":["bisulfite","bs-seq","bsseq","em-seq","emseq","epigenetics","methylation","pipeline","snakemake"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/semenko.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-01-09T21:25:25.000Z","updated_at":"2025-05-23T15:06:24.000Z","dependencies_parsed_at":"2024-06-12T16:32:37.637Z","dependency_job_id":"f48a3ca3-5218-4c93-8315-902bd4bcdf40","html_url":"https://github.com/semenko/serpent-methylation-pipeline","commit_stats":{"total_commits":32,"total_committers":2,"mean_commits":16.0,"dds":0.09375,"last_synced_commit":"32868b5ff0714f28c7219ef62f81236e5e515670"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/semenko/serpent-methylation-pipeline","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/semenko%2Fserpent-
methylation-pipeline","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/semenko%2Fserpent-methylation-pipeline/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/semenko%2Fserpent-methylation-pipeline/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/semenko%2Fserpent-methylation-pipeline/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/semenko","download_url":"https://codeload.github.com/semenko/serpent-methylation-pipeline/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/semenko%2Fserpent-methylation-pipeline/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28774301,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-26T09:42:00.929Z","status":"ssl_error","status_checked_at":"2026-01-26T09:42:00.591Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bisulfite","bs-seq","bsseq","em-seq","emseq","epigenetics","methylation","pipeline","snakemake"],"created_at":"2026-01-26T10:16:10.439Z","updated_at":"2026-01-26T10:16:11.087Z","avatar_url":"https://github.com/semenko.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Serpent Methylation Pipeline (for 
Snakemake)\n\n[![Snakemake](https://img.shields.io/badge/snakemake-≥8.0.0-brightgreen.svg)](https://snakemake.github.io)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![Platform: Linux](https://img.shields.io/badge/platform-Linux-blue.svg)](https://www.linux.org/)\n[![Super-Linter](https://github.com/semenko/serpent-methylation-pipeline/actions/workflows/linter.yml/badge.svg)](https://github.com/marketplace/actions/super-linter)\n[![Documentation](https://github.com/semenko/serpent-methylation-pipeline/actions/workflows/docs.yml/badge.svg)](https://semenko.github.io/serpent-methylation-pipeline/)\n\n\u003cimg src=\"serpent-logo.png\" width=\"500px\" alt=\"Serpent Pipeline Logo\" /\u003e\n\nA standardized, reproducible pipeline to process WGBS bisulfite \u0026 EM-seq data. This goes from .fastq to methylation calls (via [bwameth](https://github.com/brentp/bwa-meth) with [bwa-mem2](https://github.com/bwa-mem/bwa-mem2) and [biscuit](https://github.com/huishenlab/biscuit)) and includes extensive QC and plotting, using a Snakemake pipeline.\n\n## 📖 Documentation\n\n**[View the complete documentation](https://semenko.github.io/serpent-methylation-pipeline/)**\n\nThe documentation includes:\n- Detailed installation instructions\n- Configuration guide\n- Usage examples\n- Pipeline technical details\n- Troubleshooting guide\n- API reference\n\n## Quick Start\n\nThis pipeline is designed to be straightforward:\n1. Clone this repository and open the directory:\n   ```\n   git clone https://github.com/semenko/serpent-methylation-pipeline.git\n   cd serpent-methylation-pipeline\n   ```\n2. Install Snakemake via [mamba](https://github.com/conda-forge/miniforge#mambaforge) (or conda)\n   ```\n   mamba install -c bioconda -c conda-forge snakemake snakemake-storage-plugin-http\n   ```\n3. 
(Optional) Create a separate conda environment for pipeline dependencies:\n   ```\n   mamba env create -n serpent_pipeline_env -f workflow/envs/env.yaml\n   conda activate serpent_pipeline_env\n   ```\n4. Test the pipeline:\n   ```\n   snakemake --cores 4 --use-conda --dry-run\n   ```\n\nFor detailed instructions, see the [Installation Guide](https://semenko.github.io/serpent-methylation-pipeline/installation.html).\n\n## Features\n\nAt a high level, this pipeline reproducibly:\n- Builds a reference genome (GRCh38 with hs38d1 decoy, U2AF1 and ENCODE DAC masking)\n- Trims \u0026 filters reads using [fastp](https://github.com/OpenGene/fastp)\n- Aligns using [bwameth](https://github.com/brentp/bwa-meth) with [bwa-mem2](https://github.com/bwa-mem/bwa-mem2) backend\n- Marks non-converted reads using [mark-nonconverted-reads](https://github.com/nebiolabs/mark-nonconverted-reads)\n- Calls methylation using [biscuit](https://github.com/huishenlab/biscuit) pileup\n- Generates standardized outputs \u0026 QC including:\n  - [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)\n  - [fastp](https://github.com/OpenGene/fastp) statistics\n  - [Biscuit QC](https://huishenlab.github.io/biscuit/)\n  - [samtools stats](https://github.com/samtools/samtools)\n  - [MethylDackel mbias plots](https://github.com/dpryan79/MethylDackel)\n  - [Goleft indexcov plots](https://github.com/brentp/goleft)\n  - [wgbs_tools](https://github.com/nloyfer/wgbs_tools) pat/beta files\n  - Compressed bed files and epibeds\n- Runs [multiqc](https://multiqc.info) across entire projects\n\n## Support\n\n- **Documentation**: [https://semenko.github.io/serpent-methylation-pipeline/](https://semenko.github.io/serpent-methylation-pipeline/)\n- **Issues**: [GitHub Issues](https://github.com/semenko/serpent-methylation-pipeline/issues)\n- **Discussions**: [GitHub Discussions](https://github.com/semenko/serpent-methylation-pipeline/discussions)\n\n## Contributing\n\nWe welcome contributions! 
Please see the [Contributing Guide](https://semenko.github.io/serpent-methylation-pipeline/contributing.html) in our documentation.\n\n    ├── goleft/                 # goleft coverage plots\n    ├── logs/                   # run logs from each pipeline component\n    ├── methyldackel/           # mbias plots\n    ├── raw/\n    │   ├── ...fastq.gz         # Raw reads\n    │   ├── ...md5.txt          # Checksums and validation\n    ├── samtools/               # samtools statistics\n    SAMPLE_02/\n    ...\n    ...\n    multiqc/                    # Project-level MultiQC stats across all data\n\nNote that each project also has a `_subsampled` directory with identical structure, which holds the results of running the pipeline on only 10M reads per sample.\n\n\n### Production Runs\n\n\n## Pipeline Details\n\nThis pipeline was designed for highly reproducible, explainable alignments and analysis of epigenetic sequencing data.\n\n### Reference Genome\n\nI chose **GRCh38**, with these specifics:\n- No patches\n- Includes the hs38d1 decoy\n- Includes Alt chromosomes\n- Applies the [U2AF1 masking file](https://genomeref.blogspot.com/2021/07/one-of-these-things-doest-belong.html)\n- Applies the [ENCODE DAC exclusion](https://www.encodeproject.org/annotations/ENCSR636HFF/)\n\nYou can see a good explanation of the rationale for some of these components at [this NCBI explainer](https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GRCh38_major_release_seqs_for_alignment_pipelines/README_analysis_sets.txt).\n\n### Requirements\n\nAll software requirements are specified in [env.yaml](workflow/envs/env.yaml).\n\nMost are relatively common, but a few are less common:\n- [biscuit](https://github.com/huishenlab/biscuit) (for alignment)\n- NEB's [mark-nonconverted-reads](https://github.com/nebiolabs/mark-nonconverted-reads) (to mark partially converted reads)\n\nbiscuit was chosen after a comparison with bwa-meth and bismark; its latest version was the most flexible, with 
extremely well-annotated .bam files (bwa-meth output lacks some tags that are critical for identifying read-level methylation, and extracting that data would require patching MethylDackel).\n\nI briefly experimented with [wgbs_tools](https://github.com/nloyfer/wgbs_tools) (which defines nice .pat/.beta formats), but its licensing is too restrictive to use.\n\n### Trimming Approach\n\nI chose a relatively conservative approach to trimming, which is needed due to end-repair bias, adaptase bias, and more.\n\nFor **EM-seq**, I trim 10 bp everywhere, after personal QC and offline discussions with NEB. See [my notes here](https://github.com/FelixKrueger/Bismark/issues/509).\n\nFor **BS-seq**, I trim 15 bp from the 5' end of R2 and 10 bp everywhere else, due to adaptase bias.\n\nFor all reads, I set `--trim_poly_g` (due to [two-color bias](https://sequencing.qcfail.com/articles/illumina-2-colour-chemistry-can-overcall-high-confidence-g-bases/)) and set a `--length_required` (minimum read length) of 10 bp.\n\n### No Quality Filtering\n\nNotably, I do NOT do quality filtering here (I set `--disable_quality_filtering`); I leave this for downstream analyses as desired.\n\nI experimented with more stringent quality filtering early on and found it had little yield or performance benefit.\n\n\n## Background \u0026 Inspiration\n\nI strongly suggest reading work from Felix Krueger (author of Bismark) as background. 
In particular:\n- TrimGalore's [RRBS guide](https://github.com/FelixKrueger/TrimGalore/blob/master/Docs/RRBS_Guide.pdf)\n- The Babraham [WGBS/RRBS tutorials](https://www.bioinformatics.babraham.ac.uk/training.html#bsseq)\n\nFor similar pipelines and inspiration, see:\n- NEB's [EM-seq pipeline](https://github.com/nebiolabs/EM-seq/)\n- Felix Krueger's [Nextflow WGBS Pipeline](https://github.com/FelixKrueger/nextflow_pipelines/blob/master/nf_bisulfite_WGBS)\n- The Snakepipes [WGBS pipeline](https://snakepipes.readthedocs.io/en/latest/content/workflows/WGBS.html)\n\n\n## Pipeline Graph\n\nHere's a high-level overview of the Snakemake pipeline (generated via `snakemake --rulegraph | dot -Tpng \u003e rules.png`)\n\n![image](https://github.com/user-attachments/assets/10e69a66-c196-4c3c-a9c0-461ee14203e6)\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsemenko%2Fserpent-methylation-pipeline","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsemenko%2Fserpent-methylation-pipeline","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsemenko%2Fserpent-methylation-pipeline/lists"}