{"id":22668286,"url":"https://github.com/lanl/slur-m-py","last_synced_at":"2025-03-29T10:40:37.335Z","repository":{"id":205832752,"uuid":"670368312","full_name":"lanl/SLUR-M-py","owner":"lanl","description":"SLUR(M)-py (pronounced slurpy): A SLURM Powered Pythonic Pipeline for Parallel Processing of 3D (Epi)genomic Profiles","archived":false,"fork":false,"pushed_at":"2024-07-21T22:42:41.000Z","size":208,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-02-04T09:17:45.577Z","etag":null,"topics":["atac-seq","chip-seq","hi-c","slurm"],"latest_commit_sha":null,"homepage":"https://www.biorxiv.org/content/10.1101/2024.05.18.594827v1","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lanl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"license.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-24T22:42:24.000Z","updated_at":"2024-07-21T22:42:43.000Z","dependencies_parsed_at":null,"dependency_job_id":"b06ddddf-21d4-42e7-b495-467278c279c2","html_url":"https://github.com/lanl/SLUR-M-py","commit_stats":null,"previous_names":["lanl/slur-m-py"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lanl%2FSLUR-M-py","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lanl%2FSLUR-M-py/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lanl%2FSLUR-M-py/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lanl%2FSLUR-M-py/manifests","owner
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lanl","download_url":"https://codeload.github.com/lanl/SLUR-M-py/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246174469,"owners_count":20735409,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["atac-seq","chip-seq","hi-c","slurm"],"created_at":"2024-12-09T15:14:39.131Z","updated_at":"2025-03-29T10:40:37.277Z","avatar_url":"https://github.com/lanl.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SLURPY\n[SLUR(M)-py: A SLURM Powered Pythonic Pipeline for Parallel Processing of 3D (Epi)genomic Profiles](https://www.biorxiv.org/content/10.1101/2024.05.18.594827v2)\n## Setting up the computing environment\nSlurpy was developed using anaconda (python v 3.10.13). \nWe recommend using conda to manage the python environment needed by slurpy.\nBelow are commands needed to set up the \"bioenv\" for running slurpy. \n\n```\n## Make a python environment named bioenv \nconda create -n bioenv \n\n## Activate the environment\nconda activate bioenv \n```\n\nAfter making a new conda environment install needed packages.\n\n```\n## Use conda to install needed mods\nconda install numpy pandas matplotlib seaborn dask \n\n## Bring in mods from bioconda\nconda install -c bioconda biopython pysam samtools bwa samblaster macs2\n```\n\nIf the above installation command hangs, we recommend removing macs2 from the list of libraries and trying again. 
Then install macs2 via pip.\n\n```\npip install macs2\n```\nA full list of the python libraries used to develop slurpy, with their versions, is provided in [python.dependencies.txt](https://github.com/SLUR-m-Py/SLURPY/blob/main/python.dependencies.txt).\n\n## Installation\nDownloading the repository as a .zip archive is easiest. For developers a simple clone command with git works too:\n\n```\ngit clone https://github.com/SLUR-m-Py/SLURPY.git\n```\n\nOnce slurpy is downloaded (and expanded), change the current directory to the local SLURPY directory and make the python files executable. \n\n```\ncd ./SLURPY\nchmod +x *.py \n```\n\n### Checking the python environment \nOnce within the slurpy directory, run the environment checking script, [modcheck.py](https://github.com/SLUR-m-Py/SLURPY/blob/main/modcheck.py).\n\n```\n## Activate the computing environment\nconda activate bioenv \npython modcheck.py\n```\n\nIf the environment was created successfully, the script will run to completion and print the following:\n\n```\nINFO: Modules loaded correctly.\nINFO: We are ready to run slurpy!\n```\n\n## Running slurpy\n### Setting up a project directory\nCurrently the ATAC- and ChIP-seq protocols are fully functional. To run slurpy, change the current directory to the target project (in the example below, the project is named 2501_001) and soft link to the path of the slurpy executables.\n\n```\ncd /path/to/project/directory/2501_001\nln -s /path/to/SLURPY\n```\n\nWithin the project directory, ensure the paired fastqs are listed or linked within a subdirectory named \"./fastqs\". \n\n```\nls -l ./fastqs/*.fastq.gz\n```\n### Invoking slurpy for Hi-C processing\nThe help menu (-h) of each protocol within slurpy lists all the available arguments and default settings. Be sure to activate the conda environment \"bioenv\". 
\n```\n## Activate our computing environment\nconda activate bioenv \n\n## Call the help menu for hic.py \n./SLURPY/hic.py -h\nusage: hic.py [-h] -r ./path/to/reference.bwaix [-F 64] [-B 64] [-P tb] [-M chrM] [-Q 30] [-R step] [-q .fastq.gz] [-f 8] [-b 4] [-t 4] [-n name] [-E bp] [-C bp] [-L MboI] [-D n] [-Z n] [-G ./path/to/list.tsv] [-J ./path/to/juicer.jar]\n              [-S 25000, 10000, ... [25000, 10000, ... ...]] [--restart] [--debug] [--skip-dedup] [--clean] [--merge]\n\nProcessing and analysis pipeline for paired-end sequencing data from Hi-C experiments.\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -r ./path/to/reference.bwaix, --refix ./path/to/reference.bwaix\n                        Path to input reference bwa index used in analysis.\n  -F 64, --fastp-splits 64\n                        The number of splits to make for each pair of input fastq files (default: 64). Controls the total number of splits across the run.\n  -B 64, --parallel-bwa 64\n                        Number of parallel bwa alignments to run (default: 64). Controls the number of bwa jobs submitted at once to slurm.\n  -P tb, --partition tb\n                        The type of partition jobs formatted by slurpy run on (default: tb).\n  -M chrM, --mtDNA chrM\n                        Name of the mitochondrial contig (default: chrM).\n  -Q 30, --map-threshold 30\n                        Mapping quality threshold to filter alignments (default: 30).\n  -R step, --rerun-from step\n                        Step within the pipeline to re-run from. Options for Hi-C analysis include: fastp, bwa, pre, post, filter, concat, split, sort, count, clean\n  -q .fastq.gz, --fastq .fastq.gz\n                        The file extension of input fastq files (default: .fastq.gz)\n  -f 8, --fastp-threads 8\n                        The number of threads used in fastp to split input fastq files (default: 8). 
Note: must be an even multiple of the number of splits.\n  -b 4, --bwa-threads 4\n                        The number of threads used per bwa alignment on split input fastq files (default: 4).\n  -t 4, --dask-threads 4\n                        The number of threads used in calls to functions and calculations with pandas and dask dataframe(s) (default: 4).\n  -n name, --run-name name\n                        Run name used to name output files. Default behavior is to use the current parent directory.\n  -E bp, --error-distance bp\n                        Linear genomic distance to parse left and right oriented, intra-chromosomal Hi-C pairs for missing restriciton site(s). Passing zero (0) will skip this check (default: 10000 bp).\n  -C bp, --self-circle bp\n                        Linear genomic distance to check outward facing, intra-chromosomal Hi-C contacts for self-circle artifacts. Passing zero (0) will skip this check (default: 30000 bp).\n  -L MboI, --library MboI\n                        The name of the restriction site enzyme (or library prep) used in Hi-C sample creation. Options include Arima, MboI, DpnII, Sau3AI, and HindIII (default: Arima). Passing none (i.e. Dovetail) is also allowed, but checks for\n                        restriction sites and dangling ends will be skipped.\n  -D n, --mindist n     A filter on the minimum allowed distance (in bp) between reads (within a pair) that make up an intra-chromosomal Hi-C contact. Default behaviour is none (i.e. default: 0).\n  -Z n, --chunksize n   Number of rows (default: 50000) loaded into pandas at a time. WARNING: while increasing could speed up pipeline it could also cause memeory issues.\n  -G ./path/to/list.tsv, --genomelist ./path/to/list.tsv\n                        Path to list of chromosomes (by name) to include in final Hi-C analysis. 
Must be a tab seperated tsv or bed, comma seperated csv, or space seperated txt file with no header.\n  -J ./path/to/juicer.jar, --jar-path ./path/to/juicer.jar\n                        Path to juicer jar file for juicer pre command. Required for .hic file creation.\n  -S 25000, 10000, ... [25000, 10000, ... ...], --bin-sizes 25000, 10000, ... [25000, 10000, ... ...]\n                        Chromosome resolution (i.e. bin sizes) for .hic files. Default: 2500000, 1000000, 500000, 250000, 100000, 50000, 25000, 10000\n  --restart             Flag to force the pipeline to reset and run from start.\n  --debug               A flag to run in verbose mode, printing sbatch commands. Default behavior is false.\n  --skip-dedup          Pass this flag to skip marking and removing duplicates. Default behavior is false (conduct duplicate marking).\n  --clean               If included will run clean up script at end of run. The default behavior is false, can be run after pipeline.\n  --merge               Passing this flag will merge across all pairs of fastqs for final output.\n```\n### For ATAC-seq experiments\nTo analyze an ATAC-seq experiment, call the peaks.py script within the slurpy pipeline:\n\n```\n./SLURPY/peaks.py -r /path/to/reference/file.fasta\n```\n\n### For ChIP-seq experiments \nTo analyze a ChIP-seq experiment, call peaks.py and pass the control (or input) bam file with -c:\n\n```\n./SLURPY/peaks.py -r /path/to/reference/file.fasta -c /path/to/control/or/input.bam\n```\n\n## Dependencies\nSlurpy utilizes SLURM and was developed under version 21.08.8-2. The samtools suite of tools is also required, at version 1.15.1 or later. \n\n### Additional Linux core commands:\n* cat \n* rm\n* echo\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flanl%2Fslur-m-py","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flanl%2Fslur-m-py","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flanl%2Fslur-m-py/lists"}