{"id":22663934,"url":"https://github.com/zsteve/nf-atac","last_synced_at":"2026-02-16T17:35:42.740Z","repository":{"id":70047169,"uuid":"118833396","full_name":"zsteve/nf-ATAC","owner":"zsteve","description":" ATAC-seq pipeline written in Nextflow","archived":false,"fork":false,"pushed_at":"2018-10-08T05:45:11.000Z","size":42,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-09-07T02:02:09.894Z","etag":null,"topics":["atac-seq","bioinformatics","epigenomics","nextflow","ngs","pipeline"],"latest_commit_sha":null,"homepage":"","language":"Nextflow","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zsteve.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-01-24T23:05:25.000Z","updated_at":"2018-10-08T05:45:12.000Z","dependencies_parsed_at":"2023-04-27T23:01:03.590Z","dependency_job_id":null,"html_url":"https://github.com/zsteve/nf-ATAC","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zsteve/nf-ATAC","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zsteve%2Fnf-ATAC","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zsteve%2Fnf-ATAC/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zsteve%2Fnf-ATAC/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zsteve%2Fnf-ATAC/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zsteve","download_url":"https://c
odeload.github.com/zsteve/nf-ATAC/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zsteve%2Fnf-ATAC/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278322146,"owners_count":25967874,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-04T02:00:05.491Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["atac-seq","bioinformatics","epigenomics","nextflow","ngs","pipeline"],"created_at":"2024-12-09T12:45:43.854Z","updated_at":"2025-10-04T13:57:10.681Z","avatar_url":"https://github.com/zsteve.png","language":"Nextflow","funding_links":[],"categories":[],"sub_categories":[],"readme":"**nf-ATAC**\n\n\n_An integrated pipeline for ATAC-seq data written in *Nextflow* with :heart:_\n\n\nAuthor:\t\tStephen Zhang (stephen.zhang@monash.edu)\nDate:\t\t5 Feb 2018\n\n\n\n*Introduction*\n`nf-ATAC` is a pipeline for processing ATAC-seq data, written in Nextflow script (https://www.nextflow.io/).\nThe pipeline is currently in its early stages, so this `README` will be updated regularly (check back often!).\n\nHave a problem? 
Please log an issue on GitHub (https://github.com/zsteve/atac-seq-pipeline).\n\n*Dependencies*\nPlease make sure these tools are installed before running the pipeline:\n\n* [`MACS2`](https://github.com/taoliu/MACS)\n* [`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)\n* [`cutadapt`](http://cutadapt.readthedocs.io/en/stable/guide.html)\n* [`bowtie2`](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml)\n* [`picard/2.8.2`](https://broadinstitute.github.io/picard/)\n* [`samtools`](http://samtools.sourceforge.net/)\n* [`homer`](http://homer.ucsd.edu/homer/) (please add to `$PATH`)\n* [`jvarkit`](https://github.com/lindenb/jvarkit) (only `samjs` is needed)\n* [`snakeyaml`](https://bitbucket.org/asomov/snakeyaml/wiki/Documentation) (please add to `$CLASSPATH`)\n* [`sambamba`](http://lomereiter.github.io/sambamba/) (please add to `$PATH`)\n\nFor QC, we require the following:\n\n* [`ATACseqQC`](https://bioconductor.org/packages/release/bioc/html/ATACseqQC.html)\n* [`Biostrings`](https://bioconductor.org/packages/release/bioc/html/Biostrings.html)\n* [`GenomicFeatures`](https://bioconductor.org/packages/release/bioc/html/GenomicFeatures.html)\n* [`GenomeInfoDb`](https://bioconductor.org/packages/release/bioc/html/GenomeInfoDb.html)\n* [`ChIPpeakAnno`](https://bioconductor.org/packages/release/bioc/html/ChIPpeakAnno.html)\n* [`MotifDb`](http://bioconductor.org/packages/release/bioc/html/MotifDb.html)\n\nYou can check that most dependencies are installed by running `checkdep.sh`.\n*At present, please manually confirm that `snakeyaml` is installed!*\n\n*Installing Nextflow*\n\nNextflow can be downloaded using the following command:\n\n`curl -s https://get.nextflow.io | bash`\n\nThis will create a binary `nextflow` in the working directory. 
You can add this binary to your `PATH` for ease of use:\n\n`export PATH=$PATH:[your path here]`\n\nThe pipeline can be executed by running `nextflow`, specifying the script and the relevant command-line arguments.\n\n`nextflow \u003cscript\u003e.nf \u003ccommand line arguments\u003e`\n\n*Running the pipeline - single sample*\n\n_Data preparation_\n\nPaired-end read data for a sample, in `.fastq.gz` format, should be placed in a directory named after the sample. Read-pair files must be distinguishable by file names matching `*_R{1,2}*.fastq.gz`.\n\n_Configuration_\n\nA few parameters *must* be set correctly in `config.yaml` before running the pipeline; **the pipeline will not work without them**:\n\n * `macs2 : --gsize` (the effective genome size) must be specified for `macs2` to call peaks correctly.\n * `qc_report : bsgenome, txdb` must be specified for QC report generation with `ATACseqQC` to work. `bsgenome` must specify the `BSgenome` package corresponding to the reference genome, and `txdb` must specify the transcript annotation (`TxDb`) package for the reference genome, as used by `GenomicFeatures`.\n\n_Command_\n\nNextflow will create a `work` directory (containing intermediate pipeline data) in its working directory (i.e. `.`). Final output files are written to the output directory of your choice; however, these are generally _symlinks_ to the actual files within `work/**/your_file_here`. 
It is *very* important that `work` is *not* deleted; otherwise your symlinks will be broken!\n\n```\nnextflow atac_pipeline.nf --num-cpus $NUM_CPUS \\\n\t\t\t  --jvarkit-path $JVARKIT_PATH \\\n\t\t\t  --input-dir $INPUT_DIR \\\n\t\t\t  --output-dir $OUTPUT_DIR \\\n\t\t\t  --config-file $CONFIG_FILE \\\n\t\t\t  --ref-genome-name $GENOME_NAME \\\n\t\t\t  --ref-genome-index $GENOME_INDEX \\\n\t\t\t  --ref-genome-fasta $GENOME_FASTA\n```\n\n* `NUM_CPUS` - maximum number of CPUs to use for the _entire_ pipeline\n* `INPUT_DIR` - path to the directory containing the R1 and R2 read files\n* `OUTPUT_DIR` - path to the directory to write outputs to (created if it does not already exist); this can be the same as `INPUT_DIR`\n* `CONFIG_FILE` (OPTIONAL) - path to `config.yaml`, if you want custom parameters for pipeline components\n* `GENOME_NAME` - name of the reference genome (e.g. `danRer10`, `hg18`)\n* `GENOME_INDEX` - path to the `bowtie2` indexes for the reference genome\n* `GENOME_FASTA` - path to the `FASTA` sequence of the reference genome\n* `JVARKIT_PATH` - path to the `jvarkit` installation\n\nNextflow will write its output to the output directory you specified.\n\n*Running the pipeline - multiple samples*\n\n_Data preparation_\n\nFor each sample, create a folder `SAMPLE_ID/` containing the paired-end read data in `.fastq.gz` format. Then create a *sample table* as a text file, in which each line corresponds to *one* sample, with the following fields:\n\n```\n[Sample_ID] [path to sample input directory] [path to sample output directory]\n```\n\n_Command_\n\nThe pipeline reads samples from the sample table (`.txt` file) and attempts to process them in _parallel_. 
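\n\nFor example, a sample table for two hypothetical samples, `sampleA` and `sampleB`, stored under `/data/atac/` might look like the one below. The sample names and paths are purely illustrative. The fields are assumed to be whitespace-separated, as in the template above; please check `atac_pipeline.nf` for the exact delimiter it expects:\n\n```\nsampleA /data/atac/sampleA /data/atac/results/sampleA\nsampleB /data/atac/sampleB /data/atac/results/sampleB\n```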
\n\n```\nnextflow atac_pipeline.nf --num-cpus $NUM_CPUS \\\n\t\t\t  --jvarkit-path $JVARKIT_PATH \\\n\t\t\t  --config-file $CONFIG_FILE \\\n\t\t\t  --multi-sample \\\n\t\t\t  --sample-table $SAMPLE_TABLE \\\n\t\t\t  --ref-genome-name $GENOME_NAME \\\n\t\t\t  --ref-genome-index $GENOME_INDEX \\\n\t\t\t  --ref-genome-fasta $GENOME_FASTA\n```\n\nHere `SAMPLE_TABLE` is the path to the sample table; the remaining variables are as described for the single-sample case.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzsteve%2Fnf-atac","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzsteve%2Fnf-atac","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzsteve%2Fnf-atac/lists"}