https://github.com/jcaperella29/bwa_samtools_cwl_flow
a common language workflow , workflow to process Fastq files
https://github.com/jcaperella29/bwa_samtools_cwl_flow
bioinformatics cwl-workflow next-generation-sequencing
Last synced: 3 months ago
JSON representation
a common language workflow , workflow to process Fastq files
- Host: GitHub
- URL: https://github.com/jcaperella29/bwa_samtools_cwl_flow
- Owner: jcaperella29
- Created: 2024-11-11T16:13:32.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-18T01:49:43.000Z (over 1 year ago)
- Last Synced: 2025-03-02T01:37:31.011Z (over 1 year ago)
- Topics: bioinformatics, cwl-workflow, next-generation-sequencing
- Language: Common Workflow Language
- Homepage:
- Size: 9.77 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
This repository contains a Common Workflow Language (CWL) workflow that:
Aligns paired-end reads to a reference genome using BWA.
Sorts the alignment output with Samtools.
Converts the sorted BAM file to BED format using Bedtools.
This setup is ideal for bioinformatics pipelines requiring reproducibility, portability, and flexibility across sequencing data types.
📋 Project Structure
plaintext
Copy code
project-root/
├── bwa_samtools_bedtools_workflow.cwl # Main CWL workflow for BWA, Samtools, and Bedtools steps
├── bwa_mem_tool.cwl # BWA alignment tool definition
├── samtools_sort_tool.cwl # Samtools sorting tool definition
├── bedtools_bamtobed_tool.cwl # Bedtools BAM to BED conversion tool definition
├── example-data/ # (Optional) Example FASTA/FASTQ files for testing
└── docs/ # Additional documentation or notes
🚀 Getting Started
Prerequisites
CWLTool: Install cwltool for running CWL workflows.
bash
Copy code
pip install cwltool
BWA: Ensure BWA is installed and accessible. BWA Installation Guide
Samtools: Ensure Samtools is installed and accessible. Samtools Installation Guide
Bedtools: Ensure Bedtools is installed and accessible. Bedtools Installation Guide
Setup
Clone the repository:
bash
Copy code
git clone https://github.com/yourusername/your-repo-name.git
cd your-repo-name
Prepare Reference Genome:
Use BWA to index the reference genome:
bash
Copy code
bwa index path/to/chr22.fa
This will generate necessary index files (.bwt, .pac, .ann, .amb, .sa) in the same directory as chr22.fa.
Organize Input Data:
Ensure the reference genome and paired-end FASTQ files are in appropriate paths.
🧬 Running the Workflow
Use the following command to execute the workflow:
bash
Copy code
cwltool bwa_samtools_bedtools_workflow.cwl \
--ref_genome /full/path/to/chr22.fa \
--ref_genome_bwt /full/path/to/chr22.fa.bwt \
--ref_genome_pac /full/path/to/chr22.fa.pac \
--ref_genome_ann /full/path/to/chr22.fa.ann \
--ref_genome_amb /full/path/to/chr22.fa.amb \
--ref_genome_sa /full/path/to/chr22.fa.sa \
--read1 /full/path/to/sample_R1.fastq \
--read2 /full/path/to/sample_R2.fastq
Example Command
bash
Copy code
cwltool --no-container /path/to/bwa_samtools_bedtools_workflow.cwl \
--ref_genome /path/to/chr22.fa \
--ref_genome_bwt /path/to/chr22.fa.bwt \
--ref_genome_pac /path/to/chr22.fa.pac \
--ref_genome_ann /path/to/chr22.fa.ann \
--ref_genome_amb /path/to/chr22.fa.amb \
--ref_genome_sa /path/to/chr22.fa.sa \
--read1 /path/to/sample_R1.fastq \
--read2 /path/to/sample_R2.fastq
Output
The workflow produces:
A sorted BAM file (sorted_bam_output) created by Samtools, which is the aligned and sorted sequence data.
A BED file (bed_output) created by Bedtools, which converts the sorted BAM alignment to BED format.
📄 License
This project is licensed under the MIT License. See the LICENSE file for more details.
🛠️ Additional Notes
Ensure all index files are in the same directory as the reference genome.
The workflow is compatible with cloud platforms and HPC environments supporting CWL.