https://github.com/multimeric/1000g-megaqc
1000 genomes data processed for use as a MegaQC test set
https://github.com/multimeric/1000g-megaqc
Last synced: about 2 months ago
JSON representation
1000 genomes data processed for use as a MegaQC test set
- Host: GitHub
- URL: https://github.com/multimeric/1000g-megaqc
- Owner: multimeric
- License: gpl-3.0
- Created: 2020-12-03T08:35:28.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2020-12-03T09:28:06.000Z (over 5 years ago)
- Last Synced: 2025-03-21T05:23:29.574Z (about 1 year ago)
- Size: 1.07 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# 1000g-megaqc
1000 genomes data processed for use as a MegaQC test set.
## Provenance
### Processing
The data was processed as follows:
* We chose samples from 1000 Genomes Phase 3 that had `*.mapped.ILLUMINA.bwa.*.low_coverage.*.bam` and `*.wgs.COMPLETE_GENOMICS.*.snps_indels_svs_meis.high_coverage.genotypes.vcf.gz` files associated with them
* We then ran these commands over the BAM:
* `fastqc`
* `samtools stats --remove-dups --required-flag 1`
* `mosdepth --no-per-base --fast-mode --by exome.targets.bed`
* On the VCF we ran:
* `bcftools stats`
* Finally, the data was compiled by running `multiqc stats --module samtools --module bcftools --module fastqc --module mosdepth`
### Tool Versions
* bcftools=1.10.2
* fastqc=0.11.9
* samtools=1.7
* mosdepth=0.2.7