https://github.com/lu-vedder/maizesnp
SNP analysis of different maize (Zea mays L.) inbred lines in comparison to B73 as a reference line
- Host: GitHub
- URL: https://github.com/lu-vedder/maizesnp
- Owner: lu-vedder
- License: mit
- Created: 2024-01-26T10:03:37.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-02-20T13:41:10.000Z (over 2 years ago)
- Last Synced: 2025-09-05T02:57:35.541Z (10 months ago)
- Topics: bioinformatics, genetics, maize, snp-analysis
- Language: Python
- Homepage:
- Size: 38.1 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
- Citation: CITATION.cff
# MaizeSNP
The MaizeSNP workflow was designed specifically for the SNP analysis of different maize (_Zea mays_ L.) inbred lines in comparison to B73 as a reference line.
The final result is a TSV file containing the set of SNPs between the respective inbred line and B73 that are located in the protein-coding regions of maize. The file also includes the allele counts obtained from the original mapping.
**Workflow.txt**
The workflow file gives the exact order and parameter settings for running the other scripts. Please note the following:
* Beforehand, all merged reads of one inbred line ('LINE') were mapped with Bowtie2 (v2.2.9) [1]. The resulting mapping files (SAM files) were prepared for SNP calling with Samtools (v1.3.1) [2] and Picard (v2.9.0) [3], as stated in the workflow.
* SNP calling was performed with GATK (v3.7-0-gcfedb67) [4]. Its input is the prepared mapping result described above.
* Version 3 of the B73 reference genome was used. Newer versions may require some adaptations.
**filter_blacklist_merged_SNPs.py**
This script uses the SNPs between 'our' B73 and the reference genome as a blacklist when filtering for "true" SNPs.
This step should reduce the mapping bias caused by variation between the reference B73 and 'our' B73, the line used in the lab experiments.
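The blacklist idea can be illustrated with a minimal sketch. It is not the code of 'filter_blacklist_merged_SNPs.py'; the function names and the column layout (chromosome in the first tab-separated column, position in the second) are assumptions made for illustration only.

```python
def load_blacklist(lines):
    """Collect (chromosome, position) pairs from tab-separated lines.

    Assumed layout: column 1 = chromosome, column 2 = position.
    """
    blacklist = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom, pos = line.split("\t")[:2]
        blacklist.add((chrom, int(pos)))
    return blacklist


def filter_snps(snp_lines, blacklist):
    """Keep the SNP rows whose (chromosome, position) is not blacklisted."""
    kept = []
    for line in snp_lines:
        row = line.rstrip("\n")
        if not row or row.startswith("#"):
            continue
        chrom, pos = row.split("\t")[:2]
        if (chrom, int(pos)) not in blacklist:
            kept.append(row)
    return kept


# Hypothetical example data: SNPs of 'our' B73 versus the reference (blacklist)
# and SNPs of one inbred line versus the reference.
blacklist = load_blacklist(["1\t100\tA\tG", "2\t250\tC\tT"])
line_snps = ["1\t100\tA\tG", "1\t180\tG\tC", "2\t250\tC\tT", "3\t42\tT\tA"]
print(filter_snps(line_snps, blacklist))
```

With an empty blacklist, every SNP row passes the filter unchanged.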
**empty_blacklist.txt**
This is just a dummy file. It can be used as an empty blacklist file in the 'filter_blacklist_merged_SNPs.py' script.
**count_SNP_alleles_mergedBAM.py - Python2 environment required!**
Counts the reads that match the reference allele or the SNP allele, using the merged BAM files.
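The counting step can be sketched as follows. This simplified illustration does not read BAM files: it assumes the alignments have already been extracted as ungapped `(start, sequence)` pairs with 1-based start positions, and the function name is hypothetical. The actual script may handle indels, clipping and quality filters differently.

```python
def count_alleles(reads, snp_pos, ref_allele, alt_allele):
    """Count reads supporting the reference or the SNP allele at snp_pos.

    reads: iterable of (start, sequence) pairs, ungapped, 1-based start
           (an assumption of this sketch, not the BAM-based original).
    """
    ref_count = 0
    alt_count = 0
    for start, seq in reads:
        offset = snp_pos - start
        if 0 <= offset < len(seq):  # the read covers the SNP position
            base = seq[offset].upper()
            if base == ref_allele:
                ref_count += 1
            elif base == alt_allele:
                alt_count += 1
            # any other base supports neither allele and is ignored
    return ref_count, alt_count


# Hypothetical reads around position 101 (reference allele T, SNP allele A)
reads = [(98, "ACGTA"), (100, "GTTAC"), (101, "AACGG"), (99, "CCGAT"), (90, "ACGT")]
print(count_alleles(reads, 101, "T", "A"))
```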
**collect_gene_ids_from_gtf.py**
Collects the 39,469 gene IDs of the protein-coding genes of _Zea mays_ from reference version 3 (GTF format).
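A minimal sketch of how such a collection could work is shown below. It assumes that the attribute column of the GTF file carries `gene_id` and `gene_biotype` entries; the attribute names, the function name and the gene IDs in the example are assumptions, and the actual script may select the genes differently.

```python
import re

# Matches GTF attributes of the form: key "value"
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)"')


def protein_coding_gene_ids(gtf_lines, biotype_key="gene_biotype"):
    """Return the sorted, unique gene IDs annotated as protein coding."""
    gene_ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attributes = dict(ATTRIBUTE_RE.findall(fields[8]))
        if attributes.get(biotype_key) == "protein_coding" and "gene_id" in attributes:
            gene_ids.add(attributes["gene_id"])
    return sorted(gene_ids)


# Hypothetical GTF records; GENE_A and GENE_B are placeholder IDs
gtf = [
    '1\tensembl\texon\t100\t200\t.\t+\t.\tgene_id "GENE_A"; transcript_id "GENE_A_T01"; gene_biotype "protein_coding";',
    '1\tensembl\tCDS\t120\t180\t.\t+\t0\tgene_id "GENE_A"; gene_biotype "protein_coding";',
    '2\tensembl\texon\t500\t600\t.\t-\t.\tgene_id "GENE_B"; gene_biotype "miRNA";',
]
print(protein_coding_gene_ids(gtf))
```

Collecting into a set keeps each gene ID once, even though a gene usually appears on several exon and CDS lines.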
**collect_gene_ids_from_gff3.py**
Collects the gene IDs of the protein-coding genes of _Zea mays_ from reference version 4 (GFF3 format).
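The GFF3 variant differs mainly in the attribute syntax (`key=value` pairs separated by semicolons). The sketch below assumes that gene records carry a `biotype=protein_coding` attribute and an `ID` that may be prefixed with `gene:`; these conventions, the function names and the example IDs are assumptions rather than the behaviour of the actual script.

```python
def parse_gff3_attributes(column):
    """Split a GFF3 attribute column into a dict of key/value pairs."""
    attributes = {}
    for item in column.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attributes[key.strip()] = value.strip()
    return attributes


def protein_coding_gene_ids_gff3(gff3_lines):
    """Return the IDs of all 'gene' features annotated as protein coding."""
    gene_ids = []
    for line in gff3_lines:
        if line.startswith("#"):
            continue  # directive or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue
        attributes = parse_gff3_attributes(fields[8])
        gene_id = attributes.get("ID", "")
        if attributes.get("biotype") == "protein_coding" and gene_id:
            # drop an optional "gene:" prefix (assumed ID convention)
            gene_ids.append(gene_id.split(":", 1)[-1])
    return gene_ids


# Hypothetical GFF3 records; GENE_C and GENE_D are placeholder IDs
gff3 = [
    "##gff-version 3",
    "1\tgramene\tgene\t100\t900\t.\t+\t.\tID=gene:GENE_C;biotype=protein_coding",
    "1\tgramene\tmRNA\t100\t900\t.\t+\t.\tID=transcript:GENE_C_T001;Parent=gene:GENE_C",
    "2\tgramene\tgene\t50\t400\t.\t-\t.\tID=gene:GENE_D;biotype=lincRNA",
]
print(protein_coding_gene_ids_gff3(gff3))
```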
**filter_merged_snps_for_coding_genes.py**
Filters the SNPs (TSV format) for positions located in the set of protein-coding genes. The allele counts may be included in the file.
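The position filter can be sketched as an interval check per chromosome. The data layout here is an assumption for illustration: gene coordinates are given as inclusive `(start, end)` pairs per chromosome, and each SNP row has the chromosome in the first and the position in the second tab-separated column. Any further columns, such as allele counts, are carried along unchanged.

```python
def snps_in_genes(snp_rows, gene_intervals):
    """Keep SNP rows whose position lies inside at least one gene interval.

    gene_intervals: dict mapping chromosome -> list of inclusive (start, end)
                    pairs (an assumed structure for this sketch).
    """
    kept = []
    for row in snp_rows:
        fields = row.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        for start, end in gene_intervals.get(chrom, []):
            if start <= pos <= end:
                kept.append(row.rstrip("\n"))
                break  # one overlapping gene is enough
    return kept


# Hypothetical gene coordinates and SNP rows (two rows include allele counts)
gene_intervals = {"1": [(100, 200), (500, 900)], "2": [(50, 80)]}
snp_rows = [
    "1\t150\tA\tG\t12\t9",
    "1\t300\tC\tT\t7\t7",
    "2\t80\tG\tA",
    "3\t10\tT\tC",
]
print(snps_in_genes(snp_rows, gene_intervals))
```

A linear scan over the intervals keeps the sketch short; for genome-scale input, sorted intervals with a binary search would scale better.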
# Citation
Please cite via Zenodo: [10.5281/zenodo.10684044](https://doi.org/10.5281/zenodo.10684044)
Or as: Vedder, L. (2024). MaizeSNP (Version 1.0.0) [Computer software]. https://github.com/lu-vedder/MaizeSNP (for details see the [CITATION](CITATION.cff) file).
# License
Copyright (c) 2024 Lucia Vedder
For details see the [LICENSE](LICENSE) file.
---
[1] https://bowtie-bio.sourceforge.net/bowtie2/index.shtml
[2] http://www.htslib.org
[3] https://broadinstitute.github.io/picard
[4] https://gatk.broadinstitute.org