https://github.com/jubayer98/fix-genotype
The repository processes a VCF file to filter and transform its data based on specific genomic criteria.
genotype vcf-files
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/jubayer98/fix-genotype
- Owner: jubayer98
- Created: 2024-08-08T12:33:32.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-08-08T12:36:32.000Z (10 months ago)
- Last Synced: 2025-01-11T17:50:43.891Z (5 months ago)
- Topics: genotype, vcf-files
- Language: Python
- Homepage:
- Size: 662 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## Fix Genotype
This repository processes a VCF (Variant Call Format) file to filter and transform its data based on specific genomic criteria. It begins by extracting the header lines and then reads the main body of the VCF file into a DataFrame. The script then performs several operations:

1. **Data Filtering:** Removes rows with missing or irrelevant data in a specified column.
2. **Value Extraction and Transformation:** Separates and processes genotype and coverage information, splits and recombines these values, and performs custom calculations to generate new columns.
3. **Data Cleanup:** Removes duplicate values, adjusts the ALT column based on new calculations, and finalizes the genotype column with updated values.
4. **Data Output:** Prepares the modified data for output, creates a new DataFrame with transformed data, and writes the updated content back to a new VCF file, preserving the original header.

The code ensures that the processed VCF data adheres to specified genomic criteria and formats, making it suitable for further analysis.
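
The overall flow (split off the header, filter the body, write the result back with the header intact) can be sketched roughly as follows. This is an illustration, not the repository's code: it uses only the standard library instead of a DataFrame so that it runs without dependencies, and the function names, the filtered column (`SAMPLE1`), and the set of values treated as missing are all assumptions. The genotype and ALT recalculation steps are left out because the README does not specify them.

```python
def split_vcf(text):
    """Split VCF text into header lines, column names, and data rows."""
    header, columns, rows = [], [], []
    for line in text.splitlines():
        if line.startswith("##"):
            header.append(line)  # meta-information lines
        elif line.startswith("#"):
            header.append(line)  # column header line (#CHROM ...)
            columns = line.lstrip("#").split("\t")
        elif line:
            rows.append(dict(zip(columns, line.split("\t"))))
    return header, columns, rows


def drop_missing(rows, column, missing=(".", "./.", "")):
    """Keep rows whose value in `column` is not a missing marker.

    The markers listed in `missing` are an assumption for this sketch.
    """
    return [row for row in rows if row.get(column, "") not in missing]


def write_vcf(header, columns, rows):
    """Rebuild VCF text: original header first, then the kept rows."""
    body = ["\t".join(row[c] for c in columns) for row in rows]
    return "\n".join(header + body) + "\n"


# Tiny made-up input: the second record has no sample data.
SAMPLE = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t.",
])

header, columns, rows = split_vcf(SAMPLE)
kept = drop_missing(rows, "SAMPLE1")
output = write_vcf(header, columns, kept)
```

Keeping the header lines as an untouched list, separate from the parsed rows, is what allows the output file to preserve the original header exactly, as step 4 describes.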