An open API service indexing awesome lists of open source software.

https://github.com/jubayer98/fix-genotype

The repository processes a VCF file to filter and transform its data based on specific genomic criteria.
https://github.com/jubayer98/fix-genotype

genotype vcf-files

Last synced: 4 months ago
JSON representation

The repository processes a VCF file to filter and transform its data based on specific genomic criteria.

Awesome Lists containing this project

README

        

## Fix Genotype
The provided repository processes a VCF (Variant Call Format) file to filter and transform its data based on specific genomic criteria. It begins by extracting the header lines and then reads the main body of the VCF file into a DataFrame. The script then performs several operations:

1. **Data Filtering:** Removes rows with missing or irrelevant data in a specified column.
2. **Value Extraction and Transformation:** Separates and processes genotype and coverage information, splits and recombines these values, and performs custom calculations to generate new columns.
3. **Data Cleanup:** Removes duplicate values, adjusts the ALT column based on new calculations, and finalizes the genotype column with updated values.
4. **Data Output:** Prepares the modified data for output, creates a new DataFrame with transformed data, and writes the updated content back to a new VCF file, preserving the original header.

The code ensures that the processed VCF data adheres to specified genomic criteria and formats, making it suitable for further analysis.