https://github.com/vaishnavipaithane/genomic-data-and-management-analysis

This mini-project is part of the 'Fundamentals of Python Programming' virtual internship conducted by Nyberman Bioinformatics, Europe.

Host: GitHub
URL: https://github.com/vaishnavipaithane/genomic-data-and-management-analysis
Owner: vaishnavipaithane
Created: 2024-06-25T07:09:12.000Z (over 1 year ago)
Default Branch: master
Last Pushed: 2024-06-25T11:25:32.000Z (over 1 year ago)
Last Synced: 2025-03-05T08:45:35.696Z (7 months ago)
Topics: genomic-data-analysis, genomics-visualization, python, spyder
Language: Python
Homepage:
Size: 22.5 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Genomic Data Management and Analysis

This mini-project is part of the 'Fundamentals of Python Programming' virtual internship conducted by Nyberman Bioinformatics, Europe. It covers data sorting and extraction, data analysis, and data visualization. I used the Python programming language and the Spyder IDE to analyze and visualize the data. The dataset consists of protein sequences in FASTA format, along with other files, such as .vcf and .txt files, that contain sequence information.

## Data Sorting & Extraction
- Search for FASTA files within the specified directories and copy them to a new location. This involves identifying the FASTA files and copying them to a designated folder.
- Extract the first line of each FASTA file and record it in a text file. This involves reading each FASTA file, extracting its first line, and associating that line with the corresponding sample name.
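The two sorting and extraction steps can be sketched roughly as follows. This is a minimal illustration, not the code from the project script: the set of FASTA extensions, the use of the file name (without extension) as the sample name, and the tab-separated output layout are all assumptions. Files with the same name in different folders would overwrite each other in the destination.

```python
import shutil
from pathlib import Path

# Extensions treated as FASTA files (assumption; adjust to match the dataset)
FASTA_EXTENSIONS = {".fasta", ".fa", ".faa", ".fna"}


def copy_fasta_files(source_dirs, destination):
    """Copy every FASTA file found under source_dirs into destination."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in source_dirs:
        # rglob("*") walks the whole directory tree; sorted() gives a stable order
        for path in sorted(Path(source).rglob("*")):
            if path.is_file() and path.suffix.lower() in FASTA_EXTENSIONS:
                target = destination / path.name
                shutil.copy2(path, target)  # copy2 also preserves file timestamps
                copied.append(target)
    return copied


def write_first_lines(fasta_files, output_txt):
    """Write one line per FASTA file: sample name, a tab, then the file's first line."""
    with open(output_txt, "w") as out:
        for path in fasta_files:
            path = Path(path)
            with open(path) as handle:
                first_line = handle.readline().rstrip("\n")
            # Sample name = file name without extension (hypothetical convention)
            out.write(f"{path.stem}\t{first_line}\n")
```

For example, `write_first_lines(copy_fasta_files(["raw_data"], "fasta_files"), "new.txt")` would collect the FASTA files from a hypothetical `raw_data` folder and then record their header lines.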

## Data Analysis
- Summarize each folder in a CSV file, including the number of files and the list of files it contains. This involves traversing the directories, counting the files, and compiling a summary report.
- Identify the VCF file and read it into a tabular format.
- Filter for variants on chromosome 21.
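The analysis steps can be sketched with the standard library alone. This is an illustrative version, not the project's own code: the CSV column names, the `;`-joined file list, and the acceptance of both `21` and `chr21` chromosome naming are assumptions. The VCF reader skips `##` meta-information lines, takes column names from the `#CHROM` header line, and handles only uncompressed files.

```python
import csv
import os


def summarize_folders(root, output_csv):
    """Write one CSV row per folder: its path, file count and file names."""
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort makes os.walk visit subfolders in a stable order
        files = sorted(filenames)
        rows.append([dirpath, len(files), ";".join(files)])
    with open(output_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["folder", "file_count", "files"])  # hypothetical column names
        writer.writerows(rows)
    return rows


def read_vcf(vcf_path):
    """Read the data lines of an uncompressed VCF file into dicts keyed by column name."""
    header = None
    records = []
    with open(vcf_path) as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                header = [name.lstrip("#") for name in fields]  # "#CHROM" -> "CHROM"
                continue
            if header and line.strip():
                records.append(dict(zip(header, fields)))
    return records


def filter_chromosome(records, chrom="21"):
    """Keep variants on one chromosome, accepting both '21' and 'chr21' naming."""
    wanted = {chrom, f"chr{chrom}"}
    return [record for record in records if record["CHROM"] in wanted]
```

If pandas is available, the list of dicts returned by `read_vcf` can be turned into a DataFrame with `pandas.DataFrame(records)` for a more conventional table view.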

## Data Visualization
- Create a bar chart showing the count of variants on each chromosome, with one bar per chromosome whose height indicates the total number of variants on that chromosome.
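A bar chart like this can be sketched as below, assuming the variants are available as records (dicts) that each carry a `CHROM` value and that matplotlib is installed. The natural chromosome ordering (1 to 22, then X, Y, MT) is an illustrative choice, not necessarily the one used in the project's figure. matplotlib is imported inside the plotting function, so the counting helpers work even without it.

```python
from collections import Counter


def count_variants_per_chromosome(records):
    """Count variants per chromosome from records that each have a 'CHROM' key."""
    return Counter(record["CHROM"] for record in records)


def chromosome_sort_key(name):
    """Order chromosomes naturally: 1..22, then X, Y, M/MT, then anything else."""
    bare = name[3:] if name.lower().startswith("chr") else name
    if bare.isdigit():
        return (0, int(bare), "")
    order = {"X": 1, "Y": 2, "M": 3, "MT": 3}
    return (1, order.get(bare.upper(), 99), bare)


def plot_variant_counts(counts, output_png):
    """Draw one bar per chromosome, with bar height equal to the number of variants."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    chromosomes = sorted(counts, key=chromosome_sort_key)
    heights = [counts[chrom] for chrom in chromosomes]
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.bar(chromosomes, heights)  # string categories become x-axis labels
    ax.set_xlabel("Chromosome")
    ax.set_ylabel("Variant count")
    ax.set_title("Count of variants on each chromosome")
    fig.tight_layout()
    fig.savefig(output_png)
    plt.close(fig)
```

Usage: `plot_variant_counts(count_variants_per_chromosome(records), "variant_counts.png")`, where `records` and the output file name are placeholders.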

Click [here](https://github.com/vaishnavipaithane/Genomic-Data-and-Management-Analysis/blob/master/Genomic%20data%20and%20management%20analysis.py) to view the Python script and the complete analysis for this project.

[Count of variants on each chromosome](https://github.com/vaishnavipaithane/Genomic-Data-and-Management-Analysis/blob/master/Count%20of%20variants%20on%20each%20chromosome.png)

[First line from each fasta file](https://github.com/vaishnavipaithane/Genomic-Data-and-Management-Analysis/blob/cb03cad565f72cb1dcdd2ea594ba3b588d0628d1/new.txt)

[Summary of all files present in each folder](https://github.com/vaishnavipaithane/Genomic-Data-and-Management-Analysis/blob/cb03cad565f72cb1dcdd2ea594ba3b588d0628d1/dic.csv)
