https://github.com/shashwat9kumar/rna_sequencing_and_genome_analysis
A project to sequence the RNA from actual error prone reads from a PCR process and then to analyze the sequenced protein sequence
cpp dna dna-sequences gnuplot graph-theory pcr-process rna rna-seq-analysis sequencing
- Host: GitHub
- URL: https://github.com/shashwat9kumar/rna_sequencing_and_genome_analysis
- Owner: shashwat9kumar
- License: mit
- Created: 2019-02-13T09:15:06.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2021-08-13T09:07:57.000Z (almost 4 years ago)
- Last Synced: 2025-01-12T19:08:22.430Z (4 months ago)
- Topics: cpp, dna, dna-sequences, gnuplot, graph-theory, pcr-process, rna, rna-seq-analysis, sequencing
- Language: C++
- Homepage:
- Size: 4.47 MB
- Stars: 0
- Watchers: 1
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# RNA Sequencing using C++
A project to sequence RNA/DNA from actual error-prone reads produced by a PCR machine, and then to analyze the DNA so sequenced.
## DNA Analysis Report
DNA, or deoxyribonucleic acid, is a long molecule that contains our unique genetic code. Like a recipe book, it holds the instructions for making all the proteins in our bodies.
Deoxyribonucleic acid (DNA) is a molecule composed of two polynucleotide chains that coil around each other to form a double helix carrying genetic instructions for the development, functioning, growth and reproduction of all known organisms and many viruses. DNA and ribonucleic acid (RNA) are nucleic acids. Alongside proteins, lipids and complex carbohydrates (polysaccharides), nucleic acids are one of the four major types of macromolecules that are essential for all known forms of life.
### DNA is a complex substance whose understanding requires analysis
DNA analysis is the process of determining an individual's DNA characteristics.
## Analysis results
Here are some of the analysis results.
### 1. Error Proportions in DNA
The DNA reads have quality values defined using Phred+33 encoding. These values are confidence scores for the reads, i.e. how likely each called base is to be correct.
The graph showing them is an excellent way of separating bad-quality reads from good-quality reads.
Additionally, once the quality category of the reads has been determined, the kind of sequencing to use on them can be chosen.
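
As a rough illustration of how Phred+33 quality characters map to error probabilities, here is a minimal sketch. It assumes the standard definitions, quality `Q = ASCII code - 33` and error probability `P = 10^(-Q/10)`. The function names are hypothetical and are not taken from this repository.

```cpp
#include <cmath>
#include <cstddef>
#include <string>

// Convert one Phred+33 quality character to its integer quality score Q.
// Assumption: the character comes from a FASTQ quality line, so its
// ASCII code is at least 33 ('!').
int phredScore(char qualityChar) {
    return static_cast<int>(qualityChar) - 33;
}

// Error probability for a quality score: P = 10^(-Q/10).
double errorProbability(int q) {
    return std::pow(10.0, -q / 10.0);
}

// Mean error probability over one read's quality string (hypothetical helper).
// Returns 0.0 for an empty string.
double meanErrorProbability(const std::string& quality) {
    if (quality.empty()) return 0.0;
    double sum = 0.0;
    for (char c : quality) sum += errorProbability(phredScore(c));
    return sum / static_cast<double>(quality.size());
}
```

For example, the character `I` (ASCII 73) decodes to Q = 40, i.e. an error probability of 0.0001. Reads could then be sorted into good and bad quality by comparing `meanErrorProbability` against a threshold of your choosing.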
### 2. Error Analysis for Bases in DNA
DNA has 4 bases:
- Adenine
- Thymine
- Guanine
- Cytosine

Finding and visualizing errors and qualities in these four bases separately will reveal interesting patterns.
One notable observation from the four quality graphs is that they look similar, which suggests that the original PCR process used to obtain the DNA reads is reliable, repeatable and reproducible.
Another observation is that, although the graphs look similar, they are in fact different. Their individual points differ, so each base has a different overall average error line.
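
The per-base averages described above could be computed along the following lines. This is only a sketch under stated assumptions: each read is available as a base string plus a Phred+33 quality string of the same length, and the struct and function names are hypothetical rather than the ones used in this repository.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <string>

// Running totals for one base (hypothetical struct for this sketch).
struct BaseErrorStats {
    double errorSum = 0.0;   // sum of per-position error probabilities
    std::size_t count = 0;   // number of positions where this base was called
    double mean() const {
        return count == 0 ? 0.0 : errorSum / static_cast<double>(count);
    }
};

// Map A, C, G, T to indices 0..3; any other symbol (e.g. N) returns -1.
int baseIndex(char base) {
    switch (base) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return -1;
    }
}

// Add the error probabilities of one read to the per-base totals.
// Positions beyond the shorter of the two strings are ignored.
void accumulateRead(const std::string& bases, const std::string& quality,
                    std::array<BaseErrorStats, 4>& stats) {
    const std::size_t n = std::min(bases.size(), quality.size());
    for (std::size_t i = 0; i < n; ++i) {
        const int idx = baseIndex(bases[i]);
        if (idx < 0) continue;                        // skip non-ACGT symbols
        const int q = static_cast<int>(quality[i]) - 33;  // Phred+33 decoding
        stats[idx].errorSum += std::pow(10.0, -q / 10.0);
        ++stats[idx].count;
    }
}
```

After calling `accumulateRead` for every read, `stats[baseIndex('A')].mean()` gives the average error line for Adenine, and likewise for the other three bases.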
#### Base A
Adenine is a base in DNA.
#### Base T
Thymine is a base in DNA.
#### Base G
Guanine is a base in DNA.
#### Base C
Cytosine is a base in DNA.
### 3. GC Ratio comparison
This graph compares the GC content of the Actual DNA and the Calculated (Found) DNA. The comparison shows 2 things:
- The sequenced DNA line becomes almost straight, indicating that the sequenced DNA is stable. Hence the sequencing process worked well and can be trusted.
- The Actual and Found DNA lines coincide as the DNA progresses, indicating that the sequenced DNA has a good level of genome coverage.
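
One way to produce a GC line of this kind is a cumulative GC fraction, which flattens out as more bases are read. The sketch below assumes that definition. The exact measure plotted in this project may differ, and the function name is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Cumulative GC fraction after each position:
// (number of G and C seen so far) / (number of bases seen so far).
// Assumption: this is one plausible definition of the plotted GC line.
std::vector<double> cumulativeGcContent(const std::string& dna) {
    std::vector<double> gc;
    gc.reserve(dna.size());
    std::size_t gcCount = 0;
    for (std::size_t i = 0; i < dna.size(); ++i) {
        if (dna[i] == 'G' || dna[i] == 'C') ++gcCount;
        gc.push_back(static_cast<double>(gcCount) / static_cast<double>(i + 1));
    }
    return gc;
}
```

Computing this curve once for the Actual DNA and once for the Found DNA gives two series that can be written to a data file and plotted together, for instance with gnuplot.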
### 4. DNA Binary Signal Analysis
The DNA Binary Signal Analysis is a binary signal plot of the KMP algorithm's output for subsequences of different lengths taken from the Actual DNA and checked against the Found DNA.
This type of analysis is used to find the location of genomes, the similarity between the Actual DNA and the Found DNA, and how pattern matching changes as the subsequence length increases.
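
A minimal sketch of such a signal is given below, assuming the Actual DNA is cut into non-overlapping windows of a given length and each window is searched for in the Found DNA with KMP (Knuth-Morris-Pratt). The windowing scheme and all function names are assumptions for illustration, so the repository's own slicing may differ.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// KMP failure table: fail[i] is the length of the longest proper prefix
// of pattern[0..i] that is also a suffix of pattern[0..i].
std::vector<std::size_t> buildFailureTable(const std::string& pattern) {
    std::vector<std::size_t> fail(pattern.size(), 0);
    std::size_t k = 0;
    for (std::size_t i = 1; i < pattern.size(); ++i) {
        while (k > 0 && pattern[i] != pattern[k]) k = fail[k - 1];
        if (pattern[i] == pattern[k]) ++k;
        fail[i] = k;
    }
    return fail;
}

// Return true if pattern occurs anywhere in text (KMP search).
bool kmpContains(const std::string& text, const std::string& pattern) {
    if (pattern.empty()) return true;
    const std::vector<std::size_t> fail = buildFailureTable(pattern);
    std::size_t k = 0;  // number of pattern characters currently matched
    for (char c : text) {
        while (k > 0 && c != pattern[k]) k = fail[k - 1];
        if (c == pattern[k]) ++k;
        if (k == pattern.size()) return true;
    }
    return false;
}

// Binary signal for one subsequence length: entry i is 1 if the window of
// actualDna starting at i * length occurs in foundDna, otherwise 0.
// Assumption: non-overlapping windows; a trailing partial window is dropped.
std::vector<int> binarySignal(const std::string& actualDna,
                              const std::string& foundDna,
                              std::size_t length) {
    std::vector<int> signal;
    if (length == 0) return signal;
    for (std::size_t start = 0; start + length <= actualDna.size(); start += length) {
        const bool found = kmpContains(foundDna, actualDna.substr(start, length));
        signal.push_back(found ? 1 : 0);
    }
    return signal;
}
```

Calling `binarySignal` for several values of `length` yields one 0/1 series per length. Longer windows are harder to match exactly, so the proportion of 1s generally drops as the length grows.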
### 5. DNA Chromatic Dichromation
The DNA frame Buffer
### 6. DNA frame Buffer
The DNA frame Buffer