https://github.com/tomwhite/misp-2017
MISP camp 2017 materials and code
bioinformatics data data-visualization hackathon
- Host: GitHub
- URL: https://github.com/tomwhite/misp-2017
- Owner: tomwhite
- Created: 2017-12-01T17:14:10.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2017-12-20T15:32:32.000Z (over 8 years ago)
- Last Synced: 2025-01-17T21:43:52.246Z (over 1 year ago)
- Topics: bioinformatics, data, data-visualization, hackathon
- Language: Python
- Homepage:
- Size: 641 KB
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# MISP Camp 2017
This repo contains some of the code developed by Group A during MISP Camp 2017 in Denmark.
## Getting Started with Spark
This project has a few examples of how to use [Apache Spark](https://spark.apache.org/).
It is designed to be used with [Cloudera Data Science Workbench](https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_overview.html).
See _word_count.py_ and _tips.r_ for examples.
## Data analysis of Chikungunya Defective Genomes
See _del.py_ and _entropy.py_ for the analysis code, and _norm.sh_ for producing rough read counts used for normalization.
The following two images show the distribution of deletion start positions across the genome. Note that in passage 2 deletions start in all gene regions, whereas by passage 12 deletions are no longer found in the structural genes.
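
As a rough illustration of how such a distribution can be tabulated, the sketch below bins deletion start positions into fixed-width windows along the genome. It is not taken from _del.py_; the function name `bin_start_positions`, the bin width, and the example positions are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def bin_start_positions(starts: Iterable[int], bin_width: int) -> Dict[int, int]:
    """Count deletion start positions per genome bin.

    Keys are the left edge of each bin; values are the number of
    deletions starting in that bin. Empty bins are omitted.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    # Map each position to the left edge of the bin that contains it.
    counts = Counter((pos // bin_width) * bin_width for pos in starts)
    return dict(sorted(counts.items()))


# Hypothetical start positions, for illustration only.
example_starts = [5, 120, 130, 999]
histogram = bin_start_positions(example_starts, bin_width=100)
```

Comparing such histograms between passages is one way to see whether deletions disappear from a given region of the genome.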


The following shows the distribution of GC bases across the genome.
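
One common way to obtain a GC distribution is to compute the GC fraction in windows along the sequence. The sketch below assumes that approach; the repository may compute it differently, and the names `gc_fraction` and `windowed_gc` as well as the window and step sizes are illustrative.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Return the fraction of G or C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def windowed_gc(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """Return (start, GC fraction) for each full window along seq."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence, for illustration only.
profile = windowed_gc("GGCCAATT", window=4, step=4)
```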

Finally, this diagram shows how normalized Shannon entropy varies by passage and replicate. There is a general downward trend toward lower-entropy (more ordered) states, with some oscillation.
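
For reference, a minimal sketch of normalized Shannon entropy is given below. It assumes the entropy is computed over a vector of counts and divided by the log of the number of categories so that the result lies in [0, 1]; _entropy.py_ may normalize differently, and the function name is illustrative.

```python
import math
from typing import Sequence


def normalized_shannon_entropy(counts: Sequence[float]) -> float:
    """Return Shannon entropy of counts, scaled to the range [0, 1].

    Assumption: normalization divides by log(k), where k is the number
    of categories supplied (including those with zero count).
    """
    total = sum(counts)
    k = len(counts)
    if total <= 0 or k < 2:
        return 0.0
    entropy = 0.0
    for c in counts:
        if c > 0:  # zero counts contribute nothing to the sum
            p = c / total
            entropy -= p * math.log(p)
    return entropy / math.log(k)


# A uniform distribution is maximally disordered; a single dominant
# category is fully ordered.
uniform = normalized_shannon_entropy([5, 5, 5, 5])
ordered = normalized_shannon_entropy([10, 0, 0, 0])
```

Under this definition, values near 1 indicate that counts are spread evenly across categories, and values near 0 indicate that one category dominates.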
