https://github.com/austin-starks/c2h2_zf_proteins
Code that I wrote for undergraduate research experience at Princeton University
- Host: GitHub
- URL: https://github.com/austin-starks/c2h2_zf_proteins
- Owner: austin-starks
- License: gpl-3.0
- Created: 2019-08-05T18:56:30.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2021-06-25T02:00:23.000Z (over 4 years ago)
- Last Synced: 2025-01-23T08:16:23.516Z (9 months ago)
- Language: Python
- Size: 33.5 MB
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.MD
- License: LICENSE
README
# HOW TO USE:
To use this code, you need the following installed:
Python 3+
HMMER 2.3.2
Download the code. In the same folder that contains "main", create folders with the
following names:
analysis
chromosomes
hmmer
output
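The folder setup above can also be scripted. This is a minimal sketch, not part of the original code: the helper name `create_project_folders` is hypothetical, and it assumes you run it from (or pass in) the folder that contains "main".

```python
from pathlib import Path

# Folder names taken from the list above.
REQUIRED_FOLDERS = ("analysis", "chromosomes", "hmmer", "output")


def create_project_folders(root="."):
    """Create each required folder under `root` if it is missing.

    Hypothetical helper for illustration; returns the folder paths.
    """
    root_path = Path(root)
    folders = []
    for name in REQUIRED_FOLDERS:
        folder = root_path / name
        # exist_ok=True leaves folders that already exist untouched.
        folder.mkdir(parents=True, exist_ok=True)
        folders.append(folder)
    return folders
```

Calling `create_project_folders()` with no arguments creates the folders in the current working directory.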
### Databases:
There should be a zipped folder named databases present. Unzip it. Next, download
variant_summary.txt from ClinVar: ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/
Move the downloaded file into the databases folder and unzip it.
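If you prefer to unzip the ClinVar file from Python, here is a small sketch. It assumes the download is gzip-compressed and named `variant_summary.txt.gz`, which is not stated above; `gunzip_file` is a hypothetical helper, not a function from this repository.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(gz_path, remove_original=False):
    """Decompress `name.gz` to `name` in the same folder and return the new path.

    ASSUMPTION: the input is a gzip file whose name ends in ".gz".
    """
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {gz_path.name}")
    out_path = gz_path.with_suffix("")  # strips the trailing ".gz"
    # Stream the data so a large file is never held in memory at once.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    if remove_original:
        gz_path.unlink()
    return out_path
```

For example, `gunzip_file("databases/variant_summary.txt.gz")` would write `databases/variant_summary.txt`, assuming the file was saved under that name.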
### HMMER:
There should be a folder named HMMER present. Unzip the folder.
### Analysis:
This folder is used to compare the predicted change in DNA-binding specificity.
To do this, visit zf.princeton.edu and click "Predict PWS". Enter the ZF protein
sequence and click send. Click on the appropriate ZF domain and change the prediction
model to Polynomial SVM. Click submit and download the PWM. Save the file using the
following naming format:
GeneName_DomainNumber_o.txt - for an original gene.
GeneName_DomainNumber_m.txt - for a mutated gene.
The files are already included in the folder; just unzip the folder.
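The naming format above can be captured in a short helper so that original and mutated files stay consistent. This is only a sketch: `pwm_filename` is a hypothetical name, and the gene name in the usage note is a placeholder.

```python
def pwm_filename(gene_name, domain_number, mutated=False):
    """Build a PWM file name in the GeneName_DomainNumber_o/m.txt format.

    Hypothetical helper: "o" marks an original gene, "m" a mutated gene.
    """
    suffix = "m" if mutated else "o"
    return f"{gene_name}_{domain_number}_{suffix}.txt"
```

For a placeholder gene called `GENE`, `pwm_filename("GENE", 2)` gives `GENE_2_o.txt` and `pwm_filename("GENE", 2, mutated=True)` gives `GENE_2_m.txt`.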
### Chromosomes:
This folder is used to store the chromosomes. Chromosomes can be downloaded from:
ftp://ftp.ensembl.org/pub/release-97/fasta/homo_sapiens/dna/
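The download can be scripted as well. The sketch below rests on assumptions that are not confirmed here: that the files on the server follow the pattern `Homo_sapiens.GRCh38.dna.chromosome.<name>.fa.gz`, and that chromosomes 1 to 22, X and Y are the ones you need. The names `chromosome_url` and `download_chromosome` are hypothetical. Check the directory listing before relying on it.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL taken from the text above.
BASE_URL = "ftp://ftp.ensembl.org/pub/release-97/fasta/homo_sapiens/dna/"
# ASSUMPTION: file naming pattern on the server; verify against the listing.
FILE_PATTERN = "Homo_sapiens.GRCh38.dna.chromosome.{name}.fa.gz"
# ASSUMPTION: the autosomes plus X and Y are the chromosomes of interest.
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y"]


def chromosome_url(name):
    """Return the (assumed) download URL for one chromosome."""
    return BASE_URL + FILE_PATTERN.format(name=name)


def download_chromosome(name, dest_dir="chromosomes"):
    """Download one chromosome file into `dest_dir` and return its local path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / FILE_PATTERN.format(name=name)
    urlretrieve(chromosome_url(name), str(target))
    return target
```

Looping over `CHROMOSOMES` and calling `download_chromosome(name)` would fetch each file in turn. The files arrive gzip-compressed; whether project.py expects them unzipped is not stated here.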
Finally, run project.py as a script to get the main output files. There are other
functions in the file that can do a variety of things. Feel free to explore. If
you have any questions, please email me at as2779[at]cornell.edu.