https://github.com/akababa/dft-dna
- Host: GitHub
- URL: https://github.com/akababa/dft-dna
- Owner: Akababa
- Created: 2018-05-28T20:04:21.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2018-06-20T22:17:00.000Z (almost 7 years ago)
- Last Synced: 2024-10-29T08:04:56.092Z (8 months ago)
- Language: Python
- Size: 31 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
# dft-dna
## Steps
1. Use `clean.cpp` to convert the original fasta files into a format with one sequence per line. (`cleaned/` folder)
2. Use `main.py` to do the rest. I originally tried to do it all in `fft.cpp`, but it's much easier to use sklearn.

## `ACGTClassifier` class
* Implements the sklearn estimator interface, and properly handles separation of training/test when given raw sequences.
* Allows ACGT values to be set as estimator params.

## `main.py`
* Runs simple 10-fold cross-validation for a list of sklearn estimators and one value.
* Runs a grid search to find the optimal ACGT values and writes the results to a CSV file.
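
The estimator and grid-search notes above can be sketched roughly as follows. This is a minimal illustration and not the repository's code. The class name `SpectrumClassifier`, the `n_fft` parameter, the 1-nearest-neighbour backend, the synthetic sequences and the grid values are all assumptions made for the example. It maps each base to a numeric value, takes the magnitude of the DFT, and exposes the four base values as estimator params so that `GridSearchCV` can tune them under 10-fold cross-validation.

```python
import csv
import io

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier


class SpectrumClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical stand-in for ACGTClassifier (not the repository's code)."""

    def __init__(self, a=1.0, c=2.0, g=3.0, t=4.0, n_fft=64):
        self.a = a
        self.c = c
        self.g = g
        self.t = t
        self.n_fft = n_fft

    def _features(self, seqs):
        # Map each base to its numeric value; unknown symbols become 0.
        values = {"A": self.a, "C": self.c, "G": self.g, "T": self.t}
        rows = []
        for seq in seqs:
            signal = np.array([values.get(b, 0.0) for b in seq.upper()], dtype=float)
            # rfft zero-pads or truncates the signal to n_fft samples.
            rows.append(np.abs(np.fft.rfft(signal, n=self.n_fft)))
        return np.vstack(rows)

    def fit(self, X, y):
        # Features are computed inside fit/predict, so each CV fold
        # only ever sees its own raw sequences.
        self.model_ = KNeighborsClassifier(n_neighbors=1).fit(self._features(X), y)
        self.classes_ = self.model_.classes_
        return self

    def predict(self, X):
        return self.model_.predict(self._features(X))


def noisy_repeat(motif, length, rng, rate=0.1):
    """Repeat a motif up to `length` bases, then mutate some positions."""
    seq = list((motif * length)[:length])
    for i in np.flatnonzero(rng.random(length) < rate):
        seq[i] = rng.choice(list("ACGT"))
    return "".join(seq)


# Synthetic two-class data (an assumption; the real input is cleaned FASTA).
rng = np.random.default_rng(0)
seqs = [noisy_repeat("AC", 64, rng) for _ in range(20)]
seqs += [noisy_repeat("ACGT", 64, rng) for _ in range(20)]
labels = [0] * 20 + [1] * 20

search = GridSearchCV(
    SpectrumClassifier(),
    param_grid={"a": [1.0, 2.0], "g": [-1.0, 3.0]},
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
)
search.fit(seqs, labels)

# Write one CSV row per grid point (to an in-memory buffer here).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["a", "g", "mean_accuracy"])
for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    writer.writerow([params["a"], params["g"], score])
print(buffer.getvalue())
```

Because the base values are ordinary constructor arguments, `get_params`/`set_params` and cloning come for free from `BaseEstimator`, which is what lets the grid search vary them without any extra code.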