https://github.com/blahah/solvers-gene-motifs
Solvers.io project to predict gene expression from motif combinations in promoters
- Host: GitHub
- URL: https://github.com/blahah/solvers-gene-motifs
- Owner: blahah
- License: mit
- Created: 2014-02-10T15:57:02.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2014-03-19T11:13:23.000Z (about 11 years ago)
- Last Synced: 2024-12-27T19:13:21.433Z (5 months ago)
- Size: 598 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
solvers-gene-motifs
===================

Solvers.io project to predict gene expression from motif combinations in promoters
## Example data
`example_data/example_500rows.csv` is an example of the kind of data we can generate.
- There is an initial column, `AGI`, which contains the gene identifier. This is for information only - it can be discarded for the analysis.
- A final label column, `Value`, is dummy encoded*: either 1 or -1. 1 means the gene was expressed in a particular cell type, while -1 means it was not expressed.
- All the remaining columns are features (transcription factor binding motifs) that exist in the promoter of one or more genes. These are binary: 1 indicates the feature was present, 0 indicates it was absent.

\* *note: we can also provide scalar values rather than dummy encoding.*
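The column layout described above can be read with a few lines of Python. This is only a sketch using the standard library: the function name `load_motif_table`, the motif column names, and the gene identifiers in `SAMPLE` are made up for illustration and are not taken from the real files. It also assumes every feature and label cell parses as an integer.

```python
import csv
import io


def load_motif_table(handle):
    """Split a motif CSV into gene IDs, feature names, a 0/1 matrix and -1/1 labels."""
    reader = csv.reader(handle)
    header = next(reader)
    # Layout described above: `AGI` first, `Value` last, motif features in between.
    if header[0] != "AGI" or header[-1] != "Value":
        raise ValueError("expected 'AGI' as first column and 'Value' as last")
    feature_names = header[1:-1]
    gene_ids, X, y = [], [], []
    for row in reader:
        if not row:  # skip blank lines
            continue
        gene_ids.append(row[0])  # kept for information only
        X.append([int(v) for v in row[1:-1]])  # 1 = motif present, 0 = absent
        y.append(int(row[-1]))  # 1 = expressed, -1 = not expressed
    return gene_ids, feature_names, X, y


# Hypothetical rows in the same shape as the example data (not real values).
SAMPLE = """AGI,motif_a,motif_b,motif_c,Value
AT1G00001,1,0,1,1
AT1G00002,0,0,1,-1
AT1G00003,1,1,0,1
"""

gene_ids, feature_names, X, y = load_motif_table(io.StringIO(SAMPLE))
print(feature_names)
print(X[0], y[0])
```

To use it on the example file, pass an open file handle instead of the `io.StringIO` object, for instance `open("example_data/example_500rows.csv", newline="")`.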
## Full datasets
One dataset is provided for now. We can generate many such datasets if needed.
1. https://www.dropbox.com/s/ghb7w0nh0xqg0qh/A_thaliana_motifs_guard_cell_expression.csv
## The challenge
We want to be able to:
1. Predict whether a gene will be expressed in a particular condition given its promoter sequence
2. Find out exactly which combinations of motifs are important in the predictions

or to rephrase without the biology:
1. Predict the `Value` given the feature columns.
2. Identify which features are important in the predictions.
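
As a starting point, both tasks can be sketched with a very simple baseline. The code below is an assumption for illustration, not the project's method: it fits a Bernoulli naive Bayes model in plain Python and ranks features by how strongly their presence differs between the two classes. All function names and the toy data are hypothetical. Naive Bayes treats each motif independently, so it does not capture combinations of motifs; it only gives a reference point to compare better models against.

```python
import math


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Fit Bernoulli naive Bayes on 0/1 features and -1/1 labels."""
    n_features = len(X[0])
    model = {}
    for label in (1, -1):
        rows = [x for x, t in zip(X, y) if t == label]
        # Laplace-smoothed probability that each feature is present in this class.
        p = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
             for j in range(n_features)]
        prior = (len(rows) + alpha) / (len(X) + 2 * alpha)
        model[label] = (prior, p)
    return model


def log_score(model, x, label):
    """Unnormalised log-probability of `label` given the feature vector `x`."""
    prior, p = model[label]
    return math.log(prior) + sum(
        math.log(pj if xj else 1.0 - pj) for pj, xj in zip(p, x))


def predict(model, x):
    """Return 1 (expressed) or -1 (not expressed) for one feature vector."""
    return 1 if log_score(model, x, 1) >= log_score(model, x, -1) else -1


def rank_features(model, feature_names):
    """Order features by the absolute log-odds ratio between the two classes."""
    def logit(p):
        return math.log(p / (1.0 - p))

    _, p_pos = model[1]
    _, p_neg = model[-1]
    scores = [(abs(logit(a) - logit(b)), name)
              for a, b, name in zip(p_pos, p_neg, feature_names)]
    return [name for _, name in sorted(scores, reverse=True)]


# Toy data: motif_a tracks the label exactly, motif_b carries no signal.
names = ["motif_a", "motif_b"]
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = [1, 1, -1, -1]

model = fit_bernoulli_nb(X, y)
ranking = rank_features(model, names)
print(predict(model, [1, 0]), predict(model, [0, 1]))
print(ranking)
```

On real data the model should be evaluated on genes held out from fitting, since accuracy on the training rows overstates how well it predicts.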