https://github.com/david-r-cox/gene2vec
Vector space representation of genetic data
- Host: GitHub
- URL: https://github.com/david-r-cox/gene2vec
- Owner: david-r-cox
- Created: 2016-04-04T15:54:10.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2016-11-18T03:44:48.000Z (about 8 years ago)
- Last Synced: 2023-10-20T00:39:08.316Z (about 1 year ago)
- Language: Jupyter Notebook
- Size: 62.6 MB
- Stars: 36
- Watchers: 7
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Gene2vec: Neural word embeddings of genetic data
## Overview
Gene2vec is an adaptation of the Word2vec model that aims to construct quasi-syntactic and semantic relationships from amino acid sequence data. Word2vec is an extension of the continuous Skip-gram model that allows for precise representation of syntactic and semantic word relationships. Word2vec representations also exhibit additive compositionality, so vector arithmetic can be performed on words. Mikolov et al. illustrate this behavior by noting that the vector representation of ("Madrid" - "Spain" + "France") is closer to that of "Paris" than to that of any other word. We demonstrate the successful construction of such relationships from amino acid sequences by using them to perform rudimentary protein classification.
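The two ideas above, Skip-gram training on sequence "words" and analogy arithmetic on the learned vectors, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Gene2vec's implementation. Splitting sequences into overlapping 3-mers is an assumption about tokenization, and the helper names `kmer_tokens`, `skipgram_pairs`, `cosine`, and `analogy`, along with the toy vectors, are hypothetical. The embedding training step itself is omitted, so the analogy query runs on hand-picked vectors.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def kmer_tokens(sequence: str, k: int = 3) -> List[str]:
    """Split an amino acid sequence into overlapping k-mers ("words").

    Assumption: overlapping 3-mers are used here for illustration only;
    Gene2vec's actual tokenization may differ.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def skipgram_pairs(tokens: Sequence[str], window: int = 2) -> List[Tuple[str, str]]:
    """Generate (center, context) training pairs as in the Skip-gram model."""
    pairs = []
    for i, center in enumerate(tokens):
        # Context positions lie within `window` tokens on either side of the center.
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(vectors: Dict[str, Vector], a: str, b: str, c: str) -> str:
    """Return the word whose vector is closest (by cosine) to vec(a) - vec(b) + vec(c)."""
    query = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three input words, as is conventional for analogy queries.
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], query))


# Hypothetical, hand-picked toy vectors (not trained embeddings).
toy = {
    "Spain": [1.0, 0.0, 0.0],
    "Madrid": [1.0, 0.0, 1.0],
    "France": [0.0, 1.0, 0.0],
    "Paris": [0.0, 1.0, 1.0],
    "Lisbon": [0.5, 0.0, 1.0],
}

if __name__ == "__main__":
    print(kmer_tokens("MKTAYIAK"))  # → ['MKT', 'KTA', 'TAY', 'AYI', 'YIA', 'IAK']
    print(skipgram_pairs(kmer_tokens("MKTAY"), window=1))
    print(analogy(toy, "Madrid", "Spain", "France"))  # → Paris
```

In a real pipeline the `(center, context)` pairs would feed a Skip-gram trainer, and `analogy` would run on the learned k-mer vectors rather than on toy city names.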
See the [report](https://nbviewer.ipython.org/github/davidcox143/Gene2vec/blob/master/report/Gene2vec.ipynb) for more info.