https://github.com/averissimo/gene-extractor
Searches independent search terms against KEGG and GenBank (NCBI)
- Host: GitHub
- URL: https://github.com/averissimo/gene-extractor
- Owner: averissimo
- License: gpl-3.0
- Created: 2014-09-19T09:44:39.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2015-06-25T15:13:14.000Z (almost 11 years ago)
- Last Synced: 2025-02-08T18:11:58.287Z (over 1 year ago)
- Language: Ruby
- Size: 285 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
GeneExtractor
==============
Searches independent terms against different databases and retrieves gene sequences from:
- KEGG - [link](http://www.genome.jp/kegg/genes.html)
- NCBI Nucleotide - [link](http://www.ncbi.nlm.nih.gov/nuccore/)
## Requirements
- [Ruby runtime environment](https://www.ruby-lang.org/en/installation/)
    - [Windows](http://rubyinstaller.org/)
    - [Mac OSX and Linux](http://rvm.io/)
- Bundler gem - `gem install bundler`
- [Bioruby gem](http://www.bioruby.org)
## How to Use
1. Run `bundle install --path vendor/bundle` to install dependencies *(currently only Bioruby)*
1. Create a `keys.txt` file *(either by copying keys.txt.example or creating a blank one)*
    - Add query terms to keys.txt *(one per line)*
1. Create a `config.yml` file *(either by copying the example config file or creating a blank one)*
    - Open the file and change the options as needed
1. Run `bundle exec ruby script.rb` to search and download all the associated genes
    - If you did not install the gems locally (in `vendor/bundle`), just run `ruby script.rb`
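
As an illustration of the `keys.txt` format, the sketch below reads one query term per line. The helper name `read_query_terms` is hypothetical and not taken from `script.rb`; skipping blank lines and dropping duplicates are assumptions about sensible behaviour, not documented features.

```ruby
# Hypothetical helper (not part of script.rb): read query terms from a
# keys file that holds one term per line.
# Assumptions: surrounding whitespace is ignored, blank lines are skipped
# and duplicate terms are only searched once.
def read_query_terms(path)
  File.readlines(path, chomp: true)
      .map(&:strip)
      .reject(&:empty?)
      .uniq
end

# Only runs when a keys.txt exists in the current directory.
if File.exist?('keys.txt')
  terms = read_query_terms('keys.txt')
  puts "#{terms.size} query term(s) loaded"
end
```

Each returned term would then be searched independently against KEGG and NCBI.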
### Config.yml options
YAML syntax is used to configure GeneExtractor. It is a hierarchical format that uses indentation to define child attributes or lists.
- *email*: a valid email address for the user, required to use the NCBI REST API
- *output*:
    - *dir*: parent folder in which GeneExtractor places its results
    - *date_prefix*: adds an extra folder level named with the date and time at which GeneExtractor was executed
    - *kegg*: folder name for KEGG results
    - *ncbi*: folder name for NCBI results
- *search*:
    - *ncbi*: list of fields that should be searched in NCBI (one list entry per field)
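
To make the *search* option more concrete, here is a minimal sketch of how a query term and the configured fields could be combined into an Entrez-style query string using `"term"[Field]` tags. The helper `build_ncbi_query` is hypothetical, and joining the fields with `OR` is an assumption; the actual script may build its queries differently.

```ruby
# Hypothetical helper (not part of script.rb): combine one query term with
# the configured NCBI search fields into a single Entrez-style query string.
# Assumption: the fields are OR-ed together.
def build_ncbi_query(term, fields)
  # Drop embedded double quotes so the quoted phrase stays well-formed.
  quoted = %("#{term.delete('"')}")
  return quoted if fields.nil? || fields.empty?

  fields.map { |field| "#{quoted}[#{field}]" }.join(' OR ')
end

puts build_ncbi_query('chalcone synthase', ['Protein name', 'Gene name', 'Title'])
# prints: "chalcone synthase"[Protein name] OR "chalcone synthase"[Gene name] OR "chalcone synthase"[Title]
```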
#### example config.yml

    email: gene.extractor@mailinator.com
    output:
      dir: queries
      date_prefix: true
      kegg: kegg
      ncbi: ncbi
    search:
      ncbi:
        - Protein name
        - Gene name
        - Title
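
The sketch below shows one way the *output* options could translate into result folders. It is illustrative only: the helper `output_dirs`, the timestamp format and the exact folder layout are assumptions, not taken from `script.rb`.

```ruby
require 'yaml'

# Hypothetical helper (not part of script.rb): derive the KEGG and NCBI
# result folders from a parsed config hash.
# Assumption: when date_prefix is true, a timestamp folder is inserted
# between the parent folder and the per-database folders.
def output_dirs(config, now = Time.now)
  out  = config.fetch('output')
  base = out.fetch('dir')
  base = File.join(base, now.strftime('%Y-%m-%d_%H-%M-%S')) if out['date_prefix']
  {
    kegg: File.join(base, out.fetch('kegg')),
    ncbi: File.join(base, out.fetch('ncbi'))
  }
end

# Inline copy of the output section from the example configuration.
config = YAML.safe_load(<<~CONFIG)
  email: gene.extractor@mailinator.com
  output:
    dir: queries
    date_prefix: true
    kegg: kegg
    ncbi: ncbi
CONFIG

dirs = output_dirs(config, Time.new(2015, 6, 25, 15, 13, 14))
puts dirs[:kegg]
puts dirs[:ncbi]
```

With the fixed time used here, the two paths are `queries/2015-06-25_15-13-14/kegg` and `queries/2015-06-25_15-13-14/ncbi`. With `date_prefix: false`, the timestamp level is omitted.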
## Acknowledgements
This tool was created as part of [FCT](www.fct.p) grant SFRH/BD/97415/2013 and the European Commission research project [BacHBerry](www.bachberry.eu) (FP7-613793).
[Developer](http://web.tecnico.ulisboa.pt/andre.verissimo/)