https://github.com/averissimo/gene-extractor

Searches independent search terms against KEGG and GenBank (NCBI)

Host: GitHub
URL: https://github.com/averissimo/gene-extractor
Owner: averissimo
License: gpl-3.0
Created: 2014-09-19T09:44:39.000Z (almost 12 years ago)
Default Branch: master
Last Pushed: 2015-06-25T15:13:14.000Z (about 11 years ago)
Last Synced: 2025-04-03T04:15:13.652Z (over 1 year ago)
Language: Ruby
Size: 285 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


GeneExtractor
==============

Searches independent terms against different databases and retrieves gene sequences from:

- KEGG - [link](http://www.genome.jp/kegg/genes.html)

- NCBI Nucleotide - [link](http://www.ncbi.nlm.nih.gov/nuccore/)

## Requirements

- [Ruby runtime environment](https://www.ruby-lang.org/en/installation/)

 - [Windows](http://rubyinstaller.org/)

 - [Mac OSX and Linux](http://rvm.io/)

- Bundler gem - `gem install bundler`

- [BioRuby gem](http://www.bioruby.org)

## How to Use

1. Run `bundle install --path vendor/bundle` to install dependencies *(currently only Bioruby)*

1. Create a `keys.txt` file *(either by copying keys.txt.example or creating a blank one)*

 - Add query terms to keys.txt *(separated by new lines)*

1. Create a `config.yml` file *(either by copying config.yml.example or creating a blank one)*

 - Open the file and change options (if need be)

1. Run `bundle exec ruby script.rb` to search and download all the associated genes

 - If you don't install gems locally then just run `ruby script.rb`
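
Step 2 stores one query term per line in `keys.txt`. A minimal sketch of how such a file could be read in Ruby is shown here. It assumes that blank lines and surrounding whitespace are ignored, and `parse_terms` is a hypothetical helper for illustration, not a function taken from `script.rb`:

```ruby
# Hypothetical helper (not from script.rb): turn the text of keys.txt
# into a list of query terms, one per non-blank line.
def parse_terms(text)
  text.each_line.map(&:strip).reject(&:empty?)
end

# In practice the text would come from File.read("keys.txt");
# an inline sample keeps the sketch self-contained.
sample = "alcohol dehydrogenase\n\n  chalcone synthase \n"
parse_terms(sample).each { |term| puts term }
```

Each term is kept whole, so a multi-word entry such as `alcohol dehydrogenase` stays a single query.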

### Config.yml options

GeneExtractor is configured with a YAML file. The file is hierarchical and uses indentation to define child attributes or lists.

- *email*: a valid email address for the user, required to use the NCBI REST API

- *output*:

 - *dir*: parent folder to place results from GeneExtractor

 - *date_prefix*: adds an additional folder level named with the date and time when GeneExtractor was executed

 - *kegg*: folder name for kegg results

 - *ncbi*: folder name for ncbi results

- *search*:

 - *ncbi*: list of NCBI fields that should be searched (one entry per field)
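
To illustrate the *search* option: NCBI Entrez queries restrict a term to a field with the `term[Field]` syntax. The sketch below builds one query string per configured field. Whether GeneExtractor quotes the term, or issues one query per field rather than a combined one, is an assumption, and `ncbi_queries` is a hypothetical name:

```ruby
# Hypothetical helper: build one Entrez-style query per configured field,
# using the term[Field] syntax. Quoting the term is an assumption.
def ncbi_queries(term, fields)
  fields.map { |field| %("#{term}"[#{field}]) }
end

fields = ["Protein name", "Gene name", "Title"]
ncbi_queries("chalcone synthase", fields).each { |query| puts query }
```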

#### Example config.yml

    email: gene.extractor@mailinator.com
    output:
      dir: queries
      date_prefix: true
      kegg: kegg
      ncbi: ncbi
    search:
      ncbi:
        - Protein name
        - Gene name
        - Title
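
As a rough illustration of how the *output* options might combine, the following sketch parses the same settings with Ruby's standard `yaml` library and derives the result folders. The timestamp format and the `output_dirs` helper are assumptions made for this example; the actual folder naming used by `script.rb` may differ:

```ruby
require "yaml"

# Same output settings as the example config, inlined so the sketch runs on
# its own. In practice this text would come from File.read("config.yml").
CONFIG_TEXT = <<~YAML
  email: gene.extractor@mailinator.com
  output:
    dir: queries
    date_prefix: true
    kegg: kegg
    ncbi: ncbi
YAML

# Hypothetical helper: compute the KEGG and NCBI result folders.
# The timestamp format below is an assumption, not the tool's documented format.
def output_dirs(config, now = Time.now)
  out = config.fetch("output")
  parts = [out.fetch("dir")]
  parts << now.strftime("%Y-%m-%d_%H-%M-%S") if out["date_prefix"]
  base = File.join(*parts)
  { kegg: File.join(base, out.fetch("kegg")),
    ncbi: File.join(base, out.fetch("ncbi")) }
end

config = YAML.safe_load(CONFIG_TEXT)
dirs = output_dirs(config, Time.new(2015, 6, 25, 15, 13, 14))
puts dirs[:kegg]
puts dirs[:ncbi]
```

With `date_prefix: false`, the timestamp level is skipped and results go directly under `dir`.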

## Acknowledgements

This tool was created as part of [FCT](www.fct.p) grant SFRH/BD/97415/2013 and the European Commission research project [BacHBerry](www.bachberry.eu) (FP7-613793).

[Developer](http://web.tecnico.ulisboa.pt/andre.verissimo/)
