Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/stefanches7/ensembl-ftp-search

Various end-user tools for searching the FTP sites of the Ensembl project.
https://github.com/stefanches7/ensembl-ftp-search

Last synced: about 1 month ago
JSON representation

Various end-user tools for searching the FTP sites of the Ensembl project.

Awesome Lists containing this project

README

        

# Ensembl/Ensembl Genomes FTP search API ![alt link](https://travis-ci.org/stefanches7/ensembl-ftp-search.svg?branch=master)
HTTP and JS interfaces for searching the FTP sites of Ensembl/Ensembl Genomes

Summary
-------
This API is a comfortable way to get the links to the files needed from [Ensembl FTP site] and
[EnsemblGenomes FTP site]. I.e., having entered e.g. organism release name/taxonomy branch
and data type you are interested in (say, "Sophophora" and "vep") you'll get all the file links to the files matching
your query (in this case all the links leading to "vep" files for all species that are Sophophora).

[Ensembl FTP site]: ftp.ensembl.org
[EnsemblGenomes FTP site]: ftp.ensemblgenomes.org

Credits
-------

The development of this API was started during [Google Summer of Code] program. [Ensembl]/[EG] development team, being a part of European Bioinformatics Institute, accepted the student, Stefan Dvoretskii (author of this repository), to start and develop this project under the mentorship of [Dan Staines] for the best of global Bioinformatics community. The repository author pays many thanks and appreciation to his mentor and Ensembl/EBI community for the great work together and Google Summer of Code community and orgs for making this collaboration technically possible.

[Dan Staines]: https://www.ebi.ac.uk/about/people/dan-staines
[Google Summer of Code]: https://summerofcode.withgoogle.com/
[Ensembl]: https://www.ebi.ac.uk/services/teams/ensembl
[EG]: https://www.ebi.ac.uk/services/teams/ensembl-genomes

Usage
-----
### HTTP interface

API can be accessed through HTTP requests to the various endpoints:

* `/hello` - "Hello World!" healthcheck
* `/search` - the main end-point for searching.
* `/organismNameSuggestion`/`/fileTypeSuggestion` - pattern search for distinct organism_names/file_types in the local
database (links to organism names and file types database).

#### Searching

`/search` endpoint provides the interface to search the database applying all the filters specified. As of now, it will
_intersect_ all the filters you have specified and return the _Java-like_ list of links that match your whole query (i.e.,
list will look like \[\, \...\]).

The available filters are:

* `organismName` - organism _release_ name (lowercase, with underscores, e.g. "drosophila_melanogaster")
* `fileType` - dataType, e.g. "vep" or "fasta_cdna" (fasta subtype after underscore)
* `taxaBranch` - taxonomy _id_ (_number_) as of NCBI Taxonomy database (e.g. "Drosophila melanogaster" has id 7227)

Please note that `page` and `size` parameters are available for **paging** (they should also be _numbers_)

_Examples_:

`:/search?taxaBranch=7227&file_type=vep&page=1&size=20` - vep files for Drosophila melanogaster, second page
of 20 links (links 20-39 from the result)

`curl -d organism_name='drosophila_melanogaster' -d fileType='vep'` - same, but the whole result set this time

### Javascript interface

More user-friendly JS interface is available, see `web\src\js\search-webui`.

### Update job

Data indices, which are used by HTTP interface to provide search and suggestion functionality, are updated through Perl update job. Source code is to find in the `updatejob-perl` directory.

## Setup & configuration
### Requirements

Following software is required to run the HTTP server:
* **JDK1.8**
* **MySQL**5.5+
* *Gradle* \*

\* Gradle can be automatically installed via Gradle Wrapper, that is:

Linux:
```bash
ensembl-ftp-search~$ ./gradlew
```

Windows:
```cmd
ensembl-ftp-search> exec gradlew.bat

```

Javascript user interface was built using *React* libraries and is therefore dependent on **Node** and `npm`.

Update job requires *Perl 5.24+* to run.

### Configuration

Spring Boot application is configured using `main/resources/application.properties` file in the Java sources root **or** passing the command line arguments, e. g. `java -jar build/libs/ensembl-ftp-search-XXXXX.jar --server.port 9988`. See [Spring documentation] for a comprehensive list of Spring Boot options available.
Please _pay special attention_ to the `spring.datasource` group values, as they should point to **running and accessible
MySQL database**. Without it, the application won't be able to start.

[Spring documentation]: https://docs.spring.io/spring-boot/docs/current/reference/html/common-application-properties.html

### Startup

HTTP server is started with `./gradlew bootRun` (or `gradle bootRun` if you already have Gradle) command.

### Logging

HTTP server logs into the command line from which it was started.