Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jorainer/mirtarbase-db
R package providing the validate miRNA-target gene interactions (MTIs) from the miRTarBase database.
https://github.com/jorainer/mirtarbase-db
Last synced: 25 days ago
JSON representation
R package providing the validate miRNA-target gene interactions (MTIs) from the miRTarBase database.
- Host: GitHub
- URL: https://github.com/jorainer/mirtarbase-db
- Owner: jorainer
- License: mit
- Created: 2014-08-12T12:20:34.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2014-08-12T13:15:35.000Z (about 10 years ago)
- Last Synced: 2024-05-15T09:41:35.130Z (6 months ago)
- Language: R
- Size: 5.43 MB
- Stars: 1
- Watchers: 2
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project
README
Table of Contents
1. miRTarBase R package
- 1.1. Installation
- 1.2. Create the database file
1.3. Development
- 1.4. Changelog:
1.5. TODOs
- 1.5.1. DONE Create the sqlite database file from the tab delimited txt file.
- 1.5.2. DONE Create a class representing an interaction (?).
- 1.5.3. DONE Create some helper functions to retrieve information from the DB.
- 1.5.4. DONE Create a txt file with all informations (version, date etc) from the miRTarBase and read it when calling mirtarbase()
- 1.5.5. DONE Implement the function to get MTIs for a specified gene.
- 1.5.6. DONE Create a vignette for this package.
- 1.5.7. CANCELED Include also the phenomiR database?
- 1.5.8. DONE implement a central
getMti
function.- 1.5.9. DONE Implement functions to create MTI objects from a data.frame (and vice versa).
[4/4]
- 1.5.10. DONE Implement all methods for the
Report
class[4/4]
.- 1.5.11. DONE Implement all methods for the
MTI
class[12/12]
.- 1.5.12. DONE Implement a function that retrieves additional miRNA annotations for a MTI.
- 1.5.13. DONE Implent additional functions
[2/2]
---
This packages provides the miRNA-target gene interactions (MTI) from the miRTarBase.
Clone the git repository and build the package using:
R CMD build mirtarbase-db
To install the package call:
R CMD INSTALL mirtarbase.db_*
After that you can use the package as any other R package. The package provides also a vignette with some examples that can be opened, after loading the library in R (`library(mirtarbase.db)`), by the `openVignette()` command.
Using the code in this section we create the `SQLite` database for the package. The workflow is the following: first we have to download the XLS spreadsheet from the miRTarBase web site, which we have to open in Excel and export as tabulator delimited text file. We will then read this file into R in order format it for our requirements (i.e. rename the column names) and we store this file than into a `SQLite` database.
Assuming that the Excel file has been downloaded from and has been exported as a tabulator delimited txt file to `orig_files/txt`. Next we read this file, rename the columns and export it again as a tab delimited text file (over-writing the original one).
It is a little more complicated to process this file due to the following problems:
- Experiments column contains 5' RACE and 5" RACE and these will be recognized by `R` as quotes. Thus we have to read the file without specifying quotes.
- Experiments column contains *Weak* but also *WeaK*, will have to replace these.
- Some Entrezgene IDs are NA.All in all, we hope that we don't encounter some more issues in the next release.
library( RSQLite )
## the XLS file contains ' and " inside cells, thus have to set qupte=""
MTI <- read.table( "orig_files/txt/miRTarBase_MTI.txt", sep="\t", as.is=TRUE, header=TRUE, check.names=FALSE, quote="" )
## re-formating column names.
CN <- tolower( colnames( MTI ) )
CN <- gsub( CN, pattern="\"", replacement="" )
CN <- gsub( CN, pattern="[(|)]", replacement="" )
CN <- gsub( CN, pattern=" ", replacement="_", fixed=TRUE )
colnames( MTI ) <- CN
## now I want to get rid of ALL \", ", ' in the table!
character.CN <- CN[ !CN %in% c( "target_gene_entrez_gene_id", "references_pmid" ) ]
## loop through all these columns and replace "\""
for( current.CN in character.CN ){
MTI[ , current.CN ] <- gsub( MTI[ , current.CN ], pattern="\"", replacement="" )
}
## now, replace all remaining " and ' in experiments
MTI[ , "experiments" ] <- gsub( MTI[ , "experiments" ], pattern="'", replacement="", fixed=TRUE )
## fixing the WeaK thing.
MTI[ , "support_type" ] <- gsub( MTI[ , "support_type" ], pattern="WeaK", replacement="Weak", fixed=TRUE )
## fix NA entrezids
which.NAs <- which( is.na( MTI$target_gene_entrez_gene_id ) )
## for all of these, check if there is another gene, same species.
for( idx in which.NAs ){
tmp <- MTI[ MTI$target_gene==MTI[ idx, "target_gene" ] &
MTI$species_target_gene==MTI[ idx, "species_target_gene" ], , drop=FALSE ]
if( length( tmp[ !is.na( tmp$target_gene_entrez_gene_id ), "target_gene_entrez_gene_id" ] ) > 0 ){
MTI[ idx, "target_gene_entrez_gene_id" ] <-
unique( tmp[ !is.na( tmp$target_gene_entrez_gene_id ), "target_gene_entrez_gene_id" ] )
}
}
## exporting the file again.
## write.table( MTI, file="orig_files/txt/miRTarBase_MTI.txt", sep="\t", row.names=FALSE )We could also create the database in a cleaner way, i.e. as a *real* relational database with a table for the miRNA target gene interaction, a table with the evidence (Report) and one relating each other. The question remains whether that would improve it's performance.
Note that each publication can have more than one report.## want to know whether the "report" is unique
## (i.e. pubmed id, experiments and support type) across all miRNAs are specific for a miRNA.
Report <- data.frame( matrix( ncol=4, nrow=0 ), stringsAsFactors=FALSE )
colnames( Report ) <- c( "report_id", "experiments", "support_type", "references_pmid" )
mti2report <- matrix( ncol=2, nrow=0 )
colnames( mti2report ) <- c( "mti_id", "report_id" )
for( i in 1:nrow( MTI ) ){
current.rep <- MTI[ i, c( "experiments", "support_type", "references_pmid" ) ]
## check if it's in Report
idx <- which( rownames( Report )==paste0( current.rep[ 1, ], collapse="-" ) )
if( length( idx ) > 0 ){
## have it already
#cat( "have already the report\n" )
report.id <- Report[ idx, "report_id" ]
}else{
rownames( current.rep ) <- paste0( current.rep[ 1, ], collapse="-" )
report.id <- paste0( "MTIR", sprintf( "%04d", ( nrow( Report )+1 ) ) )
current.rep <- cbind( report_id=report.id, current.rep, stringsAsFactors=FALSE )
Report <- rbind( Report, current.rep )
}
## add it to mti2report.
mti2report <- rbind( mti2report, c( MTI[ i, 1 ], report.id ) )
}
## at last define the MTI table.
MTI.table <- unique( MTI[ , !( colnames( MTI ) %in% colnames( Report ) ) ] )## ok, now we do have all tables.
con <- dbConnect( dbDriver( "SQLite" ), dbname="inst/extdata/db/mirtarbase_rel.db" )
if( dbExistsTable( con, "mti" ) ){
dbRemoveTable( con, "mti" )
}
if( dbExistsTable( con, "report" ) ){
dbRemoveTable( con, "report" )
}
if( dbExistsTable( con, "mti2report" ) ){
dbRemoveTable( con, "mti2report" )
}
## mti
dbWriteTable( con, name="mti", MTI.table, row.names=FALSE )
dbGetQuery( con, "create index mirna_idx on mti (mirna);" )
dbGetQuery( con, "create index target_gene_idx on mti (target_gene);" )
dbGetQuery( con, "create index target_gene_entrez_idx on mti (target_gene_entrez_gene_id);" )
## report
dbWriteTable( con, name="report", Report, row.names=FALSE )
dbGetQuery( con, "create index report_id_idx on report (report_id);" )
## mti2report
dbWriteTable( con, name="mti2report", data.frame( mti2report, stringsAsFactors=FALSE ), row.names=FALSE )
dbGetQuery( con, "create index mti2report_report_idx on mti2report (report_id);" )
dbGetQuery( con, "create index mti2report_mti_idx on mti2report (mti_id);" )dbDisconnect( con )
Next we create the database for the package. This requires a working installation of `SQLite`.
con <- dbConnect( dbDriver( "SQLite" ), dbname="inst/extdata/db/mirtarbase.db" )
if( dbExistsTable( con, "mirtarbase" ) ){
dbRemoveTable( con, "mirtarbase" )
}
dbWriteTable( con, name="mirtarbase", MTI, row.names=FALSE )
dbGetQuery( con, "create index mirna_idx on mirtarbase (mirna);" )
dbGetQuery( con, "create index target_gene_idx on mirtarbase (target_gene);" )
dbGetQuery( con, "create index target_gene_entrez_idx on mirtarbase (target_gene_entrez_gene_id);" )
dbDisconnect( con )At present (
), the miRTarBase can only be exported as a XLS spread sheet. This file contains one line per MTI and publication:
- MTI MIRT000140: online: 2 references, XLS sheet: one (one reference missing in XLS sheet).
- MTI MIRT001206: online: 8 references, XLS sheet: 3 rows (several rows missing).
- MTI MIRT003413: online: 4 references, XLS sheet: 2 rows (2 rows missing):
- PMID: 18328430: experiments: Luciferase reporter assay//Microarray//Western blot; evidence: Functional MTI. Online: also Other listed as experiment.
- PMID: 19422085: experiments: Luciferase reporter assay//Microarray//qRT-PCR//Western blot; evidence: Functional MTI. Online: also Other listed as experiment.Thus, an MTI between the same miRNA and target gene (for the same species!) is stored in several rows in the XLS sheet. Each row seems however be related to one publication, and the field *experiments* seems to list all experiments performed in that publication.
One question is whether the XLS sheet should be stored as-is to the database, or whether a clean relational database should be created. The benefit from the latter approach would be to allow more specific queries, e.g. all MTIs based on a certain support type.
It would be possible to create a relational database with 3 tables, one describing the MTI, one listing the experiments performed in a publication to validate this MTI and one allowing for possible n:m relations (although it's not clear whether these really exist… yet).
The MTI class represents a miRNA target gene interaction. There should only be one MTI for a miRNA target gene combination, with multiple evidences and eventually multiple publications. The unique identifier for a MTI is the identifier used in the miRTarBase (e.g. *MIRT001206*).
class MTI
L\_ class Report
L\_ experiments: lists all experiments that have been performed.
L\_ pmid: returns the (PMID) of the report.
L\_ supportedBy: list support type(s) (evidences).
L\_ reports: returns the Report(s) of the MTI.
L\_ experiments: returns all experiments (of the Report(s)).
L\_ pmids: returns the PMIDs of the Report(s).
L\_ supportedBy- version 0.2.2 (2014-08-12);
- `getMti` will use `mclapply` to query the database and process its results if the number of mature miRNA sequences is larger than 20.
- version 0.2.1 (2014-08-01);
- Added some functions to retrieve annotations from the mirbase package.
- Added the Vignette.
- version 0.2.0 (2014-08-01):
- Fixed some series problems in the XLS sheet from the miRTarBase. Missed some MTIs in the previous database versions.
- Removed functions `getMtiForGene` and `getMtiForMiRNA` and replaced both with `getMti`.
- version 0.1.0 (2014-07-29): it's done: the first *release*. Basic functionality, not Vignette, no S4 objects (yet).
- version 0.0.1 (2014-07-28): added database.### DONE Create the sqlite database file from the tab delimited txt file.
- State "DONE" from "TODO"
### DONE Create a class representing an interaction (?).
- State "DONE" from "TODO"
What would be the benefits from this?
- Object oriented approach (which would be helpful if not the only function of the package would be to fetch data).
Did that. That's the `MTI` object.
### DONE Create some helper functions to retrieve information from the DB.
- State "DONE" from "TODO"
### DONE Create a txt file with all informations (version, date etc) from the miRTarBase and read it when calling mirtarbase()
- State "DONE" from "TODO"
### DONE Implement the function to get MTIs for a specified gene.
- State "DONE" from "TODO"
### DONE Create a vignette for this package.
- State "DONE" from "TODO"
### CANCELED Include also the phenomiR database?
- State "CANCELED" from "DONE"
miRTarBase contains also other databases and will (hopefully) be updated regularily.
- State "DONE" from "TODO"This makes only sense, if the phenomiR does provide additional information.
Will not do that, since the guys from miRTarBase claim that they provide the most data and include also other databases.### DONE implement a central `getMti` function.
- State "DONE" from "TODO"
This function should take either one or more gene or miRNA IDs as input and return a list of MTI objects.
Split the data.frame by miRTarBase IDs, make one MTI for each data.frame and nrow Report classes.### DONE Implement functions to create MTI objects from a data.frame (and *vice versa*).
[4/4]
- State "DONE" from "TODO"
These should be put into *convertfunctions.R*.
- [X] data.frame2report
- [X] report2data.frame
- [X] data.frame2mti
- [X] mti2data.frame### DONE Implement all methods for the `Report` class
[4/4]
.- State "DONE" from "TODO"
- [X] show
- [X] experiments
- [X] supportedBy
- [X] pmid### DONE Implement all methods for the `MTI` class
[12/12]
.- State "DONE" from "TODO"
- [X] show
- [X] id
- [X] reports
- [X] experiments
- [X] supportedBy
- [X] pmid
- [X] matureMirna
- [X] mirnaSpecies
- [X] targetGene
- [X] targetGeneSpecies
- [X] targetGeneEntrezid
- [X] countReports### DONE Implement a function that retrieves additional miRNA annotations for a MTI.
- State "DONE" from "TODO"
The idea is to get the pre-miRNA or miRNA family for a given mature miRNA.
This should then also be exported if MTIs are exported as a `data.frame`.### DONE Implent additional functions
[2/2]
- State "DONE" from "TODO"
- [X] getAvailableExperiments: retrieves a unique list of experiments.
- [X] getPmids: retrieves a unique list of PubMed IDs.