https://github.com/mouzkolit/mirfetch
miRNA WebScraper that allows for TargetSpectrum Search
multithreading python selenium
- Host: GitHub
- URL: https://github.com/mouzkolit/mirfetch
- Owner: mouzkolit
- Created: 2022-12-07T09:33:19.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-03-16T10:18:00.000Z (over 3 years ago)
- Last Synced: 2025-01-17T23:19:48.378Z (over 1 year ago)
- Topics: multithreading, python, selenium
- Language: Jupyter Notebook
- Homepage:
- Size: 3.26 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
miRFetch Package to provide easy access to the DIANA microT and microCDS Webserver
This package lets you submit RNA sequences (for example, tRNA fragments) and determine their target spaces using Selenium web scraping and
automation, exposed through an easily accessible API.
Data are fetched from https://mrmicrot.imsi.athenarc.gr/?r=mrmicrot/index and from
https://dianalab.e-ce.uth.gr/html/dianauniverse/index.php?r=microT_CDS
miRT Fetching Segment:
To start the analysis, first build a dictionary of sequences in which each key is the name of an RNA and each value is a list of the individual RNA
sequences, as shown below. In the future we will also support pd.DataFrame input as well as output from MintMap, a pipeline for annotating
tRNA fragments.
```
from rnaFetch.mirTFetch import mirTFetch
from rnaFetch.mirCDSFetch import microTCDS
RNA = {"GlyCCC": ["GCATTGGTGGTTCAGTGGTAGAATTCTC",
"GCATTGGTGGTTCAGTGGTAGAATTCTCGCC",
"GCATTGGTGGTTCAGTGGTAGAATTCT"],
"LysTTT": ["GGGAGCGCCCGGATAGCTCAGTCGGTAGAGCATCAGACTTTT",
"TCGGGCGGGAGTGGTGGCTTTT",
"TCGGGCGGGAGTGGTGGCTTT"],
"ThrAGT": ["TCGAATCCCAGCGGTGCCTCCA",
"ATCCCAGCGGTGCCTCCA",
"ATCCCAGCGGTGCCTCCG"]
}
```
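Because the dictionary is later submitted to a web server, it can help to check its shape before starting a run. The following is a minimal sketch and not part of miRFetch: `check_sequence_dict` and `VALID_BASES` are hypothetical names, and the assumption that only the letters A, C, G, T and U are acceptable is ours, not something the package documents.

```python
# Hypothetical pre-flight check; not part of the miRFetch API.
# Assumption: sequences should contain only A, C, G, T or U.
VALID_BASES = set("ACGTU")

def check_sequence_dict(sequences):
    """Return a list of problems found in a {name: [sequence, ...]} mapping."""
    problems = []
    for name, seqs in sequences.items():
        if not isinstance(seqs, list) or not seqs:
            problems.append(f"{name}: expected a non-empty list of sequences")
            continue
        for seq in seqs:
            if not isinstance(seq, str) or not seq:
                problems.append(f"{name}: empty or non-string entry")
            elif set(seq.upper()) - VALID_BASES:
                problems.append(f"{name}: unexpected characters in {seq!r}")
    return problems

example = {
    "GlyCCC": ["GCATTGGTGGTTCAGTGGTAGAATTCTC"],
    "Broken": ["GCATNGG"],  # contains an N, which the check flags
}
problems = check_sequence_dict(example)
print(problems)
```

An empty list means the dictionary has the expected name-to-list layout and can be passed on unchanged.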
Then you can initialize the fetcher with "Chrome", "Firefox" or "Edge" as the browser:
```
# Change to Firefox or Edge if you prefer
# The Selenium driver starts in headless mode; set headless = None to get a browser window
fetcht = mirTFetch("Chrome")
```
We can then set the score threshold for a prediction to count as a target; the filter can also be applied manually with pandas once the table has been generated.
Next we run the pipeline and let the miRNA web server determine the target spaces. This step also includes the BioMart
mapping, which uses multithreading to convert Ensembl transcript IDs to Ensembl gene IDs and external gene names.
These identifiers are better suited to downstream analyses such as g:Profiler or DIANA microT-CDS.
```
fetcht.threshold = 0.95
# Returns the prediction table and also stores it in self.prediction_data
# The UTR sequence table is additionally available in self.utr_table
# RNA is the sequence dictionary defined above
final_table = fetcht.run_miRNA_analysis(RNA)
```
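To illustrate the kind of multithreaded ID mapping described above, here is a minimal sketch using the standard library's `ThreadPoolExecutor`. It is not the package's implementation: `FAKE_BIOMART`, `lookup_transcript` and `map_transcripts` are hypothetical names, the identifiers are placeholders, and an in-memory dictionary stands in for the real BioMart query so that the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a BioMart query; a real lookup would be a network call.
# The IDs below are placeholders, not real Ensembl identifiers.
FAKE_BIOMART = {
    "ENST_EXAMPLE_1": ("ENSG_EXAMPLE_A", "GENE_A"),
    "ENST_EXAMPLE_2": ("ENSG_EXAMPLE_B", "GENE_B"),
}

def lookup_transcript(transcript_id):
    """Map one transcript ID to its gene ID and gene name (None if unknown)."""
    gene_id, gene_name = FAKE_BIOMART.get(transcript_id, (None, None))
    return {
        "transcript_id": transcript_id,
        "gene_id": gene_id,
        "gene_name": gene_name,
    }

def map_transcripts(transcript_ids, max_workers=4):
    """Run the lookups in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lookup_transcript, transcript_ids))

rows = map_transcripts(["ENST_EXAMPLE_1", "ENST_EXAMPLE_2", "ENST_UNKNOWN"])
for row in rows:
    print(row)
```

Threads suit this kind of task because each lookup mostly waits on I/O, and `pool.map` returns results in the order of the inputs, so the rows can be joined back onto the prediction table directly.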
Get RNA miRNA overlap
We also provide the overlapping target spaces between miRNAs and the queried sequences via the mirCDSFetch module.
Pass the final_table generated after BioMart annotation into the following code snippet.
The output is a table of miRNA:sequence prediction partners, summarized in the grouped table. Specific visualizations of the target
space overlap between the queried RNAs and miRNAs are also available as Sankey plots.
```
# This will connect to the microTCDS webpage via a Selenium Driver
# Genes are supplied in chunks of 500 per run
# Threshold can be set to a float between 0-1 and will be automatically set
fetchcds = microTCDS(final_table)
new_table = fetchcds.run_miRNA_analysis(threshold = 0.95)
overlap, grouped = fetchcds.get_mt_cds_overlap(final_table, new_table)
```
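The chunked submission mentioned in the comments above can be pictured with a short sketch. This is an illustration only, not the package's code: `chunked` is a hypothetical helper, the gene names are synthetic, and the default of 500 simply mirrors the chunk size stated in the snippet.

```python
def chunked(items, size=500):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Synthetic gene names, used only to show the chunk boundaries.
genes = [f"GENE_{i}" for i in range(1200)]
chunks = list(chunked(genes))
chunk_sizes = [len(chunk) for chunk in chunks]
print(chunk_sizes)
```

Each chunk would then correspond to one submission to the web server, with the per-chunk results concatenated afterwards.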
A second tutorial, showing how to start directly from a list of detected miRNAs, is available in the Tutorial folder.