https://github.com/likelet/blogs_tips
- Host: GitHub
- URL: https://github.com/likelet/blogs_tips
- Owner: likelet
- Created: 2017-10-15T03:10:23.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2022-06-07T02:55:32.000Z (almost 3 years ago)
- Last Synced: 2025-01-07T20:45:54.697Z (4 months ago)
- Size: 56.6 KB
- Stars: 0
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Blogs_tips
## Table of Contents
- [Subset bamfile with chromosome names and convert into paired fastq](#subset-bamfile-with-chromosome-names-and-convert-into-paired-fastq)
- [Cluster management](#cluster-management)
- [R code for plotting a nomogram from a competing-risk survival analysis model](#r-code-for-ploting-nomograph-from-competing-risk-survival-analysis-model)
- [Setting docker download mirror site](#setting-docker-download-mirror-site)
- [Install bioconductor R package using VPS](#install-bioconductor-r-package-using-vps)
- [Install bioconductor R package using the USTC mirror](#install-bioconductor-r-package-using-mirror-at-utsc)
- [Tips for using Tianhe-2 super computer](#tips-for-using-tianhe-super-computer)
- [Subset your bam file for IGV visualization locally](#subset-your-bam-file-for-igv-visualization-locally)
- [Download TCGA dataset](#download-tcga-dataset)
- [Install hdf5r in Centos 7](#install-hdf5r-in-centos-7)

## Subset bamfile with chromosome names and convert into paired fastq
* Software required: **[sambamba](https://github.com/lomereiter/sambamba)**, and **[bam2fastx](https://github.com/infphilo/tophat)** from the TopHat binary distribution.

> For the `sambamba` filter syntax, see https://github.com/lomereiter/sambamba/wiki/%5Bsambamba-view%5D-Filter-expression-syntax#basic-conditions-for-fields
```shell
#!/bin/sh
# Example input: a STAR output BAM named <sample>Aligned.sortedByCoord.out.bam
bamin=$1
prefix=${bamin%%Aligned.sortedByCoord.out.bam}
# extract reads aligned to chr2 (ref_id==1 is the second @SQ entry in the header)
sambamba view -F "ref_id==1" -f bam "$bamin" -o "${prefix}_chr2.bam"
# sort reads by name if they are not already name-sorted
sambamba sort -n "${prefix}_chr2.bam" -o "${prefix}_chr2.sort.bam"
# convert the name-sorted BAM to paired FASTQ
bam2fastx -PANQ -o "${prefix}_chr2.fq.gz" "${prefix}_chr2.sort.bam"
```
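Counting `@SQ` lines by eye is error-prone when the reference has many contigs. A minimal sketch, assuming `ref_id` counts the `@SQ` header lines from 0 as in the chr2 example above; the function name `list_ref_ids` is hypothetical. It prints each `ref_id` next to its reference name, reading a header piped in from `samtools view -H`:

```shell
# list_ref_ids (hypothetical helper): read a SAM/BAM header on stdin and
# print "<ref_id><TAB><name>" for every @SQ line, counting from 0
list_ref_ids() {
  awk -F'\t' '
    BEGIN { n = 0 }
    $1 == "@SQ" {
      for (i = 2; i <= NF; i++) {
        # the SN: tag holds the reference name
        if ($i ~ /^SN:/) { print n "\t" substr($i, 4); n++ }
      }
    }'
}

# usage: samtools view -H your.bam | list_ref_ids
```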
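**PS**: the number given to `ref_id` is the position of the reference in the `@SQ` list of the BAM header, counting from 0 (so `ref_id==1` is chr2 only when chr2 is the second `@SQ` line). The order can be checked with `samtools view -H your.bam` if samtools is installed.

## Cluster management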
* 1. Shut down the system

Shut down the compute nodes:
```shell
#!/bin/sh
# halt compute nodes cu01..cu03, printing each hostname before it goes down
for i in $(seq 1 3)
do
ssh cu0$i "hostname; init 0"
done
```
Unmount the shared storage:
```shell
umount /home
```
Shut down the login node:
```shell
poweroff
```
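The three steps above can be wrapped into one script so they always run in the same order. A minimal sketch, assuming the compute nodes are `cu01` to `cu03` as in the loop above and that `/home` is the shared mount; `shutdown_cluster`, `NODES`, `SHARED_MOUNT` and the `DRY_RUN=1` switch are hypothetical names introduced here. With `DRY_RUN=1` the commands are only printed, which is a cheap way to check the order before anything is powered off:

```shell
# shutdown_cluster (hypothetical wrapper): compute nodes first, then
# unmount shared storage, then power off the login node.
# Set DRY_RUN=1 to print the commands instead of running them.
NODES=${NODES:-"cu01 cu02 cu03"}
SHARED_MOUNT=${SHARED_MOUNT:-/home}

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

shutdown_cluster() {
  for node in $NODES; do
    # same remote command as the loop above: print hostname, then halt
    run ssh "$node" "hostname; init 0"
  done
  run umount "$SHARED_MOUNT"
  run poweroff
}

# usage: DRY_RUN=1 shutdown_cluster   # check the plan first
#        shutdown_cluster             # then run it as root
```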
## R code for ploting nomograph from competing risk survival analysis model
```R
library(cmprsk)
library(rms)
### set the working directory (replace with the folder that holds Stomach.csv)
setwd("C:\\Users\\hh\\Desktop\\nomo")
rt<-read.csv("Stomach.csv")
rt
View(rt)
attach(rt)
# covariate matrix (change the variable names to match your own data)
cov<-cbind(sexC, Age, AJCC_T,AJCC_N,AJCC_M,Surgery)
for (i in 1:6)
{
cov[,i]<-factor(cov[,i])
}
status<-factor(status)
z <- crr(time,status,cov)
z.p <- predict(z,cov)
n=60 # predict the incidence at the 60th distinct event time (a row index into z.p, not time = 60)
df<-data.frame(y=z.p[n,-1],cov)
ddist <- datadist(df)
options(datadist='ddist')
# fit a linear model to the predicted incidence so that rms::nomogram can draw it
lmod<-ols(y~(sexC)+(Age)+(AJCC_T)+(AJCC_N)+(AJCC_M)+(Surgery),data=df)
nom<-nomogram(lmod)
plot(nom,lplabel=paste("prob. of incidence T",round(z.p[n,1],2),sep="="))
```
## Setting docker download mirror site
Pulling Docker images from docker.io can be extremely slow in China. This tip configures a registry mirror locally so that `docker pull` downloads through it.
* 1. First, edit the file `/etc/docker/daemon.json` as root (create it if it does not exist):
```json
{
"registry-mirrors": ["https://registry.docker-cn.com"]
}
```
* 2. Then restart the Docker service (for example `sudo systemctl restart docker` on systemd-based systems).
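Both steps can be scripted. A minimal sketch, assuming you are happy to replace the whole `daemon.json`; the function name `set_docker_mirror` is hypothetical. Because it overwrites the file, merge by hand if the file already holds other settings. The restart line in the usage note assumes a systemd-based host:

```shell
# set_docker_mirror (hypothetical helper): write a daemon.json that only
# contains the registry mirror. WARNING: overwrites any existing settings.
set_docker_mirror() {
  mirror=$1
  conf=${2:-/etc/docker/daemon.json}
  mkdir -p "$(dirname "$conf")" || return 1
  cat > "$conf" <<EOF
{
  "registry-mirrors": ["$mirror"]
}
EOF
}

# usage (as root):
#   set_docker_mirror https://registry.docker-cn.com
#   systemctl restart docker
```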
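## Install bioconductor R package using VPS

If direct downloads fail, route the installation through a proxy (e.g. a VPS) with `proxychains4`:

```shell
proxychains4 Rscript -e 'source("http://bioconductor.org/biocLite.R"); biocLite("BSgenome")'
```

Note that `biocLite.R` only works with old Bioconductor releases; since Bioconductor 3.8, packages are installed with `BiocManager::install()`.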
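## Install bioconductor R package using mirror at UTSC

Point Bioconductor at the USTC (University of Science and Technology of China) mirror before installing:

```R
source("http://bioconductor.org/biocLite.R")
options(BioC_mirror="http://mirrors.ustc.edu.cn/bioc/")
biocLite("your package")
```

## Tips for using Tianhe super computer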
* 1. Log in to the data transfer server from the rj account (first hop to `ln42` on port 5566, then from there to `tn2-ib0`):

```shell
ssh -p 5566 ln42
ssh tn2-ib0
```
## Subset your bam file for IGV visualization locally

Sometimes we need to inspect variants called by different callers by eye, but the BAM files were generated on a remote server or cluster without a graphical interface. Pulling whole BAM files from remote storage is painful over limited bandwidth. Instead, we can subset the BAM file with a couple of commands on the remote server and download only a small BAM file (often just kilobytes) covering the target regions.

```shell
samtools view -bh -L "$bedfile" -o "${bedfile%%.bed}_subset.bam" "$bamfile"
samtools index "${bedfile%%.bed}_subset.bam"
```

Here `bedfile` is a region file with three columns (`chr`, `startpos`, `endpos`) covering the target regions. When the target is a single position, set a region flanking the site: for a site at `chr12 200`, use `chr12 50 350`, so that all reads covering the site are kept for checking.
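Writing the flanking regions by hand gets tedious with many sites. A minimal sketch, assuming sites are given as `chr pos` lines; the function name `site_to_bed` and the default flank of 150 bp are assumptions chosen to reproduce the `chr12 200` to `chr12 50 350` example. Starts are clamped at 0 so sites near the chromosome start stay valid:

```shell
# site_to_bed (hypothetical helper): read "chr pos" lines on stdin and print
# "chr start end" BED lines covering pos +/- flank (default 150 bp)
site_to_bed() {
  flank=${1:-150}
  awk -v f="$flank" 'NF >= 2 {
    start = $2 - f
    if (start < 0) start = 0   # BED starts cannot be negative
    print $1 "\t" start "\t" ($2 + f)
  }'
}

# usage:
#   printf 'chr12\t200\n' | site_to_bed > sites.bed
#   samtools view -bh -L sites.bed -o sites_subset.bam your.bam
```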
## Download TCGA dataset
Code provided by Yun Sun
```shell
for x in *_manifest.txt; do perl -lanE'BEGIN{say qq#{\n\t\"ids\":[#}$.>1 && (eof) ? say qq{\t\t"$F[0]"} : say qq{\t\t"$F[0]",};END{say qq#\t]\n\}#}' $x > ${x%_*}_request.txt; done
for x in *_request.txt; do curl --remote-name --remote-header-name --request POST --header 'Content-Type: application/json' --data @$x 'https://api.gdc.cancer.gov/data'; done
```
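The Perl one-liner builds, for each manifest, the JSON body that the `curl` loop posts to the GDC `/data` endpoint. An equivalent awk sketch, for readers who find the Perl hard to follow; it assumes (as in GDC manifests) that the first line is a header and that the file IDs are in column 1, and it skips that header line. The function name `manifest_to_request` is hypothetical:

```shell
# manifest_to_request (hypothetical helper): print a {"ids": [...]} JSON body
# built from column 1 of a tab-separated manifest, skipping the header line
manifest_to_request() {
  awk -F'\t' '
    NR > 1 && $1 != "" { ids[n++] = $1 }
    END {
      print "{"
      print "\t\"ids\":["
      # comma after every id except the last one
      for (i = 0; i < n; i++)
        printf "\t\t\"%s\"%s\n", ids[i], (i < n - 1 ? "," : "")
      print "\t]"
      print "}"
    }' "$1"
}

# usage, same file naming as above:
#   for x in *_manifest.txt; do manifest_to_request "$x" > "${x%_*}_request.txt"; done
```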
## Install the latest `stringi` in R

Adapted from [https://github.com/gagolews/stringi/blob/master/INSTALL#L70](https://github.com/gagolews/stringi/blob/master/INSTALL#L70).

When installing the `Seurat` package in R, I found that its dependency `stringi` could not be installed. My system is CentOS 7, for which CRAN provides no binary package.

After some searching, I resolved the problem with the following commands:

```sh
wget https://github.com/gagolews/stringi/archive/master.zip -O stringi.zip
unzip stringi.zip
# drop the icu../data line from .Rbuildignore so the bundled ICU data is kept in the built package
sed -i '/\/icu..\/data/d' stringi-master/.Rbuildignore
R CMD build stringi-master
```

Assuming the most recent development version of the package is numbered x.y.z,
a file named `stringi_x.y.z.tar.gz` is created in the current working directory.
The package can now be installed (the source bundle may be propagated via
`scp` etc.) by executing:

```sh
R CMD INSTALL stringi_x.y.z.tar.gz
```

Alternatively, call from within an R session:
```r
install.packages("stringi_x.y.z.tar.gz", repos=NULL)
```

## Install hdf5r in Centos 7
> Installing the R package `hdf5r` on CentOS 7: `hdf5r` requires a newer `hdf5-devel` (>1.8.4), but the latest version in the CentOS yum repository is still 1.8.3. So we install a recent HDF5 locally, and then install `hdf5r` from the R console with the `--with-hdf5` configure argument.
1. Build and install HDF5 (library and headers, i.e. what `hdf5-devel` provides) from source
```shell
wget https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.10/hdf5-1.10.5/src/hdf5-1.10.5.tar.gz
# or find package from https://www.hdfgroup.org/downloads/hdf5/source-code/#
tar xvf hdf5-1.10.5.tar.gz
cd hdf5-1.10.5
./configure --prefix=/usr/local/hdf5
make
make check
sudo make install
sudo make check-install
```
2. Set the shared-object path in the R profile files
```shell
echo "dyn.load('/usr/local/hdf5/lib/libhdf5_hl.so.100')" >> ~/.Rprofile
# the version suffix depends on the HDF5 release; for hdf5-1.12.x replace it with 200:
# echo "dyn.load('/usr/local/hdf5/lib/libhdf5_hl.so.200')" >> ~/.Rprofile
# then add the HDF5 lib directory to LD_LIBRARY_PATH for R
echo LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/hdf5/lib >> ~/.Renviron
```
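Since the numeric suffix of `libhdf5_hl.so` differs between HDF5 releases (`.100` or `.200` per the comments above), the file name can be looked up instead of guessed. A minimal sketch, assuming the `/usr/local/hdf5` prefix used above and that the shortest matching name is the one to load; `hdf5_dynload_line` is a hypothetical helper name:

```shell
# hdf5_dynload_line (hypothetical helper): print a dyn.load() line for the
# libhdf5_hl.so.N file found in the given lib directory
hdf5_dynload_line() {
  libdir=${1:-/usr/local/hdf5/lib}
  lib=
  for f in "$libdir"/libhdf5_hl.so.*; do
    [ -e "$f" ] || continue   # glob did not match anything
    # keep the shortest name, e.g. libhdf5_hl.so.200 rather than libhdf5_hl.so.200.1.0
    if [ -z "$lib" ] || [ "${#f}" -lt "${#lib}" ]; then
      lib=$f
    fi
  done
  if [ -z "$lib" ]; then
    echo "no libhdf5_hl.so.* found in $libdir" >&2
    return 1
  fi
  printf "dyn.load('%s')\n" "$lib"
}

# usage: hdf5_dynload_line >> ~/.Rprofile
```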
3. Install `hdf5r` from the R console
```R
install.packages(
'hdf5r',
configure.args = '--with-hdf5=/usr/local/hdf5/bin/h5cc',
type = 'source'
)
```
## Install LOHHLA env
Code from Shixiang Wang.
LOHHLA (Loss Of Heterozygosity in Human Leukocyte Antigen) is a computational tool to evaluate HLA loss using next-generation sequencing data.
Detailed instructions for LOHHLA can be found [here](https://github.com/mskcc/lohhla).

```shell
mamba create -n hla -c conda-forge -c bioconda lohhla
```
This directly creates a conda environment named `hla` with LOHHLA installed, ready for analysing cancer BAM files.
One of the input files (the HLA reference FASTA sequences) can be found [here](https://github.com/ANHIG/IMGTHLA/tree/Latest/fasta).
## R code for taking screenshots of URLs
[BioTreasury](https://biotreasury.rjmart.cn/#/) needed a feature that automatically takes screenshots of given URLs and also checks whether the URLs still work. In R, the `webshot2` package can do this; the following code runs the task:
```R
# required packages
library(webshot2)
library(pbapply)
# initialization: webshot2 drives a local Chrome/Chromium through chromote;
# the older webshot package needed webshot::install_phantomjs() instead
# usage: webshot(url, "filename.png")
dat<-read.delim("dat.tsv",header = T)
# screenshot function: column 4 holds the URL, column 2 the output file name
get_screen_shot<-function(vec){
  print(vec[4])
  tryCatch(webshot(vec[4], paste0("image/",vec[2],".png"),cliprect="viewport"),
           error = function(e) paste(vec[4],"failed"))
}
# create the output folder
dir.create("image", showWarnings = FALSE)
# take the screenshots row by row, with a progress bar
pbapply(dat, 1, get_screen_shot)
```
> The resulting images are saved in the `image` folder under the current working directory.
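The R function above only reports URLs for which `webshot()` raised an error. For the separate requirement of checking whether each URL still responds, a minimal shell sketch with `curl`, assuming the same `dat.tsv` layout the R code indexes (a header line, URL in column 4); treating any final 2xx/3xx status as "alive" is also an assumption, and `check_url` / `check_urls_tsv` are hypothetical names:

```shell
# check_url (hypothetical helper): print "<url><TAB><http code><TAB>OK|FAIL"
# and return non-zero when the URL does not answer with a 2xx/3xx status
check_url() {
  url=$1
  # -s: silent, -o /dev/null: discard body, -L: follow redirects,
  # --max-time: give up on slow hosts, -w: print only the final status code
  code=$(curl -s -o /dev/null -L --max-time "${2:-20}" -w '%{http_code}' "$url") || code=000
  case $code in
    2??|3??) printf '%s\t%s\tOK\n' "$url" "$code" ;;
    *) printf '%s\t%s\tFAIL\n' "$url" "$code"; return 1 ;;
  esac
}

# check_urls_tsv (hypothetical helper): run check_url on column 4 of a
# tab-separated file that has a header line, e.g. dat.tsv
check_urls_tsv() {
  tail -n +2 "$1" | cut -f4 | while IFS= read -r url; do
    [ -n "$url" ] || continue
    check_url "$url" || :   # keep going after a failed URL
  done
}

# usage: check_urls_tsv dat.tsv > url_status.tsv
```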