https://github.com/likelet/blogs_tips

Host: GitHub
URL: https://github.com/likelet/blogs_tips
Owner: likelet
Created: 2017-10-15T03:10:23.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2022-06-07T02:55:32.000Z (almost 3 years ago)
Last Synced: 2025-01-07T20:45:54.697Z (4 months ago)
Size: 56.6 KB
Stars: 0
Watchers: 3
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

# Blogs_tips

## Table of Contents

  - [Subset bamfile with chromosome names and convert into paired fastq](#subset-bamfile-with-chromosome-names-and-convert-into-paired-fastq)
  - [Cluster management](#cluster-management)
  - [R code for plotting a nomogram from a competing-risk survival analysis model](#r-code-for-ploting-nomograph-from-competing-risk-survival-analysis-model)
  - [Setting docker download mirror site](#setting-docker-download-mirror-site)
  - [Install Bioconductor R package using VPS](#install-bioconductor-r-package-using-vps)
  - [Install Bioconductor R package using mirror at USTC](#install-bioconductor-r-package-using-mirror-at-utsc)
  - [Tips for using Tianhe-2 super computer](#tips-for-using-tianhe-super-computer)
  - [Subset your bam file for IGV visualization locally](#subset-your-bam-file-for-igv-visualization-locally)
  - [Download TCGA dataset](#download-tcga-dataset)
  - [Install the latest stringi in R](#install-the-latest-stringi-in-r)
  - [Install hdf5r in Centos 7](#install-hdf5r-in-centos-7)
  - [Install LOHHLA env](#install-lohhla-env)
  - [R code for taking screenshots from URLs](#r-code-for-get-screen-shot-by-urls)

## Subset bamfile with chromosome names and convert into paired fastq  

* Software required: **[sambamba](https://github.com/lomereiter/sambamba)** and **[bam2fastx](https://github.com/infphilo/tophat)** from the TopHat binary distribution.

> For the sambamba filter syntax, see https://github.com/lomereiter/sambamba/wiki/%5Bsambamba-view%5D-Filter-expression-syntax#basic-conditions-for-fields

```shell
#!/bin/sh
# Example input: a STAR output BAM named <sample>Aligned.sortedByCoord.out.bam
bamin=$1
prefix=${bamin%%Aligned.sortedByCoord.out.bam}

# extract reads aligned to chr2 (ref_id 1 = second @SQ entry in the header)
sambamba view -F "ref_id==1" -f bam "$bamin" -o "${prefix}_chr2.bam"

# sort reads by name if they were not already name-sorted by the aligner
sambamba sort -n "${prefix}_chr2.bam" -o "${prefix}_chr2.sort.bam"

# BAM -> paired FASTQ
bam2fastx -PANQ -o "${prefix}_chr2.fq.gz" "${prefix}_chr2.sort.bam"
```

**PS**: the number given to `ref_id` is the position of the chromosome in the reference list (the `@SQ` lines) of the BAM header, counting from 0, which can be checked with `samtools view -H your.bam` if samtools is installed.
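To avoid counting `@SQ` lines by hand, a small helper can look up the `ref_id` of a chromosome name from the header text. This is a minimal sketch: `ref_id_of` is a hypothetical helper name, and it assumes `ref_id` is the 0-based index of the `@SQ` line, which is consistent with `ref_id==1` selecting chr2 above.

```shell
#!/bin/sh
# Hypothetical helper: print the 0-based ref_id of chromosome $1,
# reading SAM header text on stdin. Exits with status 1 if no @SQ
# line carries SN:<chromosome>.
ref_id_of() {
  awk -v chr="$1" '
    BEGIN { n = 0; found = 0 }
    $1 == "@SQ" {
      for (i = 2; i <= NF; i++)
        if ($i == ("SN:" chr)) { print n; found = 1; exit }
      n++                      # count @SQ lines seen so far
    }
    END { exit (found ? 0 : 1) }'
}
```

For example, `samtools view -H your.bam | ref_id_of chr2` prints `1` for a header whose references are ordered chr1, chr2, ...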

## Cluster management 

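* 1. Shut down the system

First, shut down the compute nodes:

```shell
#!/bin/sh
# log in to each compute node (cu01-cu03), print its hostname, then halt it
for i in $(seq 1 3)
do
  ssh cu0$i "hostname; init 0"
done
```

Then unmount the shared storage:

```shell
umount /home
```

Finally, power off the login node:

```shell
poweroff
```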
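The node loop writes `cu0$i`, which only produces correct names for nodes 1 to 9. A sketch that zero-pads the index and supports a dry run, assuming the nodes are named `cu01`, `cu02`, ...; the helpers `node_names` and `shutdown_nodes` and the `DRY_RUN` variable are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: list compute node names cu01..cuNN (two-digit padding assumed)
node_names() {
  i=1
  while [ "$i" -le "$1" ]; do
    printf 'cu%02d\n' "$i"
    i=$((i + 1))
  done
}

# Halt the first $1 compute nodes; with DRY_RUN=1 only print the ssh commands
shutdown_nodes() {
  for node in $(node_names "$1"); do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "ssh $node 'hostname; init 0'"
    else
      ssh "$node" 'hostname; init 0'
    fi
  done
}
```

Run `DRY_RUN=1 shutdown_nodes 3` first to review the commands, then `shutdown_nodes 3`, before unmounting `/home` and powering off the login node.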

## R code for ploting nomograph from competing risk survival analysis model 

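```R
library(cmprsk)   # crr(): Fine-Gray competing-risk regression
library(rms)      # datadist(), ols(), nomogram()

### set the working directory to the folder containing the data
setwd("C:\\Users\\hh\\Desktop\\nomo")

rt <- read.csv("Stomach.csv")
View(rt)
attach(rt)

# build the covariate matrix for crr()
cov <- cbind(sexC, Age, AJCC_T, AJCC_N, AJCC_M, Surgery)
for (i in 1:6) {
  cov[, i] <- factor(cov[, i])
}
status <- factor(status)

# fit the competing-risk model (defaults: failcode = 1, cencode = 0)
z <- crr(time, status, cov)

# column 1: distinct event times; other columns: cumulative incidence for each subject
z.p <- predict(z, cov)

# n is a row index, not a time: row 60 is the 60th distinct event time, z.p[60, 1]
n <- 60

df <- data.frame(y = z.p[n, -1], cov)
ddist <- datadist(df)
options(datadist = 'ddist')

# approximate the predicted incidence with a linear model so that rms can draw a nomogram
lmod <- ols(y ~ sexC + Age + AJCC_T + AJCC_N + AJCC_M + Surgery, data = df)
nom <- nomogram(lmod)
plot(nom, lplabel = paste("prob. of incidence T", round(z.p[n, 1], 2), sep = "="))
```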

## Setting docker download mirror site 

Pulling Docker images from docker.io can be extremely slow in China. This tip configures a registry mirror so that `docker pull` fetches Docker Hub images through it.

* 1. First, edit (or create) `/etc/docker/daemon.json` with root privileges:

```json
{
  "registry-mirrors": ["https://registry.docker-cn.com"]
}
```

* 2. Then restart the Docker service, e.g. `sudo systemctl daemon-reload && sudo systemctl restart docker` on systemd-based systems.
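Step 1 can also be scripted. The sketch below uses a hypothetical `write_daemon_json` helper that writes a `daemon.json` containing only `registry-mirrors`, so it assumes the existing file holds no other settings you need to keep:

```shell
#!/bin/sh
# Hypothetical helper: write_daemon_json OUTFILE MIRROR_URL...
# Writes a daemon.json with only "registry-mirrors" (any existing settings are overwritten!)
write_daemon_json() {
  out=$1
  shift
  {
    printf '{\n  "registry-mirrors": ['
    sep=''
    for url in "$@"; do
      printf '%s"%s"' "$sep" "$url"   # quote each URL, comma-separated
      sep=', '
    done
    printf ']\n}\n'
  } > "$out"
}
```

For example, run `write_daemon_json /tmp/daemon.json https://registry.docker-cn.com && sudo cp /tmp/daemon.json /etc/docker/daemon.json`, restart Docker, and check that the mirror appears under `Registry Mirrors` in the output of `docker info`.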

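## Install Bioconductor R package using VPS

Route the download through a proxy on your VPS with `proxychains4` (assuming proxychains is already configured to use it):

    proxychains4 Rscript -e 'source("http://bioconductor.org/biocLite.R"); biocLite("BSgenome")'

Note that `biocLite.R` only works with older Bioconductor releases; since Bioconductor 3.8 (R 3.5), packages are installed with `BiocManager::install()` instead.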

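## Install Bioconductor R package using mirror at UTSC

Set the Bioconductor mirror hosted by USTC (University of Science and Technology of China) before installing:

    options(BioC_mirror="http://mirrors.ustc.edu.cn/bioc/")

    source("http://bioconductor.org/biocLite.R")

    biocLite("your package")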
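To use the mirror in every R session, the `options()` line can be stored in `~/.Rprofile`, which R reads at startup. A minimal sketch with a hypothetical `set_bioc_mirror` helper that appends the line only if it is not already there:

```shell
#!/bin/sh
# Hypothetical helper: add the USTC Bioconductor mirror option to an R profile
# file once (running it again does not duplicate the line).
set_bioc_mirror() {
  profile=$1
  line='options(BioC_mirror = "http://mirrors.ustc.edu.cn/bioc/")'
  touch "$profile"
  grep -qxF "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
}
```

Usage: `set_bioc_mirror ~/.Rprofile`.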

## Tips for using Tianhe super computer  

* 1. Log in to the data transfer server from the rj account:

      ssh -p 5566 ln42  

      ssh tn2-ib0

    

## Subset your bam file for IGV visualization locally   

Sometimes we need to manually check the variants reported by different callers, but the BAM files are often generated on a remote server or cluster without a graphical interface. Pulling the whole BAM file from remote storage is painful with limited bandwidth. Alternatively, we can subset the BAM file with a few commands on the remote server and pull only the reads in the target regions, which are usually just a few KB in size.

    samtools view -bh -L $bedfile -o ${bedfile%%.bed}_subset.bam $bamfile 

    samtools index ${bedfile%%.bed}_subset.bam

Here `bedfile` is a BED file with three columns (`chr`, `startpos`, `endpos`) covering the target regions. When the target is a single position, you should at least include a region flanking the site. For example, if your site is `chr12 200`, the region should be `chr12  50  350`, so that all reads covering the site are kept for inspection.
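Writing flanking regions by hand gets error-prone with many sites. A sketch with a hypothetical `make_flank_bed` helper that reproduces the example (150 bp on each side by default) and clamps the start at 0; like the example, it treats the site position loosely and ignores the 0-based/1-based offset, which a flank this wide absorbs:

```shell
#!/bin/sh
# Hypothetical helper: print a one-line BED region around one site.
# Usage: make_flank_bed CHR POS [FLANK]; FLANK defaults to 150 bp,
# which turns "chr12 200" into "chr12 50 350".
make_flank_bed() {
  chr=$1
  pos=$2
  flank=${3:-150}
  start=$((pos - flank))
  if [ "$start" -lt 0 ]; then
    start=0                  # BED start coordinates cannot be negative
  fi
  printf '%s\t%d\t%d\n' "$chr" "$start" "$((pos + flank))"
}
```

For example, `make_flank_bed chr12 200 > site.bed`, then run the `samtools` commands with `bedfile=site.bed`.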

## Download TCGA dataset 

Code provided by Yun Sun 

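```shell
# convert each GDC manifest into a JSON request listing its file UUIDs (header line skipped)
for x in *_manifest.txt; do
  perl -lanE 'BEGIN { say qq({\n\t"ids":[) } next if $. == 1; push @ids, qq(\t\t"$F[0]"); END { say join(",\n", @ids); say qq(\t]\n}) }' "$x" > "${x%_*}_request.txt"
done

# POST each request to the GDC data endpoint; --remote-header-name keeps the server's file name
for x in *_request.txt; do
  curl --remote-name --remote-header-name --request POST --header 'Content-Type: application/json' --data @"$x" 'https://api.gdc.cancer.gov/data'
done
```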
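If Perl is unavailable, the request file can be built with awk instead. This sketch assumes the manifest is tab-separated, starts with a header line, and lists the file UUIDs in its first column, as GDC manifests do; `manifest_to_request` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print a GDC /data request body {"ids": [...]} for one manifest.
manifest_to_request() {
  awk -F '\t' '
    NR == 1 { next }                 # skip the header line
    $1 != "" { ids[++n] = $1 }       # collect UUIDs from column 1
    END {
      printf "{\n\t\"ids\":[\n"
      for (i = 1; i <= n; i++)
        printf "\t\t\"%s\"%s\n", ids[i], (i < n ? "," : "")
      printf "\t]\n}\n"
    }' "$1"
}
```

Usage: `for x in *_manifest.txt; do manifest_to_request "$x" > "${x%_*}_request.txt"; done`, followed by the same `curl` loop.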

## Install the latest `stringi` in R

Adapted from [https://github.com/gagolews/stringi/blob/master/INSTALL#L70](https://github.com/gagolews/stringi/blob/master/INSTALL#L70)

When I installed the `Seurat` package in R, I found that its dependency `stringi` could not be installed. My system is CentOS 7, for which CRAN provides no binary packages.

After some searching, I resolved the problem with the following commands:

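```sh
# download the development version of stringi from GitHub
wget https://github.com/gagolews/stringi/archive/master.zip -O stringi.zip
unzip stringi.zip
# remove the .Rbuildignore rule that excludes the bundled ICU data, so it is packed into the tarball
sed -i '/\/icu..\/data/d' stringi-master/.Rbuildignore
# build a source package (stringi_x.y.z.tar.gz)
R CMD build stringi-master
```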

Assuming the most recent development version of the package is numbered x.y.z, a file named `stringi_x.y.z.tar.gz` is created in the current working directory.

The package can now be installed (the source bundle may be propagated via `scp` etc.) by executing:

```sh

R CMD INSTALL stringi_x.y.z.tar.gz

```

Alternatively, call from within an R session:

```r

install.packages("stringi_x.y.z.tar.gz", repos=NULL)

```
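Since the version number x.y.z is not known in advance, a small helper can pick the built tarball. A sketch, assuming GNU `sort -V` (available on CentOS 7) and using a hypothetical `latest_tarball` helper:

```shell
#!/bin/sh
# Hypothetical helper: print the newest <pkg>_x.y.z.tar.gz in the current directory,
# comparing version numbers with GNU sort -V (so 1.10.0 sorts after 1.4.6).
latest_tarball() {
  ls "$1"_*.tar.gz 2>/dev/null | sort -V | tail -n 1
}
```

Usage: `R CMD INSTALL "$(latest_tarball stringi)"`.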

## Install hdf5r in Centos 7

> Installing the R package `hdf5r` on CentOS 7.

`hdf5r` requires a newer `hdf5-devel` (>1.8.4) than the latest version in the CentOS yum repository (still 1.8.3), so we need to build a recent HDF5 locally and then install `hdf5r` in the R console, pointing its `--with-hdf5` configure option at the new installation.

1. Install `hdf5-devel` from source

   ```shell
   wget https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.10/hdf5-1.10.5/src/hdf5-1.10.5.tar.gz
   # or find another release at https://www.hdfgroup.org/downloads/hdf5/source-code/
   tar xvf hdf5-1.10.5.tar.gz
   cd hdf5-1.10.5
   ./configure --prefix=/usr/local/hdf5
   make
   make check                # run the test suite
   sudo make install
   sudo make check-install   # verify the installed library
   ```

2. Register the shared-object path in the R startup files

   ```shell
   echo "dyn.load('/usr/local/hdf5/lib/libhdf5_hl.so.100')" >> ~/.Rprofile
   # the suffix depends on the HDF5 version: for hdf5-1.12.x, replace 100 with 200
   # echo "dyn.load('/usr/local/hdf5/lib/libhdf5_hl.so.200')" >> ~/.Rprofile

   # then add the library directory to LD_LIBRARY_PATH for R
   echo LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/hdf5/lib >> ~/.Renviron
   ```

3. Install `hdf5r` in the R console

   ```R
   install.packages(
     'hdf5r',
     configure.args = '--with-hdf5=/usr/local/hdf5/bin/h5cc',
     type = 'source'
   )
   ```
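Because the suffix of `libhdf5_hl.so` differs between HDF5 releases (100 for 1.10.x, 200 for 1.12.x, as noted in step 2), the `dyn.load()` line can be generated from whatever is actually installed. A sketch with a hypothetical `hdf5_dynload_line` helper that picks the `libhdf5_hl.so.N` file with a single numeric suffix:

```shell
#!/bin/sh
# Hypothetical helper: print a dyn.load() line for the libhdf5_hl.so.N file in
# directory $1 (N = digits only). Returns 1 if no such file exists.
hdf5_dynload_line() {
  for f in "$1"/libhdf5_hl.so.*; do
    suffix=${f##*/libhdf5_hl.so.}
    case $suffix in
      ''|*[!0-9]*) continue ;;   # skip e.g. .so.100.1.2 and an unmatched glob
    esac
    printf "dyn.load('%s')\n" "$f"
    return 0
  done
  return 1
}
```

Usage: `hdf5_dynload_line /usr/local/hdf5/lib >> ~/.Rprofile`.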

## Install LOHHLA env 

Code from Shixiang Wang

  LOHHLA (Loss Of Heterozygosity in Human Leukocyte Antigen) is a computational tool to evaluate HLA loss using next-generation sequencing data.

  Detailed instructions for LOHHLA can be found [here](https://github.com/mskcc/lohhla).

  ```shell
  mamba create -n hla -c conda-forge -c bioconda lohhla
  ```

  This directly creates an environment for LOHHLA analysis of cancer BAM files.

  One of the required input files, the HLA allele FASTA, can be found [here](https://github.com/ANHIG/IMGTHLA/tree/Latest/fasta).

## R code for get screen shot by URLs 

  [BioTreasury](https://biotreasury.rjmart.cn/#/) needs a feature that automatically takes a screenshot of each given URL and also checks whether the URL still works. In R, the `webshot2` package can do this, and the following code runs the task:

  ```R
  # required packages
  library(webshot2)   # webshot() renders pages with headless Chrome (via chromote)
  library(pbapply)    # pbapply(): apply() with a progress bar

  # webshot2 needs a local Chrome/Chromium; webshot::install_phantomjs() is only
  # required by the older `webshot` package
  # usage: webshot(url, "file.png")

  dat <- read.delim("dat.tsv", header = TRUE)

  # screenshot one row: column 2 holds the name, column 4 the URL
  get_screen_shot <- function(vec) {
    print(vec[4])
    tryCatch(webshot(vec[4], paste0("image/", vec[2], ".png"), cliprect = "viewport"),
             error = function(e) paste(vec[4], " failed"))
  }

  # create the output folder
  dir.create("image", showWarnings = FALSE)

  # take the screenshots
  pbapply(dat, 1, get_screen_shot)
  ```

 > The screenshots are saved in the `image` folder under the current working directory.
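Checking whether the URLs still work does not require rendering them. A lighter sketch using `curl` from the shell; `url_status` and `url_alive` are hypothetical helper names, and treating any 2xx/3xx status (after following redirects) as "working" is an assumption:

```shell
#!/bin/sh
# Hypothetical helper: print the HTTP status code of a URL ("000" when no response arrives)
url_status() {
  curl --silent --location --output /dev/null --max-time 10 \
       --write-out '%{http_code}' "$1"
}

# Hypothetical helper: succeed if the URL answers with a 2xx or 3xx status
url_alive() {
  case $(url_status "$1") in
    2??|3??) return 0 ;;
    *) return 1 ;;
  esac
}
```

Assuming the same `dat.tsv` layout as above (header line, URL in column 4): `tail -n +2 dat.tsv | cut -f4 | while read -r url; do url_alive "$url" || echo "not reachable: $url"; done`.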