{"id":19434438,"url":"https://github.com/likelet/blogs_tips","last_synced_at":"2026-03-19T09:53:00.397Z","repository":{"id":86650521,"uuid":"106979902","full_name":"likelet/Blogs_tips","owner":"likelet","description":null,"archived":false,"fork":false,"pushed_at":"2022-06-07T02:55:32.000Z","size":58,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-11-19T04:19:14.328Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/likelet.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-10-15T03:10:23.000Z","updated_at":"2021-10-13T03:36:37.000Z","dependencies_parsed_at":null,"dependency_job_id":"beb370da-5aa0-46a7-9e1e-e88724c9a42b","html_url":"https://github.com/likelet/Blogs_tips","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/likelet/Blogs_tips","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/likelet%2FBlogs_tips","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/likelet%2FBlogs_tips/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/likelet%2FBlogs_tips/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/likelet%2FBlogs_tips/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/likelet","download_url":"https://codeload.github.com/likelet/Blogs_tips/tar.gz/refs/heads/master","sbom_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/likelet%2FBlogs_tips/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30071687,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-04T03:25:38.285Z","status":"ssl_error","status_checked_at":"2026-03-04T03:25:05.086Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-10T14:46:25.352Z","updated_at":"2026-03-04T04:31:45.793Z","avatar_url":"https://github.com/likelet.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Blogs_tips\n\n## Table of Contents\n\n\u003c!-- START doctoc generated TOC please keep comment here to allow auto update --\u003e\n\u003c!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --\u003e\n\n  - [Subset bamfile with chromosome names and convert into paired fastq](#subset-bamfile-with-chromosome-names-and-convert-into-paired-fastq)\n  - [Cluster management](#cluster-management)\n  - [R code for ploting nomograph from competing risk survival analysis model](#r-code-for-ploting-nomograph-from-competing-risk-survival-analysis-model)\n  - [Setting docker download mirror site](#setting-docker-download-mirror-site)\n  - [Install bioconductor R package using VPS](#install-bioconductor-r-package-using-vps)\n  - [Install bioconductor R package using mirror at 
UTSC](#install-bioconductor-r-package-using-mirror-at-utsc)\n  - [Tips for using Tianhe-2 super computer](#tips-for-using-tianhe-super-computer)\n  - [Subset your bam file for IGV visualization locally](#subset-your-bam-file-for-igv-visualization-locally)\n  - [Download TCGA dataset](#download-tcga-dataset)\n  - [Install hdf5r in Centos 7](#Install-hdf5r-in-Centos-7)\n\n\n## Subset bamfile with chromosome names and convert into paired fastq  \n* Software required: **[sambamba](https://github.com/lomereiter/sambamba)** and **[bam2fastx](https://github.com/infphilo/tophat)** from the TopHat binary distribution.\u003cbr\u003e\n\n \u003e For the sambamba filter expression syntax, see https://github.com/lomereiter/sambamba/wiki/%5Bsambamba-view%5D-Filter-expression-syntax#basic-conditions-for-fields\n\n```shell \n#!/bin/sh\n# Example using a STAR output BAM file\nbamin=$1\n# output prefix: the input name without the STAR suffix\nprefix=${bamin%%Aligned.sortedByCoord.out.bam}\n# extract reads aligned to chr2\nsambamba view -F \"ref_id==1\" -f bam \"$bamin\" -o \"${prefix}_chr2.bam\"\n# sort reads by name if they are not already name-sorted\nsambamba sort -n \"${prefix}_chr2.bam\" -o \"${prefix}_chr2.sort.bam\"\n# convert BAM to paired FASTQ\nbam2fastx -PANQ -o \"${prefix}_chr2.fq.gz\" \"${prefix}_chr2.sort.bam\"\n\n```\n**PS**: the number given in `ref_id` is the zero-based position of the reference sequence in the BAM header (so `ref_id==1` selects the second sequence, chr2 in this example). The order can be checked with \n`samtools view -H your.bam` if samtools is installed. \n\n\n## Cluster management \n* 1. 
Shut down the system \nShut down the compute nodes \n```shell\n#!/bin/sh\nfor i in `seq 1 3`\ndo\n ssh cu0$i \"hostname;init 0\"\ndone\n```\nUnmount the storage \n```shell\numount /home\n```\nShut down the login node \n```shell\npoweroff\n```\n## R code for ploting nomograph from competing risk survival analysis model \n```R\nlibrary(cmprsk)\nlibrary(rms)\n# set the working directory \nsetwd(\"C:\\\\Users\\\\hh\\\\Desktop\\\\nomo\")\nrt\u003c-read.csv(\"Stomach.csv\")\nrt\nView(rt)\nattach(rt) \n# change the variable names below to match your data\n\ncov\u003c-cbind(sexC, Age, AJCC_T,AJCC_N,AJCC_M,Surgery)\nfor (i in 1:6)\n{\n  cov[,i]\u003c-factor(cov[,i])\n}\nstatus\u003c-factor(status)\n# fit the competing risks regression model\nz \u003c- crr(time,status,cov)\nz.p \u003c- predict(z,cov)\nn \u003c- 60 # predict the event probability at the 60th ordered time point (row n of z.p)\ndf\u003c-data.frame(y=z.p[n,-1],cov)\nddist \u003c- datadist(df)  \noptions(datadist='ddist') \nlmod\u003c-ols(y~(sexC)+(Age)+(AJCC_T)+(AJCC_N)+(AJCC_M)+(Surgery),data=df)\nnom\u003c-nomogram(lmod)\nplot(nom,lplabel=paste(\"prob. of incidence T\",round(z.p[n,1],2),sep=\"=\"))\n```\n## Setting docker download mirror site \nPulling Docker images from docker.io can be extremely slow in China. This tip sets a registry mirror for the local Docker daemon, so that `docker pull` goes through the mirror.  \n* 1. First, edit the file `/etc/docker/daemon.json` with root privileges.\n```json\n{\n  \"registry-mirrors\": [\"https://registry.docker-cn.com\"]\n}\n```\n* 2. Then restart the Docker service (on systemd-based systems, for example, `sudo systemctl restart docker`). \n## Install bioconductor R package using VPS.   \n\n    proxychains4 Rscript -e 'source(\"http://bioconductor.org/biocLite.R\"); biocLite(\"BSgenome\")'\n\n## install bioconductor R package using mirror at UTSC. \n\n    source(\"http://bioconductor.org/biocLite.R\")\n    options(BioC_mirror=\"http://mirrors.ustc.edu.cn/bioc/\")\n    biocLite(\"your package\")\n\n## Tips for using Tianhe super computer  \n\n* 1. 
Log in to the data transfer server from the rj account  \n\n      ssh -p 5566 ln42  \n      ssh tn2-ib0\n    \n## Subset your bam file for IGV visualization locally   \n\nSometimes we need to manually check variants called by different callers, but the BAM files are often generated on a remote server or cluster without a graphical interface. Pulling the whole BAM file from remote storage is painful because of limited bandwidth. Alternatively, we can subset the BAM file with a few commands on the remote server and pull only the target region, which is just kilobytes in size.  \n\n    samtools view -bh -L $bedfile -o ${bedfile%%.bed}_subset.bam $bamfile \n    samtools index ${bedfile%%.bed}_subset.bam\n\nHere `bedfile` is a region file with three columns, `chr`, `startpos` and `endpos`, covering the target region. When the target is a single position, set a region flanking the site. For example, if your site is `chr12 200`, the region should be `chr12  50  350`, so that all reads covering the site are kept for inspection.\n\n## Download TCGA dataset \n\nCode provided by Yun Sun \n\n```\n# build a JSON request file from each GDC manifest\nfor x in *_manifest.txt; do perl -lanE'BEGIN{say qq#{\\n\\t\\\"ids\\\":[#}$.\u003e1 \u0026\u0026 (eof) ? say qq{\\t\\t\"$F[0]\"} : say qq{\\t\\t\"$F[0]\",};END{say qq#\\t]\\n\\}#}' $x \u003e ${x%_*}_request.txt; done\n# POST each request file to the GDC data endpoint\nfor x in *_request.txt; do curl --remote-name --remote-header-name --request POST --header 'Content-Type: application/json' --data @$x 'https://api.gdc.cancer.gov/data'; done\n```\n## Install the latest `stringi` in R\n\nAdapted from [https://github.com/gagolews/stringi/blob/master/INSTALL#L70](https://github.com/gagolews/stringi/blob/master/INSTALL#L70).\nWhen I installed the `Seurat` package in R, its dependency `stringi` could not be installed. My system is CentOS 7, for which CRAN provides no binary version. 
\nAfter some googling, I finally resolved the problem with the following commands: \n\n```sh\nwget https://github.com/gagolews/stringi/archive/master.zip -O stringi.zip\nunzip stringi.zip\nsed -i '/\\/icu..\\/data/d' stringi-master/.Rbuildignore\nR CMD build stringi-master\n```\n\nAssuming the most recent development version of the package is numbered x.y.z,\na file named `stringi_x.y.z.tar.gz` is created in the current working directory.\nThe package can now be installed (the source bundle may be propagated via\n`scp` etc.) by executing:\n\n```sh\nR CMD INSTALL stringi_x.y.z.tar.gz\n```\n\nAlternatively, call from within an R session:\n\n```r\ninstall.packages(\"stringi_x.y.z.tar.gz\", repos=NULL)\n```\n\n## Install hdf5r in Centos 7\n\u003e Install the R package `hdf5r` on CentOS 7.   \n\n`hdf5r` requires a newer `hdf5-devel` (\u003e1.8.4), but the latest version in the CentOS yum repositories is still 1.8.3. So we need to install a recent HDF5 locally from source, and then install `hdf5r` from the R console with the `--with-hdf5` configure argument. \n1. Install `hdf5-devel` from source \n  ```shell \n    wget https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.10/hdf5-1.10.5/src/hdf5-1.10.5.tar.gz\n    # or find the package at https://www.hdfgroup.org/downloads/hdf5/source-code/# \n    tar xvf hdf5-1.10.5.tar.gz\n    cd hdf5-1.10.5\n    ./configure --prefix=/usr/local/hdf5\n    make\n    make check\n    sudo make install\n    sudo make check-install\n  ```\n2. Set the shared object path in the R profile \n  ```\n  echo \"dyn.load('/usr/local/hdf5/lib/libhdf5_hl.so.100')\" \u003e\u003e ~/.Rprofile\n  # the .so version suffix depends on the HDF5 version; for hdf5-1.12.x, replace the suffix with 200 \n  # echo \"dyn.load('/usr/local/hdf5/lib/libhdf5_hl.so.200')\" \u003e\u003e ~/.Rprofile\n  # then add the HDF5 library directory to LD_LIBRARY_PATH for R \n  \n  echo LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/hdf5/lib \u003e\u003e ~/.Renviron\n  \n  ```\n3. 
Install `hdf5r` from the R console \n\n  ```R\n  install.packages(\n  'hdf5r',\n  configure.args = '--with-hdf5=/usr/local/hdf5/bin/h5cc',\n  type = 'source'\n  )\n  ```\n## Install LOHHLA env \nCode from Shixiang Wang    \n  LOHHLA (Loss Of Heterozygosity in Human Leukocyte Antigen) is a computational tool to evaluate HLA loss using next-generation sequencing data.\n  Detailed instructions for LOHHLA can be found [here](https://github.com/mskcc/lohhla).\n\n\n  ```\n  mamba create -n hla -c conda-forge -c bioconda lohhla \n  ```\n  This directly creates an environment for LOHHLA analysis of cancer BAM files. \n  One of the input files can be found [here](https://github.com/ANHIG/IMGTHLA/tree/Latest/fasta).\n  \n  \n## R code for taking screenshots from URLs \n  [BioTreasury](https://biotreasury.rjmart.cn/#/) needs a feature that automatically takes screenshots of given URLs and also checks whether the URLs still work. In R, I found a suitable package for this; the following code runs the task:    \n  ```R\n  # required packages \n  library(webshot2)\n\n  library(pbapply)  \n\n\n  # initialization (PhantomJS is only needed for the older webshot package): webshot::install_phantomjs()\n  # usage: webshot(url, \"filename.extension\")\n  dat\u003c-read.delim(\"dat.tsv\",header = TRUE)\n\n  # screenshot function: column 4 holds the URL, column 2 the output file name \n  get_screen_shot\u003c-function(vec){\n\n    print(vec[4])\n    tryCatch(webshot(vec[4], paste0(\"image/\",vec[2],\".png\"),cliprect=\"viewport\"), \n             error = function(e) paste(vec[4],\"  failed\"))\n\n  }\n\n  # create the output folder if it does not exist \n  # dir.create(\"image\")\n  # take the screenshots \n  pbapply(dat, 1, get_screen_shot)\n\n\n  ```\n \u003e The resulting images can be found in the `image` folder under the current path. 
\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flikelet%2Fblogs_tips","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flikelet%2Fblogs_tips","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flikelet%2Fblogs_tips/lists"}