https://github.com/ZhihaoXie/awesome-bioinformatics-tools

A curated list of awesome Bioinformatics software, tools and resources

List: awesome-bioinformatics-tools

Host: GitHub
URL: https://github.com/ZhihaoXie/awesome-bioinformatics-tools
Owner: ZhihaoXie
License: gpl-3.0
Created: 2019-04-03T16:33:27.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2019-04-28T13:25:16.000Z (about 6 years ago)
Last Synced: 2025-04-27T22:01:57.859Z (about 2 months ago)
Size: 35.2 KB
Stars: 30
Watchers: 3
Forks: 10
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-awesomeness-bioinformatics - Awesome Bioinformatics Tools

README

# awesome-bioinformatics-tools
A curated list of awesome Bioinformatics software, tools and resources.

Some universities and research institutes also maintain lists of software tools, for example:

+ https://wiki.gacrc.uga.edu/wiki/Main_Page
+ https://wiki.rc.ufl.edu/doc/Category:Software
+ http://www.vcru.wisc.edu/simonlab/bioinformatics/programs/index.html

Some forums host similar discussion threads, for example http://seqanswers.com/wiki/Software

I personally recommend one website that documents many tools: https://omictools.com/

## 1、Quality Control

- FastQC（http://www.bioinformatics.babraham.ac.uk/projects/fastqc/）
Note: FastQC usage: http://www.plob.org/2013/07/16/5987.html
- Fastx-toolkit（http://hannonlab.cshl.edu/fastx_toolkit/）
- PrinSeq（http://prinseq.sourceforge.net/）
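- FastUniq（https://sourceforge.net/projects/fastuniq/）：merges multiple FASTQ files into two files while removing duplicate sequences (duplicates). (Note: FastUniq cannot read gzip-compressed FASTQ files; decompress them first.)
Other tools that remove duplicates without alignment to a reference genome: fastx_collapser in the FASTX-Toolkit (single-end), Fulcrum, CD-HIT-DUP, and GPU-DupRemoval.
For duplicate removal, see the paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5123249/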
- QUASR：https://sourceforge.net/projects/quasr/：QUASR is a lightweight pipeline written to process and analyse next-generation sequencing (NGS) data from Illumina, 454, and Ion Torrent platforms.
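- RSeQC：the RSeQC package provides a set of useful utilities for evaluating high-throughput sequencing data, especially RNA-seq data. Basic modules check sequence quality, nucleotide composition bias, PCR bias, and GC content bias; RNA-seq-specific modules evaluate sequencing saturation, mapped read distribution, coverage uniformity, strand specificity, transcript-level RNA integrity, and more. https://www.jianshu.com/p/edb9a5c3ecb0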
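
Because FastUniq cannot read gzip-compressed FASTQ files, the inputs have to be decompressed before they are passed to it. A minimal Python sketch using only the standard library is shown here; the helper name `gunzip_fastq` and the default output naming are hypothetical choices for illustration, not part of FastUniq.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_fastq(gz_path, out_path=None):
    """Decompress a gzip file and return the path of the plain-text copy."""
    gz_path = Path(gz_path)
    # Assumed convention: drop the trailing ".gz" when no output path is given.
    out_path = Path(out_path) if out_path else gz_path.with_suffix("")
    if out_path == gz_path:
        raise ValueError("output path would overwrite the input file")
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams in chunks, so large files are fine
    return out_path
```

For example, `gunzip_fastq("reads_1.fastq.gz")` would write `reads_1.fastq` next to the input (the file name is a placeholder).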

## 2、Read trimming and filtering (trim)

- Search for vectors, adapters, linkers, and PCR primers: https://www.ncbi.nlm.nih.gov/tools/vecscreen/
- Cutadapt: https://github.com/marcelm/cutadapt or http://cutadapt.readthedocs.io/en/stable/index.html (removes adapter sequences)
- Trimmomatic（http://www.usadellab.org/cms/?page=trimmomatic）
- sickle(https://github.com/najoshi/sickle/)
- NGSQC toolkit（http://www.nipgr.res.in/ngsqctoolkit.html）
Note: NGSQC toolkit usage: http://blog.csdn.net/shmilyringpull/article/details/9225195
- SolexaQA（http://solexaqa.sourceforge.net/ or https://sourceforge.net/projects/solexaqa/files/src/）
- Trim Galore：http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/ or https://github.com/FelixKrueger/TrimGalore
- Platanus_trim：http://platanus.bio.titech.ac.jp/?page_id=30 (does not support gzip-compressed FASTQ files)
- Seqtk: https://github.com/lh3/seqtk
- Seqprep（https://github.com/jstjohn/SeqPrep）
- TagCleaner（https://sourceforge.net/projects/tagcleaner/files/）：remove tag sequences (e.g. WTA or MID tags) from metagenomic datasets.
- BioPieces: http://code.google.com/p/biopieces/
- fastp：https://github.com/OpenGene/fastp
- SOAPnuke：https://github.com/BGI-flexlab/SOAPnuke
- seq_crumbs（https://bioinf.comav.upv.es/seq_crumbs/）(Python 2 program, not recommended!)
- seqcln（https://sourceforge.net/projects/seqclean/）(for FASTA format only, not recommended!)
Comparison of quality-control tools: https://zhuanlan.zhihu.com/p/28924793

Next-generation sequencing, quality control chapter; see: http://www.cnblogs.com/ZHshuang463508120/p/3606871.html

## 3、Reads error correction

Tools for read error correction include SOAPec and ErrorCorrection. Both were developed by BGI and can be downloaded from http://soap.genomics.org.cn/soapdenovo.html.

- SOAPec_v2.01.tar.gz, a correction tool for SOAPdenovo:
http://sourceforge.net/projects/soapdenovo2/files/ErrorCorrection/SOAPec_v2.01.tar.gz/download
- ErrorCorrection.tgz, another correction tool for SOAPdenovo:
http://sourceforge.net/projects/soapdenovo2/files/ErrorCorrection/ErrorCorrection.tgz/download

- Correction tool http://soap.genomics.org.cn/down/correction.tar.gz
- SOAPdenovo http://soap.genomics.org.cn/down/SOAPdenovo-v1.04.tgz
- GapCloser http://soap.genomics.org.cn/down/GapCloser.tar.gz

For more read correction tools, see: https://omictools.com/error-correction-category

Read correction tools, recommended programs:
– HiSeq data: BLESS, Musket, RACER and SGA.
– MiSeq data: RACER.
– Human data: Musket, RACER and SGA.
https://sourceforge.net/projects/musket/

Other similar tools:
- ECHO （http://uc-echo.sourceforge.net/）, paper: http://genome.cshlp.org/content/21/7/1181.full
- CORAL （https://www.cs.helsinki.fi/u/lmsalmel/coral/）, paper: https://academic.oup.com/bioinformatics/article/27/11/1455/217071/Correcting-errors-in-short-reads-by-multiple
- Quake（http://www.cbcb.umd.edu/software/quake/index.html）, paper: http://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-11-r116
- How to install Quake: https://www.plob.org/article/1635.html
- EC: an efficient error correction algorithm for short reads
- QuorUM: An Error Corrector for Illumina Reads.
For human data, the best tools are Lighter and the latest BLESS. The older BLESS version evaluated in the paper did not perform well.
Paper: https://academic.oup.com/bib/article/16/4/588/347932/Correcting-Illumina-data
(Read error correction is usually performed after trimming.)

- Sprai（http://zombie.cb.k.u-tokyo.ac.jp/sprai/）Sprai (single-pass read accuracy improver) is a tool to correct sequencing errors in single-pass reads for de novo assembly. It is originally designed for correcting sequencing errors in single-molecule DNA sequencing reads, especially in Continuous Long Reads (CLRs) generated by PacBio RS sequencers.

## 4、Genome Assembly

K-mer estimation:
- velvetK（http://www.vicbioinformatics.com/software.velvetk.shtml）：calculates the most suitable k-mer size
- KmerGenie（http://kmergenie.bx.psu.edu/）：estimates the best k-mer length for genome de novo assembly.

De novo assembly:
- velvet（http://www.ebi.ac.uk/~zerbino/velvet/）：suitable for microbial genomes
- VelvetOptimiser（http://www.vicbioinformatics.com/software.velvetoptimiser.shtml）：batch assembly over multiple k-mer sizes
- SPAdes（http://bioinf.spbau.ru/spades）：suitable for Illumina and PacBio data (supports gzip-compressed FASTQ files) and also for metagenomes. In practice, however, it is not well suited to viruses.
- Shovill（https://github.com/tseemann/shovill）：Faster SPAdes assembly of Illumina reads.
- Minia（https://github.com/GATB/minia）
- Soapdenovo（http://soap.genomics.org.cn/soapdenovo.html or https://github.com/aquaskyline/SOAPdenovo2）：developed by BGI for assembling large genomes
- ABySS（http://www.bcgsc.ca/platform/bioinfo/software/abyss）：based on the de Bruijn graph algorithm; suitable for large genomes.
- ALLPATHS-LG（http://software.broadinstitute.org/allpaths-lg/blog/）：suited to assembling short-read data
Blog post on using ALLPATHS-LG: http://blog.sciencenet.cn/blog-303373-717174.html
- Celera Assembler (no longer maintained)（http://wgs-assembler.sourceforge.net/wiki/index.php?title=Main_Page），（https://sourceforge.net/projects/wgs-assembler/）：suitable for Illumina, 454, PacBio, and other data.
- CABOG（http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Main_Page）：CABOG (Celera Assembler with Best Overlap Graph) is an extension of the Celera Assembler software. (No longer maintained.)
- Canu（http://canu.readthedocs.io/en/stable/#）：suitable for PacBio RSII or Oxford Nanopore MinION data http://canu.readthedocs.io/en/latest/
- Platanus（http://platanus.bio.titech.ac.jp/?p=1）：software designed specifically for assembling highly heterozygous genomes; also suitable for DNA viruses.
- MetaPlatanus（http://platanus.bio.titech.ac.jp/?page_id=174）：De novo assembly and sequence clustering of metagenomic data (metagenome assembly)
- RepARK（https://github.com/PhKoch/RepARK）：de novo creation of repeat consensuses from whole-genome NGS reads
- RepARK paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4027187/
- Novoalign（http://www.novocraft.com/products/novoalign/）：mapping of short reads onto a reference genome
- Falcon（https://github.com/PacificBiosciences/FALCON）：based on the string graph algorithm; commonly used as a diploid assembler for PacBio data.
- GAGE（http://gage.cbcb.umd.edu/index.html）
- Arachne & AllPath（https://www.broadinstitute.org/scientific-community/software）
- VISTA tools, including AVID: （http://pipeline.lbl.gov/run5details.shtml）
- MIRA（https://sourceforge.net/p/mira-assembler/wiki/Home/）：a whole genome shotgun and EST sequence assembler for Sanger, 454, Solexa (Illumina), IonTorrent data and PacBio。
- gsAssembler/GS De Novo Assembler/runAssembly (command-line based) and gsMapper (command-line based)（http://www.454.com/products/analysis-software/）：assembly of 454 data
- Newbler：the core algorithm of gsAssembler/GS De Novo Assembler, already integrated into GS De Novo Assembler
- MetaVelvet（http://metavelvet.dna.bio.keio.ac.jp/）：a short read assembler for metagenomics
- MaSuRCA（ftp://ftp.genome.umd.edu/pub/MaSuRCA/）
How to assemble with MaSuRCA: https://www.plob.org/article/7853.html
- RAMPART（https://github.com/TGAC/RAMPART or http://www.earlham.ac.uk/rampart/）：a pipeline for de novo assembly of DNA sequence data.
- edena(http://www.genomic.ch/edena.php)
- cap3(http://seq.cs.iastate.edu/cap3.html)
- SHORTY（http://www3.cs.stonybrook.edu/~skiena/shorty/）：SHORTY assembles sequences produced by ABI SOLiD. It can now also be used for Illumina data, but the reads must first be converted to FASTA format.
- Links：http://www.bcgsc.ca/platform/bioinfo/software/links
- SGA：https://github.com/jts/sga

- iCORN2(http://icorn.sourceforge.net/)：correct PacBio assemblies of Bacteria and Eukaryotes.
- FaBox：http://users-birc.au.dk/biopv/php/fabox/：an online FASTA sequence toolbox that can convert formats and extract sequences
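
Format conversion of the kind mentioned in this list (for example the FASTQ to FASTA step that SHORTY needs for Illumina data) is simple enough to sketch in Python. This is an illustrative sketch only: the function name `fastq_to_fasta` is hypothetical, and it assumes strict four-line FASTQ records (no wrapped sequence lines).

```python
def fastq_to_fasta(fastq_lines):
    """Yield FASTA lines from an iterable of FASTQ lines (4 lines per record)."""
    it = iter(fastq_lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, None)
        plus = next(it, None)   # '+' separator line, discarded
        qual = next(it, None)   # quality line, discarded
        if qual is None or not header.startswith("@"):
            raise ValueError(f"malformed or truncated FASTQ record: {header!r}")
        yield ">" + header[1:]
        yield seq.rstrip("\n")


record = ["@read1 sample\n", "ACGTTGCA\n", "+\n", "IIIIIIII\n"]
print("\n".join(fastq_to_fasta(record)))
```

Because the function is a generator over lines, it can be fed an open file handle directly, so large files are never loaded into memory at once.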

Reference-guided assembly:
- IDBA（http://i.cs.hku.hk/~alse/hkubrg/projects/idba_hybrid/index.html）
- Chromosomer（https://github.com/gtamazian/Chromosomer）
Chromosomer paper: https://link.springer.com/article/10.1186/s13742-016-0141-6
- Scaffold_Builder（https://sourceforge.net/projects/scaffold-b/）：Combining de novo and reference-guided assembly with Scaffold_builder
Paper: http://scfbm.biomedcentral.com/articles/10.1186/1751-0473-8-23
- AlignGraph（https://github.com/baoe/AlignGraph）
- Ragout（https://github.com/fenderglass/Ragout）
- SyMap（http://www.agcol.arizona.edu/software/symap/）：a turnkey synteny system with application to plant genomes; also applicable to eukaryotic genomes in general.
- RACA（）
- AMOScmp（https://sourceforge.net/projects/amos/?source=directory）
- Medusa（https://github.com/combogenomics/medusa）
- CONTIGuator（http://contiguator.sourceforge.net/）
- Multi-CAR（http://140.114.85.168/Multi-CAR/index.php）
- refGuidedDeNovoAssembly_pipelines：https://bitbucket.org/HeidiLischer/refguideddenovoassembly_pipelines
Reference: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5681816/
\# refGuidedDeNovoAssembly_pipelines is better suited to large (eukaryotic) genomes and requires multiple libraries, including mate-pair (large-insert) libraries.

Ordering contigs against a reference:
- Mauve（http://darlinglab.org/mauve/mauve.html） From the Tools menu, select ‘Move Contigs’.
- ABACAS（http://abacas.sourceforge.net/index.html）
Example:
perl abacas.1.3.1.pl -r ../../ref_data/NC_022082.fasta -q ../genomes/NJXKYY22.genome.fasta -p "nucmer" -i 70 -c -m -b -o test_sorted.fasta
More usage instructions: http://abacas.sourceforge.net/Manual.html

- GAP5（http://www.sanger.ac.uk/science/tools/gap5 or https://sourceforge.net/projects/staden/）：Gap5 is a DNA sequence assembly visualiser and editing tool.
GAP5 usage instructions: file:///C:/myProgram/Staden%20Package/share/doc/staden/manual/gap5_toc.html

Virus assembly:

- VirAmp（http://docs.viramp.com/en/latest/index.html）：a galaxy-based viral genome assembly pipeline
https://github.com/kdaily/viramp-project
http://viramp.readthedocs.io/en/latest/
VirAmp paper: https://gigascience.biomedcentral.com/articles/10.1186/s13742-015-0060-y
- V-Fat（https://www.broadinstitute.org/viral-genomics/v-fat）：V-FAT is a tool to perform automated computational finishing, annotation, and QA of de novo viral assemblies.
- Viral-ngs（http://viral-ngs.readthedocs.io/en/latest/index.html）：for RNA viruses
- IVA（https://github.com/sanger-pathogens/iva）：IVA is a de novo assembler designed to assemble virus genomes that have no repeat sequences, using Illumina read pairs sequenced from mixed populations at extremely high and variable depth.
- VIGA（https://github.com/EGTortuero/viga）：VIGA, a sensitive, precise, and automatic de novo viral genome annotator.

Other virus-related tools:
（1）Virus integration detection
- BSVF（https://github.com/BioInfoTools/BSVF）：Bisulfite Sequencing Virus integration Finder
- VirusFinder （https://bioinfo.uth.edu/VirusFinder/）
- VirusSeq（http://odin.mdacc.tmc.edu/%7Exsu1/VirusSeq.html）：detecting known viruses and their integration sites in the human genome using next-generation sequencing data.
- ViralFusionSeq (VFS)（https://sourceforge.net/projects/viralfusionseq/）：discovers viral integration events and reconstructs fusion transcripts at single-base resolution.
- Vy-PER （http://www.ikmb.uni-kiel.de/vy-per/ ）：Virus integration detection bY Paired End Reads
- seeksv（https://github.com/qiukunlong/seeksv）：an accurate tool for structural variation and virus integration detection.
（2）Viruses in metagenomic data
- VirMet（https://github.com/ozagordi/VirMet）：a set of tools for viral metagenomics
- VirFinder（https://github.com/jessieren/VirFinder）：R package for identifying viral sequences from metagenomic data using sequence signatures。
- METAVIR：http://metavir-meb.univ-bpclermont.fr/ METAVIR is a web server designed to annotate viral metagenomic sequences (raw reads or assembled contigs).
- haploclique（https://github.com/armintoepfer/haploclique）：viral SNP and indel detection

- Kronos（http://kronos.readthedocs.io/en/latest/）：A workflow assembler for cancer genome analytics and informatics.

For more assembly tools, see: http://www.mybiosoftware.com/assembly-tools

Scaffolds in a draft genome assembly need further gap closing. Software for this purpose includes:
- SOAPdenovo GapCloser (http://sourceforge.net/projects/soapdenovo2/files/GapCloser/)
- IMAGE（https://sourceforge.net/projects/image2/）：Iterative Mapping and Assembly for Gap Elimination。
- GapFiller （https://www.baseclear.com/services/bioinformatics/basetools/gapfiller/）
Blog post on using GapFiller: https://www.plob.org/article/6182.html
- Another GapFiller（https://sourceforge.net/projects/gapfiller/）
- FinIS（https://sourceforge.net/projects/finis/）
- FGAP（https://sourceforge.net/projects/fgap/）：uses BLAST to align contig sequences to the draft genome, finds the best sequences overlapping gap regions, and uses them to fill the gaps.
FGAP paper: https://www.researchgate.net/publication/263207973_FGAP_An_automated_gap_closing_tool or http://bmcresnotes.biomedcentral.com/articles/10.1186/1756-0500-7-371
Blog post on using FGAP: http://www.chenlianfu.com/?p=2333
- icorn（http://icorn.sourceforge.net/）：enables errors in the consensus sequence to be corrected by iteratively mapping reads to the current assembly. (sequence correction)

Bandage：https://rrwick.github.io/Bandage/ Assembly Graph Visualisation

Software for microbial genome workflows: https://holtlab.net/2015/02/25/tools-for-bacterial-comparative-genomics/

Assessing genome assembly errors

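- REAPR (Recognition of Errors in Assemblies using Paired Reads) uses paired reads to identify errors in genome sequences. It can then break the genome sequence at erroneous gaps or replace erroneous sequence with Ns, and it also reports statistics on the errors.
REAPR website: http://www.sanger.ac.uk/science/tools/reapr
Installing REAPR requires first installing R and these Perl modules: File::Basename, File::Copy, File::Spec, File::Spec::Link, Getopt::Long, List::Util.
Blog post on using REAPR: http://www.chenlianfu.com/?p=2329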

- QUAST（http://bioinf.spbau.ru/quast or http://quast.sourceforge.net/quast）：genome assembly quality assessment tool
QUAST manual: http://quast.bioinf.spbau.ru/manual.html
- LASTZ（http://www.bx.psu.edu/~rsharris/lastz/）
- Miller Lab：http://www.bx.psu.edu/miller_lab/
- Mauve assembly metrics - （http://code.google.com/p/ngopt/wiki/How_To_Score_Genome_Assemblies_with_Mauve）
- InGAP-SV - （http://ingap.sourceforge.net/）：InGAP is also useful for finding structural variants between genomes from read mappings.

merge-gbk-records：https://github.com/kblin/merge-gbk-records：Merge multiple GenBank records using a defined spacer sequence

Reference documents for assembly workflows: http://vlsci.github.io/lscc_docs/tutorials/assembly/assembly-protocol/#section-2-assembly
http://onlinelibrary.wiley.com/doi/10.1111/eva.12178/full
https://en.wikipedia.org/wiki/Sequence_assembly

## 5、EST assembly
- iAssembler（http://bioinfo.bti.cornell.edu/tool/iAssembler/）：uses MIRA and CAP3 to assemble transcriptome data (ESTs) generated by 454 and Sanger sequencing into contigs.
Related paper: Yi Zheng, Liangjun Zhao, Junping Gao and Zhangjun Fei (2011) iAssembler: a package for de novo assembly of Roche-454/Sanger transcriptome sequences.

## 6、Alignment
- BLAST+（ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/）
- BLAT（http://genome.ucsc.edu/cgi-bin/hgBlat?command=start）
- clustalx/clustalw（http://www.clustal.org/）
ClustalX is the graphical version of ClustalW; the former is used in a Windows environment and the latter from the DOS command line.
clustalw-format：http://web.mit.edu/meme_v4.9.0/doc/clustalw-format.html

More software: http://www.ebi.ac.uk/Tools/psa/

- MAFFT（Multiple Alignment using Fast Fourier Transform）（http://mafft.cbrc.jp/alignment/software/）
- MUSCLE（MUltiple Sequence Comparison by Log-Expectation）（http://www.drive5.com/muscle/）
- Mauve（http://darlinglab.org/mauve/mauve.html）
- Kalign（http://msa.sbc.su.se/cgi-bin/msa.cgi）
- T-Coffee（http://www.tcoffee.org/Projects/tcoffee/index.html）
- LAGAN & Shuffle-LAGAN（http://lagan.stanford.edu/lagan_web/index.shtml）
- MUGSY（http://mugsy.sourceforge.net）
- MUMmer（https://sourceforge.net/projects/mummer/）
- diamond（https://github.com/bbuchfink/diamond）
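- amos（http://sourceforge.net/projects/amos/files/）：minimus2 is a component of the AMOS assembly package that merges two sets of contigs, extending contig length and reducing the number of contigs. AMOS stands for A Modular, Open-Source whole genome assembler and aims to be a foundational software system for assembly tools. minimus2 uses an algorithm based on nucmer overlap detection, which is faster than the Smith-Waterman hash-overlap algorithm. More information: http://amos.sourceforge.net/wiki/index.php/AMOS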
- circlator（http://sanger-pathogens.github.io/circlator/）：A tool to circularize genome assemblies
- ACT（Artemis Comparison Tool）（http://www.sanger.ac.uk/science/tools/artemis-comparison-tool-act）
- GMAP（http://research-pub.gene.com/gmap/ or https://wiki.gacrc.uga.edu/wiki/Gmap-gsnap-Sapelo）：A Genomic Mapping and Alignment Program for mRNA and EST Sequences
- MSA（https://www.ncbi.nlm.nih.gov/CBBresearch/Schaffer/msa.html）
msa（http://www.bioconductor.org/packages/release/bioc/html/msa.html）：an R package for multiple sequence alignment.
- MSAProbs（https://sourceforge.net/projects/msaprobs/ or http://msaprobs.sourceforge.net/homepage.htm#latest）
- PROBCONS（http://probcons.stanford.edu/index.html）
- Probalign（http://probalign.njit.edu/probalign/login）
- M-Coffee（http://www.tcoffee.org/Projects/mcoffee/）
- MergeAlign（http://www.stevekellylab.com/software/mergealign）

A brief comparison of MUSCLE, ClustalW, and T-Coffee: https://www.plob.org/article/4104.html
More alignment software: https://en.wikipedia.org/wiki/List_of_sequence_alignment_software
http://www.ebi.ac.uk/Tools/msa/

Multiple sequence alignment formats: http://www.cnblogs.com/tsingke/p/3940074.html
Multiple sequence alignment on Wikipedia: https://en.wikipedia.org/wiki/Multiple_sequence_alignment
http://www.docin.com/p-812012331.html

Global alignment tools
GASSST：http://www.irisa.fr/symbiose/projects/gassst/
Example:
Gassst -d tmp.fna -i gene_primer_out/Microcystis_aeruginosa.eryG_2.Microcystis_aeruginosa.eryG_2.p3_seqs.fa -o test.gassout -p 80 -m 8 -n 10

Converting a protein multiple sequence alignment into a nucleotide alignment:
pal2nal：http://www.bork.embl.de/pal2nal/
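
The core idea of turning a protein alignment into a nucleotide (codon) alignment can be sketched in Python: every aligned residue is replaced by its codon from the unaligned coding sequence, and every gap by three gap characters. This is a simplified illustration and not PAL2NAL's actual implementation; the function name `protein_to_codon_alignment` is hypothetical, and the sketch assumes the coding sequence matches the protein exactly (no frameshifts, no trailing stop codon).

```python
def protein_to_codon_alignment(aligned_protein, cds, gap="-"):
    """Thread the codons of `cds` onto one gapped protein sequence."""
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length must be 3 x the number of residues")
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    pieces = []
    for aa in aligned_protein:
        # A gap column becomes three gap characters; a residue consumes one codon.
        pieces.append(gap * 3 if aa == gap else next(codons))
    return "".join(pieces)


print(protein_to_codon_alignment("M-KL", "ATGAAACTG"))  # ATG---AAACTG
```

Applying the function to every row of the protein alignment, each with its own coding sequence, gives a codon alignment whose columns stay in frame.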

## 7、Short Read Aligners（mapped）
- Bowtie（http://bowtie-bio.sourceforge.net/index.shtml）
- Bwa（http://bio-bwa.sourceforge.net）
- MAQ（http://maq.sourceforge.net/）
- subread（http://subread.sourceforge.net/）
- BBMap（https://sourceforge.net/projects/bbmap/）：BBMap short read aligner, and other bioinformatic tools.
- BBtools（http://jgi.doe.gov/data-and-tools/bbtools/）
Using BBMap: http://seqanswers.com/forums/showthread.php?t=58221 and http://seqanswers.com/forums/showthread.php?t=44494
- Stampy（http://www.well.ox.ac.uk/project-stampy）：fast and sensitive
- Stampy paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3106326/
- samblaster：（https://github.com/GregoryFaust/samblaster）a tool to mark duplicates and extract discordant and split reads from sam files.
- sambamba：（https://github.com/biod/sambamba or http://lomereiter.github.io/sambamba/） Tools for working with SAM/BAM data. (Recommended!)
- ELAND
- Novoalign
- SMALT(http://www.sanger.ac.uk/science/tools/smalt-0 or https://sourceforge.net/projects/smalt/) ：SMALT aligns DNA sequencing reads with genomic reference sequences.
- BEDTools（https://code.google.com/p/bedtools/）

## 8、SNP/indel calling
- Dindel（http://sites.google.com/site/keesalbers/soft/dindel）：discovery of small insertions/deletions
- Pindel（http://gmt.genome.wustl.edu/packages/pindel/）：discovery of small insertions/deletions
- Samtools（http://samtools.sourceforge.net or http://www.htslib.org/）：tools for analyzing data after mapping
- bamtools（https://github.com/pezmaster31/bamtools）
- GATK（https://software.broadinstitute.org/gatk/）
- bcftools（http://www.htslib.org/download/）
- VarScan（http://massgenomics.org/varscan or http://dkoboldt.github.io/varscan/）
- scalpel（https://sourceforge.net/projects/scalpel/?source=directory）：genetic variant discovery and indel detection
scalpel paper: http://www.nature.com/nmeth/journal/v11/n10/full/nmeth.3069.html
For usage, see: http://www.bio-info-trainee.com/2341.html
- INDELseek（https://github.com/tommyau/indelseek）：detects indels
- ScanIndel（https://github.com/cauyrd/ScanIndel）
- Snippy（https://github.com/tseemann/snippy）：bacterial SNP and indel calling
- Picard（http://broadinstitute.github.io/picard/ or https://github.com/broadinstitute/picard）：Java program
- SpeedSeq：（https://github.com/hall-lab/speedseq）developed by researchers at Washington University School of Medicine and other institutions. Using a low-cost server, it completes alignment, variant detection, and functional annotation of a 50x human genome in as little as 13 hours, which addresses a current bottleneck in WGS bioinformatics. It can be applied to WGS, WES, and panel sequencing data.
SpeedSeq paper: http://www.nature.com/nmeth/journal/v12/n10/full/nmeth.3505.html
See also: http://www.biotrainee.com/thread-338-1-1.html
- Sequence Variant Analyzer（http://www.svaproject.org）：displays variants in their genomic context
- HugeSeq（https://github.com/StanfordBioinformatics/HugeSeq）：a pipeline for structural variants
Reference: http://blog.csdn.net/alex6plus7/article/details/50236375
- KvarQ（https://github.com/kvarq/kvarq）：Targeted and direct variant calling in FastQ reads of bacterial genomes.

- nesoni：https://github.com/Victorian-Bioinformatics-Consortium/nesoni a toolkit for NGS SNP calling / RNA-Seq DGE / read cleaning.
- RedDog：https://github.com/katholt/RedDog a workflow pipeline for short read length sequencing analysis, including the read mapping task, through to variant detection, followed by analyses (SNPs only). Single nucleotide polymorphisms (SNPs) with Phred quality score ≥30 were identified in each isolate using SAMTools.
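
For context on the Phred quality threshold mentioned above: a Phred score Q corresponds to an error probability of 10^(-Q/10), so Q ≥ 30 means at most a 1 in 1000 chance that a call is wrong. The short Python sketch below shows the conversion; the function names are hypothetical, and the default offset of 33 assumes the Sanger / Illumina 1.8+ FASTQ quality encoding.

```python
import math


def phred_to_error_prob(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality score implied by an error probability (0 < p <= 1)."""
    return -10 * math.log10(p)


def decode_fastq_quality(quality_string, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


print(decode_fastq_quality("I?5"))  # [40, 30, 20]
print(phred_to_error_prob(30))
```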

## 9、SV、SNV
- LUMPY（https://github.com/arq5x/lumpy-sv）：a general probabilistic framework for structural variant discovery.
- MetaSV：（http://bioinform.github.io/metasv/）An accurate and integrative structural-variant caller.
- MetaSV paper: https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btv204
- FindSV：（https://github.com/dnil/FindSV）
- SomaticSniper（http://gmt.genome.wustl.edu/packages/somatic-sniper/ or https://github.com/genome/somatic-sniper）：detects SNVs

FindTranslocations, CNVnator and fermikit

SV、CNV
- SV-Autopilot（https://github.com/ALLBio/allbiotc2）
- GASV：http://compbio.cs.brown.edu/projects/gasv/ or https://github.com/ZhihaoXie/GASV_
GASV documentation: https://vcru.wisc.edu/simonlab/bioinformatics/programs/gasv/GASV_UserGuide.pdf
- srGASV：https://github.com/dstorch/srGASV
- MultiBreak-SV：http://compbio.cs.brown.edu/projects/multibreaksv/ or https://github.com/raphael-group/multibreak-sv
- SVDetect：https://sourceforge.net/projects/svdetect/
- PEMer：detecting SVs from paired-end reads. http://sv.gersteinlab.org/pemer/ or https://github.com/BIGLabHYU/PEMer
- VariationHunter: A tool for identifying structural variations from paired-end WGS data. https://sourceforge.net/projects/variationhunter/
- vaquita：https://github.com/seqan/vaquita Identification of structural variations
\# Note: the reference sequence file that vaquita requires must have the .fa suffix.
- svmerge：https://sourceforge.net/projects/svmerge/ A tool for SVs analysis by integrating calls from several existing SV callers.
- breakway：https://sourceforge.net/projects/breakway/ identification of genomic breakpoints
- CNT-MD：Copy-Number Tree Mixture Deconvolution http://compbio.cs.brown.edu/projects/cnt-md/ or https://github.com/raphael-group/CNT-MD
- CNT-ILP: Copy-Number Tree http://compbio.cs.brown.edu/projects/cnt-ilp/ or https://github.com/raphael-group/CNT-ILP
- Whole Exome Sequencing Analysis Pipeline： http://metamoodics.org/wiki/index.php?title=Whole_Exome_Sequencing_Analysis_Pipeline
- BSseeker2（https://github.com/BSSeeker/BSseeker2）：A versatile aligning pipeline for bisulfite sequencing data.

For more tools, see: http://www.knowgene.com/question/8855

Related tools: https://omictools.com/indel-detection-category

- PopSV：https://github.com/jmonlong/PopSV Human copy number variants detection

- Sniffles：https://github.com/fritzsedlazeck/Sniffles Sniffles is a structural variation caller using third generation sequencing (PacBio or Oxford Nanopore).
- NGMLR：https://github.com/philres/ngmlr NGMLR is a long-read mapper designed to align PacBio or Oxford Nanopore (standard and ultra-long) to a reference genome with a focus on reads that span structural variations.

Review of genetic variant analysis software: https://academic.oup.com/bib/article/15/2/256/210976/A-survey-of-tools-for-variant-analysis-of-next
A list of software tools: http://seqanswers.com/forums/showthread.php?t=43

## 10、Chip-Seq

- Findpeaks（http://vancouvershortr.sourceforge.net）

## 11、RNA-Seq
- Cufflinks（http://cufflinks.cbcb.umd.edu）：measures transcript abundance
- Tophat（http://ccb.jhu.edu/software/tophat/index.shtml）：maps splice junctions
- Trinity （https://github.com/trinityrnaseq/trinityrnaseq/wiki）
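- Oases（http://www.ebi.ac.uk/~zerbino/oases/）：assembly from transcriptome data
- Trans-ABySS（http://www.bcgsc.ca/platform/bioinfo/software/trans-abyss）：transcriptome assembly
- HISAT（http://ccb.jhu.edu/software/hisat/index.shtml）：spliced alignment of RNA-seq reads to a reference genome
- StringTie（http://ccb.jhu.edu/software/stringtie/）：assembles transcripts and estimates expression levels
- Ballgown（https://github.com/alyssafrazee/ballgown）：differential expression analysis of RNA-seq data
Further reading: a detailed walkthrough of differential expression analysis of transcriptomes with TopHat and Cufflinks
More RNA-related software: http://www.mybiosoftware.com/rna-analysis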

## 12、Genome visualisers and editors
- Integrated Genome Browser（http://www.bioviz.org/igb/）
- Integrative Genomics Viewer（http://www.broadinstitute.org/software/igv/）
- Artemis（http://www.sanger.ac.uk/science/tools/artemis）
- CLC BioWorkbench（https://www.qiagenbioinformatics.com/products/clc-genomics-workbench/）
- Geneious（http://www.geneious.com/）http://www.geneious.com/features/assembly-mapping
- IGV （www.broadinstitute.org/igv/）

## 13、Plotting
- hemi（http://hemi.biocuckoo.org/index.php）：graphical tool for drawing heatmaps
- clusterProfiler: https://github.com/GuangchuangYu/clusterProfiler：statistical analysis and visualization of functional profiles for genes and gene clusters

## 14、Circular plots
- circos（http://circos.ca）
- BioCircos：http://bioinfo.ibp.ac.cn/biocircos/index.php
- BRIG（http://brig.sourceforge.net/）
Documentation: http://brig.sourceforge.net/brig-tutorial-1-whole-genome-comparisons/
https://sourceforge.net/projects/brig/files/
- OGDRAW（http://ogdraw.mpimp-golm.mpg.de/index.shtml）：draws circular maps of organelle genomes
- DNAPlotter（http://www.sanger.ac.uk/science/tools/dnaplotter）

## 15、Coding gene prediction
- Glimmer（http://ccb.jhu.edu/software/glimmer/index.shtml）：gene prediction for bacteria, archaea, and viruses
- GeneMarkS（http://topaz.gatech.edu/GeneMark/）：gene prediction for bacteria, archaea, viruses, phages, and transcriptomes
- MetaGeneMark：a GeneMark prediction program for metagenomes
- Prodigal（http://prodigal.ornl.gov/）：gene prediction for prokaryotes (works on high-GC genomes); also suitable for metagenomes, but not for predicting RNA genes or viral genes.
- MetaGene Annotator（MetaGeneAnnotator）（http://metagene.cb.k.u-tokyo.ac.jp/）：a gene-finding program for prokaryote and phage. Also suitable for metagenomes.
- FragGeneScan（https://github.com/COL-IU/FragGeneScan.git）：It can be applied to predict prokaryotic genes in incomplete assemblies or complete genomes.
- Orphelia（http://orphelia.gobics.de/）：Orphelia is a metagenomic ORF finding tool for the prediction of protein coding genes in short, environmental DNA sequences with unknown phylogenetic origin.
- GenScan（http://genes.mit.edu/GENSCAN.html）：gene prediction tool for vertebrates, Arabidopsis, and maize
- Pfam_Scan（http://pfam.xfam.org/）：protein domain prediction
PfamScan tool（ftp://ftp.ebi.ac.uk/pub/databases/Pfam/Tools/）
- tRNAscan-SE（http://lowelab.ucsc.edu/tRNAscan-SE/）：tRNA prediction
- ARAGORN：http://130.235.46.10/ARAGORN/ or http://mbio-serv2.mbioekol.lu.se/ARAGORN/Downloads/ ARAGORN detects tRNA, mtRNA, and tmRNA genes.
- RNAmmer（http://www.cbs.dtu.dk/services/RNAmmer/）：rRNA prediction
- Barrnap（http://www.vicbioinformatics.com/software.barrnap.shtml or https://github.com/tseemann/barrnap）：rRNA prediction and identification
- snoGPS（http://lowelab.ucsc.edu/snoGPS/）：Search for H/ACA snoRNA genes in a genomic sequence
- Snoscan（http://lowelab.ucsc.edu/snoscan/）：Search for C/D box methylation guide snoRNA genes in a genomic sequence

- OrfM：https://github.com/wwood/OrfM simple and not slow ORF caller.
- getorf：http://emboss.sourceforge.net/apps/cvs/emboss/apps/getorf.html Find and extract open reading frames (ORFs).
- checktrans：http://emboss.open-bio.org/rel/rel6/apps/checktrans.html Reports STOP codons and ORF statistics of a protein.
- plotorf：http://emboss.sourceforge.net/apps/release/6.0/emboss/apps/plotorf.html Plot potential open reading frames in a nucleotide sequence.
- ORFfinder：ftp://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/ORFfinder/linux-i64/
ORF Finder (online tool): http://www.bioinformatics.org/sms2/orf_find.html
- AntiFam：ftp://ftp.ebi.ac.uk/pub/databases/Pfam/AntiFam/ identifies spurious ORFs
AntiFam paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308159/
http://xfam.org/
How to run AntiFam:
hmmsearch --domtblout test_vs_antifam.out --tblout test_vs_antifam.out2 --domE 1e-10 --cpu 12 ../AntiFam.hmm test.faa

## 16、Annotation pipeline software
- Manatee（http://manatee.sourceforge.net/igs/index.shtml）：Manatee is a web-based tool used to perform manual functional annotation.
- Ergatis（http://ergatis.sourceforge.net/index.html）、（https://sourceforge.net/projects/ergatis/）
- RAST（http://www.nmpdr.org/FIG/wiki/view.cgi/FIG/RapidAnnotationServer or http://rast.nmpdr.org/）：annotating bacterial and archaeal genomes (online)
- prokka（http://www.vicbioinformatics.com/software.prokka.shtml）：annotation for prokaryotes
- Annotationtools：https://github.com/rbotts/Annotationtools Python script for annotating sequences from fasta file (Bacterial). Uses GeneMarkS and BioPython. (for prokaryotes)

- RATT (Rapid Annotation Transfer Tool) http://ratt.sourceforge.net/：rapid transfer of gene functional annotation based on a reference genome. RATT is now part of PAGIT.
- PAGIT（http://www.sanger.ac.uk/science/tools/pagit）（Post Assembly Genome Improvement Toolkit）.

## 17、Basic post-assembly statistics
- assembly-stats（https://github.com/sanger-pathogens/assembly-stats）
- assembly-stats（https://github.com/rjchallis/assembly-stats）
- assemblyStatics（https://github.com/WenchaoLin/assemblyStatics）
- velvet-stats（https://github.com/ajmazurie/velvet-stats）
- gstawk（https://github.com/mspopgen/gstawk）
- seqStats(https://github.com/peteashton/seqStats):Two figures are produced: one contains the length distribution histogram and a cumulative length plot, the other plots GC vs sequence length.

- TBtools(https://github.com/CJ-Chen/TBtools)
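
As a rough illustration of what such statistics tools compute, the Python sketch below derives N50 and GC content from contig sequences. It is a minimal sketch and not the implementation of any tool listed above; the function names are hypothetical, and N50 is taken here as the length of the shortest contig in the smallest set of longest contigs that covers at least half of the total assembly length.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n50(lengths):
    """Contig length at which the running sum (longest first) reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def gc_content(seq):
    """Fraction of G and C among unambiguous bases (N and other codes ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8
```

With a real assembly, `n50([len(seq) for _, seq in read_fasta("contigs.fasta")])` would give the contig N50 (the file name is a placeholder).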

## 18、K-mer analysis for genome size estimation
- GCE（ftp://ftp.genomics.org.cn/pub/gce/）：BGI software for genome evaluation
- GCE paper: https://www.researchgate.net/publication/255722390_Estimation_of_genomic_characteristics_by_analyzing_k-mer_frequency_in_de_novo_genome_projects
Blog post with usage instructions: https://www.plob.org/article/9388.html
- KmerGenie（http://kmergenie.bx.psu.edu/）
- Jellyfish （http://www.genome.umd.edu/jellyfish.html）
Jellyfish usage notes: http://www.chenlianfu.com/?p=806
- KmerFreq
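
The basic arithmetic behind k-mer based genome size estimation is: genome size ≈ total number of k-mers / depth of the main peak in the k-mer frequency histogram. The Python sketch below applies it to a two-column histogram (depth, number of distinct k-mers), the kind of table a k-mer counter can export. It is a deliberately simplified sketch: the function names and the `min_depth` cutoff for discarding low-frequency error k-mers are assumptions for illustration, and dedicated tools such as GCE additionally model heterozygosity and repeats.

```python
def parse_histogram(lines):
    """Parse 'depth count' lines into a dict {depth: number of distinct k-mers}."""
    histogram = {}
    for line in lines:
        if line.strip():
            depth, count = line.split()[:2]
            histogram[int(depth)] = int(count)
    return histogram


def estimate_genome_size(histogram, min_depth=3):
    """Return (estimated genome size, peak depth) from a k-mer histogram."""
    # Ignore very low depths, which are dominated by sequencing errors (assumed cutoff).
    trusted = {d: n for d, n in histogram.items() if d >= min_depth}
    if not trusted:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(trusted, key=trusted.get)
    total_kmers = sum(d * n for d, n in trusted.items())
    return total_kmers / peak_depth, peak_depth


toy = parse_histogram(["1 5000", "2 300", "19 200", "20 600", "21 200", "40 10"])
size, peak = estimate_genome_size(toy)
print(peak, round(size))  # 20 1020
```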

## 19、Exome-related software
- CNV detection software: CoNIFER（http://conifer.sourceforge.net/）
- SNP annotation software: annovar（http://annovar.openbioinformatics.org/en/latest/）

## 20、GO annotation
- blast2go（https://www.blast2go.com/）
- GO_Annotation_Plot （https://github.com/ZhihaoXie/GO_Annotation_Plot.git）

## 21、Comparative genomics
- Sibelia: A comparative genomics tool（http://bioinf.spbau.ru/en/sibelia）

## 22、Phylogenetic trees
- Pairdist（https://github.com/frederic-mahe/pairdist）：for building NJ trees
- TreeBest（https://github.com/lh3/treebest or http://treesoft.sourceforge.net/）
- Using TreeBest: http://blog.sina.com.cn/s/blog_620b35790100mcp6.html
- Fasttree（http://www.microbesonline.org/fasttree/）
- RAxML（https://sco.h-its.org/exelixis/web/software/raxml/index.html）：ML tree tool
- PhyML（http://www.atgc-montpellier.fr/phyml/）：a tool for building ML trees online; it can also be run locally
- profileNJ（https://github.com/maclandrol/profileNJ）：corrects gene trees using a species tree and NJ
- Figtree（http://tree.bio.ed.ac.uk/software/figtree/）：a graphical viewer of phylogenetic trees and as a program for producing publication-ready figures.
- Dendroscope（http://dendroscope.org/）：Software for visualizing phylogenetic trees and rooted networks.
- PATRIC（https://www.patricbrc.org/）：Phylogenetic Tree Builder

- TempEst（http://tree.bio.ed.ac.uk/software/tempest/）TempEst is a tool for investigating the temporal signal and 'clocklikeness' of molecular phylogenies.

- liftover（http://hgdownload.cse.ucsc.edu/admin/exe/）：converts coordinates between genome assembly versions（http://genome.ucsc.edu/）
Reference: http://www.plob.org/article/9541.html

- splign is an NCBI tool for aligning cDNA to a genome; with splign it is easy to locate each exon of a cDNA.
Reference: http://www.plob.org/article/7361.html

## 23、Metagenomics

（1）Metagenome assembly tools

Available assemblers: SOAPdenovo, SPAdes, IDBA, MetaPlatanus, ABySS, CABOG
- TruSPAdes（http://cab.spbu.ru/software/spades/）：for metagenome assembly
- MEGAHIT（https://github.com/voutcn/megahit）
- Ray（https://github.com/sebhtml/ray 或者 http://denovoassembler.sourceforge.net/）：a de novo assembler using MPI 2.2. Ray Meta: scalable de novo metagenome assembly and profiling.
- Meraga（）
- Minia （http://minia.genouest.org/）
- MetaVelvet（http://metavelvet.dna.bio.keio.ac.jp/）：a short read assembler for metagenomics
See also：http://blog.sina.com.cn/s/blog_670445240101lg2a.html
- MetAMOS（https://github.com/marbl/metAMOS）：A metagenomic and isolate assembly and analysis pipeline built with AMOS.
- Subtractive Assembly（https://sourceforge.net/projects/subtractive-assembly/）：compares metagenomes through assembly. The main aim is to reduce the cost of metagenome assembly while focusing on differential species and differential genes: reads containing differential k-mers are first selected from the raw reads, and only the selected reads are then assembled.
See also：http://blog.sina.com.cn/s/blog_83f77c940102vvwr.html

（2）Other

- MG-RAST（http://metagenomics.anl.gov/） http://evomics.org/learning/genomics/metagenomics/mg-rast/
- GOTTCHA（https://github.com/LANL-Bioinformatics/GOTTCHA）
- MIDAS（https://github.com/snayfach/MIDAS）：Metagenomic Intra-Species Diversity Analysis System. Our reference database of bacterial species and associated genomic data resources are available at http://lighthouse.ucsf.edu/MIDAS.
- MIDAS paper：https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5088602/
- checkM（https://github.com/Ecogenomics/CheckM）

（3）Taxonomic classification
- Kraken（http://ccb.jhu.edu/software/kraken/）
- Kaiju（http://kaiju.binf.ku.dk/ or https://github.com/bioinformatics-centre/kaiju）：Kaiju is a program for sensitive taxonomic classification of high-throughput sequencing reads from metagenomic whole genome sequencing or metatranscriptomics experiments.
- sourmash （pip install -U https://github.com/dib-lab/sourmash/archive/master.zip）
- MetaPhlAn2（http://segatalab.cibio.unitn.it/tools/metaphlan2/ or https://bitbucket.org/biobakery/metaphlan2/src/default/）
- mOTU（http://www.bork.embl.de/software/mOTU/）
- PanPhlAn（http://segatalab.cibio.unitn.it/tools/panphlan/）
- ConStrains（https://bitbucket.org/luo-chengwei/constrains）：takes reads as input
Paper：http://www.nature.com/nbt/journal/v33/n10/full/nbt.3319.html
- Krona（https://github.com/marbl/Krona/wiki）：taxonomy visualization

（4）binning
- metaBAT：https://bitbucket.org/berkeleylab/metabat
- ESOM：http://databionic-esom.sourceforge.net/
- ESOM：https://sourceforge.net/projects/databionic-esom/?source=directory
- CheckM：http://ecogenomics.github.io/CheckM/ or https://github.com/Ecogenomics/CheckM/releases
- MetaCluster：http://i.cs.hku.hk/~alse/MetaCluster/
- MetaBin：http://metabin.riken.jp/

（5）Other tools
- tetramerFreqs/Binning：https://github.com/tetramerFreqs/Binning
- Hawth's Analysis Tools for ArcGIS：http://www.spatialecology.com/htools/overview.php

Other：
http://www.360doc.com/content/16/0815/17/35684706_583419969.shtml

An introduction to databases commonly used in microbial ecology research：http://www.cnblogs.com/nkwy2012/p/6396435.html

References：
http://msb.embopress.org/content/9/1/666 （a review）
http://www.ebiotrade.com/newsf/2014-8/2014814163301250.htm

TaxonKit：https://bioinf.shenwei.me/taxonkit/ Efficient NCBI Taxonomy Toolkit

## 24、16S microbial diversity
- UNBIAS
- Vsearch
- usearch
- NINJA

- SRA Toolkit：https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=software&m=software&s=software
http://ncbi.github.io/sra-tools/
https://github.com/ncbi/sra-tools
How to convert SRA format to fastq (fq) format with fastq-dump：http://www.cnblogs.com/emanlee/archive/2013/04/15/3022328.html

## 25、Gene family prediction
- GFam（http://www.paccanarolab.org/gfam）：GFam is a command-line tool for automatic annotation of gene families.

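## 26、Full-length transcripts
- SQANTI（https://bitbucket.org/ConesaLab/sqanti）：a tool for discovering and annotating novel transcript structures from full-length transcriptome sequencing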
http://www.ngsgo.com/biology/1436.html

## 27、COG annotation
- eggNOG-mapper（http://eggnogdb.embl.de/#/app/emapper）

Reference：http://diyitui.com/content-1466484195.47288872.html

- ASpipe（https://sourceforge.net/projects/aspipe/）：ASpipe is a pipeline to process GeneSeqer/GMAP alignments and identify alternative splicing (AS) events from the alignments. It requires unix bash, perl 5.0+ with DBI module and MySQL5.0+ to run properly.

## 28、Genome browsers
- UCSC Genome Browser
http://genome.ucsc.edu

- Ensembl Genome Browser
http://www.ensembl.org

- NCBI Genome Browser
http://www.ncbi.nlm.nih.gov/mapview

- GMOD GBrowse
http://gmod.org

- UTGB
http://utgenome.org/

- IGV (Broad)
http://www.broadinstitute.org/igv/

- JBrowse (JavaScript)
http://jbrowse.org/

- Argo Genome Browser (Broad)
http://www.broadinstitute.org/annotation/argo/

- DNAnexus
https://dnanexus.com/genomes/hg18/public_browse

- Gaggle Genome Browser
http://gaggle.systemsbiology.net/docs/geese/genomebrowser/

- Celera Genome Browser
http://sourceforge.net/projects/celeragb/files/

- Apollo Genome Annotation Curation Tool
http://apollo.berkeleybop.org/current/index.html

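Reference：http://www.dxy.cn/bbs/thread/1385361#1385361
Map Viewer user guide：http://www.dxy.cn/bbs/thread/1385361#1385361

NCBI uses build numbers such as build 36, while UCSC and others use names such as hg18 and hg19 for the human genome. Ensembl has its own release versions, but its data follow the NCBI numbering.
The two naming styles map onto each other. For the human genome, for example, hg19 = GRCh37, and Build 38 patch release 7 corresponds to GRCh38.p7.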
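
The correspondence between the two naming styles can be kept in a small lookup table. The sketch below is illustrative only: the helper names are made up for this example, and the hg18 = NCBI36 and hg38 = GRCh38 entries are added here from general knowledge rather than taken from the text above.

```python
# UCSC name -> NCBI/GRC name for human assemblies.
# hg19 = GRCh37 is stated above; the other two pairs are additions.
UCSC_TO_GRC = {
    "hg18": "NCBI36",
    "hg19": "GRCh37",
    "hg38": "GRCh38",
}
GRC_TO_UCSC = {grc: ucsc for ucsc, grc in UCSC_TO_GRC.items()}


def to_grc(ucsc_name):
    """Translate a UCSC assembly name (e.g. 'hg19') to its GRC/NCBI name."""
    return UCSC_TO_GRC[ucsc_name]


def to_ucsc(grc_name):
    """Translate a GRC/NCBI name to the UCSC name, ignoring patch suffixes."""
    base = grc_name.split(".")[0]  # "GRCh38.p7" -> "GRCh38"
    return GRC_TO_UCSC[base]
```

Patch releases such as GRCh38.p7 do not change chromosome coordinates, which is why the sketch drops the suffix before the lookup.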

Other tools：
- HUMAnN2（https://bitbucket.org/biobakery/humann2/wiki/Home）

## 29、pan-genomes analysis
- Roary（http://sanger-pathogens.github.io/Roary/）：rapid large-scale prokaryote pan genome analysis
Roary paper：https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4817141/
- BPGA（http://www.iicb.res.in/bpga/index.html or https://sourceforge.net/projects/bpgatool/）
BPGA is an ultra-fast software package that provides comprehensive pan genome analysis of microorganisms.（prokaryotes only）
BPGA paper：https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4829868/pdf/srep24373.pdf
- PanGP （https://pangp.ybzhao.com/）PanGP is a tool for quickly analyzing bacterial pan-genome profile.（pan-genome profile analysis and profile curves）
- panOCT（https://sourceforge.net/projects/panoct/?source=directory）
- panOCT paper：https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3526259/
- LS-BSR（https://github.com/jasonsahl/LS-BSR）
- BSR（http://bsr.igs.umaryland.edu/）
LS-BSR paper：https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3976120/
- PGAP：pan-genomes analysis pipeline. （automated software for prokaryotic pan-genome analysis）
https://github.com/kastman/pgap-docker
https://sourceforge.net/projects/pgap/
PGAP paper：https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3268234/
\# PGAP is very time-consuming! Use with caution!
- metaPGAP（https://github.com/mitul-patel/metaPGAP）：metagenomic Pan Genome Analysis Pipeline
- AGAPE（https://github.com/yeastgenome/AGAPE）：pan-genome analysis for yeast

- Parsnp（http://harvest.readthedocs.io/en/latest/content/parsnp.html or https://github.com/marbl/parsnp） Rapid core genome multi-alignment (bacterial genomes).
Parsnp paper：https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4262987/

- PGAP-X：https://pgapx.ybzhao.com/ PGAP-X is a microbial comparative genomic analysis platform with graphic interface.（graphical interface for comparative genomic analysis）

## 30、Transposons
- LTR_retriever（https://github.com/oushujun/LTR_retriever）：identifies LTR retrotransposons

## 31、Resistance genes and virulence factors
Tools：
- abricate（https://github.com/tseemann/abricate）：Mass screening of contigs for antimicrobial and virulence genes
- ARIBA：https://github.com/sanger-pathogens/ariba resistance gene detection (takes fastq sequences as input)
- SRST2：https://github.com/katholt/srst2 or http://katholt.github.io/srst2/
- c-SSTAR：https://github.com/chrisgulvik/c-SSTAR
- ARGs-OAP：https://github.com/biofuture/Ublastx_stageone and http://smile.hku.hk/SARGs
ARGs-OAP paper：https://academic.oup.com/bioinformatics/article/32/15/2346/1743463
\# Note: ARGs-OAP takes fastq files as input

- Meta-MARC：https://github.com/lakinsm/meta-marc resistance gene detection for metagenomes
- DeepARG：http://bench.cs.vt.edu/deeparg a deep learning approach for predicting antibiotic resistance genes from metagenomic data.
DeepARG paper：https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-018-0401-z

Antimicrobial Resistance Gene Database：
- ARDB： http://ardb.cbcb.umd.edu/index.html
- BacMet （http://bacmet.biomedicine.gu.se/）: Antibacterial biocide and metal resistance genes database
\# BacMet ships with a companion search and annotation tool, which can be run like this:
perl /sdg/database/BacMet_v1.1/BacMet-Scan_v1.1.pl -i ./final.scaffold.fa -o E6.3 -d /sdg/database/BacMet_v1.1/BacMet_EXP.704 -blast -e 0.00001 -cpu 10 -columns all -p 20 -table -report -counts -v
- CARD：https://card.mcmaster.ca/
- Resfams：http://www.dantaslab.org/resfams
- NCBI Bacterial Antimicrobial Resistance Reference Gene Database：https://www.ncbi.nlm.nih.gov/bioproject/PRJNA313047
- ARG-ANNOT：http://en.mediterranee-infection.com/article.php?laref=283%26titre=arg-annot
- ResFinder：https://cge.cbs.dtu.dk/services/ResFinder/ ResFinder identifies acquired antimicrobial resistance genes and/or find chromosomal mutations in total or partial sequenced isolates of bacteria.
ResFinder：https://bitbucket.org/genomicepidemiology/resfinder
- EcOH：https://github.com/katholt/srst2/tree/master/data

## 32、Plasmid sequence detection
- PlasmidFinder：https://cge.cbs.dtu.dk/services/PlasmidFinder/ PlasmidFinder identifies plasmids in total or partial sequenced isolates of bacteria. PlasmidFinder, which searches for matches in a replicon database, had the highest precision (1.0) but was restricted by the contents of its database and the contig length obtained from de novo assembly (recall = 0.33).
PlasmidFinder database download link：https://cge.cbs.dtu.dk//services/data.php
- cBAR（http://csbl.bmb.uga.edu/~ffzhou/cBar/） recall and precision of 0.77 and 0.63.
- Recycler（https://github.com/Shamir-Lab/Recycler） It correctly predicted small plasmids but failed with long plasmids (recall = 0.12, precision = 0.28).
- PlasmidSPAdes（http://spades.bioinf.spbau.ru/plasmidSPAdes/）
- PLACNET（https://sourceforge.net/projects/placnet/）
- PLACNET2FASTA（https://github.com/tomdeman-bio/PLACNET2FASTA）：Converts PLACNET output to a FASTA file containing plasmid contigs

## 33、Microbiology
- Nullarbor：https://github.com/tseemann/nullarbor Pipeline to generate complete public health microbiology reports from sequenced isolates.

## 34、
Genome-to-Genome Distance Calculator (GGDC)：http://ggdc.dsmz.de/distcalc2.php computes a calculated (in silico) DNA–DNA hybridization (DDH) value.

## 35、
- MinCED：https://github.com/ctSkennerton/minced CRISPR detection
- CRT：http://www.room220.com/crt/ CRISPR Recognition Tool

## 36、
- Piggy（https://github.com/harry-thorpe/piggy）：Pipeline for analysing intergenic regions in bacteria

## 37、IS
- ISMapper（https://github.com/jhawkey/IS_mapper）ISMapper finds locations of an IS query in short read data using a series of mapping steps.

## 38、
- ncbi-genome-download（https://github.com/kblin/ncbi-genome-download）：Scripts to download genomes from the NCBI FTP servers.
Example：
~/.pyenv/versions/3.5.2/bin/ncbi-genome-download -F fasta -g Vibrio -o Vibrio_genomes -p 16 -r 15 bacteria

## 39、Primer design
- PrimerMapper：https://github.com/dohalloran/PrimerMapper
- primer3（https://github.com/primer3-org/primer3）
- PrimerView（https://github.com/dohalloran/PrimerView）

## 40、
- Some tools for protein function annotation analysis：https://classes.soe.ucsc.edu/bme225/Fall07/BME225.serverlist.html
https://classes.soe.ucsc.edu/bme225/Fall08/BME225.serverlist08.html

## 41、
- GWDSR：https://github.com/tigerxu/GWDSR

- COV2HTML：https://mmonot.eu/COV2HTML/connexion.php A Visualization and Analysis Tool of Bacterial Next Generation Sequencing (NGS) Data.

## 42、Methylation
- Bismark（https://www.bioinformatics.babraham.ac.uk/projects/bismark/）：A tool to map bisulfite converted sequence reads and determine cytosine methylation states. (methylation calling)

- seqtools：http://www.sanger.ac.uk/science/tools/seqtools The SeqTools package contains three tools for visualising sequence alignments: Blixem, Dotter and Belvu.

## 43、Transposable elements
- CLARI-TE：https://github.com/jdaron/CLARI-TE Predicts Transposable Elements (TEs) in complex genomes such as wheat.

## 44、Repeat sequence analysis
- TRF：http://tandem.bu.edu/trf/trf.download.html
- Msatfinder：http://www.bioinformatics.org/project/?group_id=469 https://github.com/knirirr/Msatfinder Msatfinder is a simple Perl script that detects perfect microsatellite repeats (1-6 bp) in nucleic acid or protein sequences.
- MISA - MIcroSAtellite identification tool：http://pgrc.ipk-gatersleben.de/misa/
- msatcommander：http://www.softpedia.com/get/Science-CAD/msatcommander.shtml （Windows platform）

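Further notes：

（1）SSR/STR genotyping

A possible approach：

1. First determine which species you are working on. For many species, SSR sequences have already been published together with matching primer sequences. This is the simpler case, because you do not need to design primers yourself. Still, prefer loci that are widely reported in the literature and highly polymorphic. For soybean, for example, SSR loci and their primer sequences are available, so check which loci appear most often in publications, which have been studied extensively and which show good polymorphism, and choose those.

2. If nothing has been published for your species, more work is needed: you have to develop SSR primers yourself. Start by screening the genome sequence of the species for STR loci. Many methods and resources for SSR primer development exist, for example enriched-library methods and the SSR-Hunter software. When selecting from the genome sequence, choose unlinked loci where possible. After screening for repeat loci, test each locus for polymorphism. The final loci should be unlinked, highly polymorphic and easy to amplify.

3. On the ABI3730, the final run detects a fluorescent signal from primers labelled at the 5' end. Throughput and speed are high, but so is the cost, so run samples on the instrument only once the primers have been screened and you move on to batch experiments. For the initial primer screening, use ordinary (unlabelled) primers, run PAGE gels on about 20 samples, and roughly check the amplified fragments and their polymorphism.

You need sequences to start with, whatever the species. Submit them to the online tool at http://www.genomics.ceh.ac.uk/cgi-bin/msatfinder/msatfinder.cgi to locate the microsatellites, then design primers in the regions flanking each microsatellite.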
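
The last step (locate microsatellites, then take the flanking sequence for primer design) can be sketched offline with a regular expression. This is only an illustration under stated assumptions: it handles DNA only, reports perfect repeats of 1-6 bp motifs, and the function name, `min_repeats` and `flank` defaults are choices made for this example rather than the behaviour of Msatfinder or MISA.

```python
import re


def find_microsatellites(seq, min_repeats=5, flank=20):
    """Yield perfect repeats of 1-6 bp motifs together with flanking sequence."""
    seq = seq.upper()
    # Lazy motif match so the shortest repeating unit is reported
    # (e.g. "AC" rather than "ACAC"); \1 then demands further copies.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    for match in pattern.finditer(seq):
        motif = match.group(1)
        start, end = match.span()
        yield {
            "motif": motif,
            "repeats": (end - start) // len(motif),
            "start": start,  # 0-based, inclusive
            "end": end,      # 0-based, exclusive
            "left_flank": seq[max(0, start - flank):start],
            "right_flank": seq[end:end + flank],
        }
```

The two flank strings are the candidate regions for primer design, for example as input to primer3 (listed under primer design above).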

## 45、viewer
- SnapGene：http://www.snapgene.com/products/file_compatibility/GenBank/

## 46、Pfam tools
- Pfam_Scan（http://pfam.xfam.org/）：protein domain prediction
- PfamScan tool（ftp://ftp.ebi.ac.uk/pub/databases/Pfam/Tools/）
- InterProScan website：
http://www.ebi.ac.uk/interpro/
http://www.ebi.ac.uk/interpro/interproscan.html

- AntiFam：ftp://ftp.ebi.ac.uk/pub/databases/Pfam/AntiFam/ identifies spurious ORFs
AntiFam paper：https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308159/
http://xfam.org/
How to run AntiFam?
hmmsearch --domtblout test_vs_antifam.out --tblout test_vs_antifam.out2 --domE 1e-10 --cpu 12 ../AntiFam.hmm test.faa
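
The `--tblout` file written by the command above is a whitespace-delimited table with `#` comment lines. As a hedged sketch, the function below collects the proteins that hit an AntiFam model, i.e. candidate spurious ORFs. The function name and the `max_evalue` default are assumptions for this example, and the filter uses the full-sequence E-value column rather than the per-domain threshold set by `--domE`.

```python
def antifam_hits(tblout_lines, max_evalue=1e-10):
    """Map each target protein to the AntiFam models it matched.

    Expects lines in HMMER --tblout layout: column 1 is the target
    (protein) name, column 3 the query (model) name, and column 5
    the full-sequence E-value.
    """
    hits = {}
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/footer comments
        fields = line.split()
        target, query, evalue = fields[0], fields[2], float(fields[4])
        if evalue <= max_evalue:
            hits.setdefault(target, []).append((query, evalue))
    return hits
```

With the file names from the command above, this could be called as `antifam_hits(open("test_vs_antifam.out2"))`; the returned keys are the protein IDs to review or drop from `test.faa`.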

- wKinMut-2：http://kinmut2.bioinfo.cnio.es/KinMut2 wKinMut-2 is an integrated framework for the analysis and interpretation of the consequences of variants in the human kinome.
- GOTaxExplorer：http://gotax.bioinf.mpi-inf.mpg.de/ GOTaxExplorer presents a new approach to comparative genomics that integrates functional information and families with the taxonomic classification.

## 47、Other tools
- PathSeq：cross-species contamination detection with PathSeq
https://software.broadinstitute.org/gatk/blog?id=23205
ftp://ftp.broadinstitute.org/bundle/pathseq/

## 48、Gene structure analysis
- GSDS：http://gsds.cbi.pku.edu.cn/

## 49、Databases
Drosophila database：http://flybase.org/

Yeast database：https://www.yeastgenome.org/

Download yeast data：https://www.yeastgenome.org/download-data

## 50、Notes (tips)
Genome assemblers suited to NGS data
1. ALLPATHS-LG
2. Velvet
3. SOAPdenovo
4. Bambus2
5. CABOG
6. MSR-CA
7. SGA
8. VCAKE
9. SHARCGS
10. SSAKE
11. Euler

Genome assemblers suited to Sanger data
1. Newbler
2. Celera
3. CABOG
4. Edena
5. Shorty

Assembly algorithms：

A) Overlap/Layout/Consensus (OLC) methods (rely on an overlap graph)

Software：CABOG、Newbler、Shorty、Edena

B) De Bruijn Graph (DBG) methods (use some form of k-mer graph)

Software：SOAPdenovo、Euler、Velvet

C) Greedy graph algorithms (use OLC or DBG)

Software：SSAKE、SHARCGS、VCAKE
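
To make the "k-mer graph" in category B concrete, here is a minimal sketch of building a de Bruijn graph from reads. It is a teaching example, not how SOAPdenovo, Euler or Velvet are implemented: the function name is made up, and reverse complements, sequencing errors and graph simplification are all left out.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph as {(k-1)-mer: [successor (k-1)-mers]}.

    Every k-mer in a read contributes one edge from its prefix
    (first k-1 bases) to its suffix (last k-1 bases).
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)
```

Contigs correspond to unbranched paths through this graph. OLC methods instead treat whole reads as nodes and connect reads that overlap.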

## 51、Literature search and download

（1）Library Genesis
1. http://gen.lib.rus.ec（this mirror is relatively fast）
2. http://libgen.io（this mirror is slower）
3. http://libgen.io/scimag/（this mirror is mainly for searching articles）

（2）Sci-hub
- http://tool.yovisun.com/scihub/
- http://sci-hub.tw/
- https://sci-hub.shop/
- https://sci-hub.org.cn/
