https://github.com/TimoLassmann/samstat
SAMStat displays various properties of next-generation sequencing reads stored in SAM/BAM format.
- Host: GitHub
- URL: https://github.com/TimoLassmann/samstat
- Owner: TimoLassmann
- License: other
- Created: 2017-12-11T07:07:01.000Z (almost 7 years ago)
- Default Branch: main
- Last Pushed: 2023-08-03T07:35:17.000Z (over 1 year ago)
- Last Synced: 2024-05-09T08:34:23.780Z (6 months ago)
- Topics: bioinformatics, next-generation-sequencing, quality-control
- Language: C
- Homepage:
- Size: 6.67 MB
- Stars: 23
- Watchers: 3
- Forks: 7
- Open Issues: 4
Metadata Files:
- Readme: Readme.org
- Changelog: ChangeLog
- License: COPYING
Awesome Lists containing this project
- Awesome-Bioinformatics - SAMstat - Displaying sequence statistics for next-generation sequencing. [ [paper-2010](https://academic.oup.com/bioinformatics/article/27/1/130/201972) | [web](http://samstat.sourceforge.net) ] (Next Generation Sequencing / BAM File Utilities)
README
[[https://github.com/TimoLassmann/samstat/actions/workflows/cmake.yml][https://github.com/TimoLassmann/samstat/actions/workflows/cmake.yml/badge.svg]]
* SAMStat
SAMStat is an efficient C program to quickly display statistics of large
sequence files from next generation sequencing projects. When applied to SAM/BAM
files all statistics are reported for unmapped, poorly and accurately mapped
reads separately. This allows for identification of a variety of problems, such
as remaining linker and adaptor sequences, causing poor mapping. Apart from this
SAMStat can be used to verify individual processing steps in large analysis
pipelines.

SAMStat reports length distribution, base quality distribution, mapping
statistics, mismatch, insertion and deletion error profiles. The output is a
single html page:

[[Image of example output][https://user-images.githubusercontent.com/8110320/175869206-6edcb06d-1afc-42f6-bbb8-16a2a18146f0.png]]
* How to install
*SAMstat* depends on the HDF5 library. To install it on Linux (Ubuntu/Debian):
#+begin_src bash :eval never
sudo apt-get install -y libhdf5-dev
#+end_src
On a Mac via [[https://brew.sh][brew]]:
#+begin_src bash :eval never
brew install hdf5
#+end_src
To build *SAMstat*:
#+begin_src bash :eval never
git clone https://github.com/TimoLassmann/samstat.git
cd samstat
mkdir build
cd build
cmake ..
make
make install
#+end_src
* Usage
#+begin_src bash :eval never
samstat ...
#+end_src
For each input file, SAMStat creates a single HTML page named after the input file with =.samstat.html= appended.
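The naming rule above can be sketched as a small shell helper, which may be handy when a pipeline needs to locate the reports afterwards. This is an illustration only and not part of SAMStat: =samstat_report_path= is a hypothetical function name, and the behaviour with an output directory (the report placed there under the input's base name) is an assumption based on the description of the =-d/-dir= option.

#+begin_src bash :eval never
# Hypothetical helper: print the report path expected for one input file.
# Assumption: the report keeps the full input file name and appends ".samstat.html".
samstat_report_path() {
    local input="$1"
    local outdir="${2:-}"   # optional; mirrors the directory passed to -d/-dir
    if [ -n "$outdir" ]; then
        # Assumption: with an output directory, only the base name of the input is kept.
        printf '%s/%s.samstat.html\n' "${outdir%/}" "$(basename "$input")"
    else
        # Default: the report sits next to the input file.
        printf '%s.samstat.html\n' "$input"
    fi
}
#+end_src

For example, =samstat_report_path data/sample.bam= prints =data/sample.bam.samstat.html=, where =data/sample.bam= is a made-up file name.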
Available options:
#+begin_src bash :eval never
-d/-dir     : Output directory. []
              NOTE: by default SAMStat will place reports in the same directory as the input files.
-p/-peek    : Report stats only on the first sequences. [unlimited]
-t          : Number of threads. [4]
              will only be used when multiple input files are present.
--plotend   : Add base and quality plots relative to the read ends. []
--seed      : Random number seed. [0]
--verbose   : Enables verbose output. []
-h/-help    : Prints help message. []
-v/-version : Prints version information. []
#+end_src
* Please cite:
Timo Lassmann (2023) "SAMStat 2: quality control for next generation sequencing data." Bioinformatics. (2023): btad019, https://doi.org/10.1093/bioinformatics/btad019

Lassmann et al. (2010) "SAMStat: monitoring biases in next generation sequencing data." Bioinformatics 27.1 (2011): 130-131. [[https://doi.org/10.1093%2Fbioinformatics%2Fbtq614][doi:10.1093/bioinformatics/btq614]]