https://github.com/mbhall88/nohuman
Remove human reads from a sequencing run
- Host: GitHub
- URL: https://github.com/mbhall88/nohuman
- Owner: mbhall88
- License: mit
- Created: 2023-11-21T06:39:11.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-01T11:44:27.000Z (3 months ago)
- Last Synced: 2025-01-01T20:32:30.172Z (1 day ago)
- Topics: bioinformatics, contamination, contamination-removal, fastq, human-contamination, human-read-removal
- Language: Rust
- Homepage: https://doi.org/10.1093/gigascience/giae010
- Size: 231 KB
- Stars: 34
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# NoHuman
[![Rust CI](https://github.com/mbhall88/nohuman/actions/workflows/ci.yaml/badge.svg)](https://github.com/mbhall88/nohuman/actions/workflows/ci.yaml)
[![Crates.io](https://img.shields.io/crates/v/nohuman.svg)](https://crates.io/crates/nohuman)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![github release version](https://img.shields.io/github/v/release/mbhall88/nohuman)](https://github.com/mbhall88/nohuman/releases)
[![DOI:10.1093/gigascience/giae010](https://img.shields.io/badge/citation-10.1093/gigascience/giae010-blue)][paper]

👤🧬🚫 **Remove human reads from a sequencing run** 👤🧬️🚫

`nohuman` removes human reads from sequencing reads by classifying them with [kraken2][kraken] against a custom database
built from all of the genomes in the Human Pangenome Reference Consortium's (
HPRC) [first draft human pangenome reference](https://doi.org/10.1038/s41586-023-05896-x). It can take any type of
sequencing technology. Read more about the development of this method [here][paper].

- [NoHuman](#nohuman)
  - [Install](#install)
    - [Conda (recommended)](#conda-recommended)
    - [Precompiled binary](#precompiled-binary)
    - [Cargo](#cargo)
    - [Container](#container)
      - [`apptainer`](#apptainer)
      - [`docker`](#docker)
    - [Build from source](#build-from-source)
  - [Usage](#usage)
    - [Download the database](#download-the-database)
    - [Check dependencies are available](#check-dependencies-are-available)
    - [Remove human reads](#remove-human-reads)
    - [Keep human reads](#keep-human-reads)
    - [Full usage](#full-usage)
  - [Alternates](#alternates)
  - [Cite](#cite)

## Install

### Conda (recommended)
[![Conda (channel only)](https://img.shields.io/conda/vn/bioconda/nohuman)](https://anaconda.org/bioconda/nohuman)
[![bioconda version](https://anaconda.org/bioconda/nohuman/badges/platforms.svg)](https://anaconda.org/bioconda/nohuman)
![Conda Downloads](https://img.shields.io/conda/d/bioconda/nohuman)

```shell
$ conda install -c bioconda nohuman
```

### Precompiled binary

![GitHub Downloads (all assets, all releases)](https://img.shields.io/github/downloads/mbhall88/nohuman/total)
> [!IMPORTANT]
> You will need to [install kraken2][kraken] yourself using this install method.

```shell
curl -sSL nohuman.mbh.sh | sh
# or with wget
wget -nv -O - nohuman.mbh.sh | sh
```

You can also pass options to the script like so

```
$ curl -sSL nohuman.mbh.sh | sh -s -- --help
install.sh [option]

Fetch and install the latest version of nohuman, if nohuman is already
installed it will be updated to the latest version.

Options
    -V, --verbose
        Enable verbose output for the installer

    -f, -y, --force, --yes
        Skip the confirmation prompt during installation

    -p, --platform
        Override the platform identified by the installer [default: apple-darwin]

    -b, --bin-dir
        Override the bin installation directory [default: /usr/local/bin]

    -a, --arch
        Override the architecture identified by the installer [default: x86_64]

    -B, --base-url
        Override the base URL used for downloading releases [default: https://github.com/mbhall88/nohuman/releases]

    -h, --help
        Display this help message
```

### Cargo

![Crates.io](https://img.shields.io/crates/d/nohuman)
> [!IMPORTANT]
> You will need to [install kraken2][kraken] yourself using this install method.

```shell
$ cargo install nohuman
```

### Container

Docker images are hosted on the GitHub Container registry.
#### `apptainer`
Prerequisite: [`apptainer`][apptainer] (previously `singularity`)
```shell
$ URI="docker://ghcr.io/mbhall88/nohuman:latest"
$ apptainer exec "$URI" nohuman --help
```

The above will use the latest version. If you want to specify a version then use a
[tag][ghcr] like so.

```shell
$ VERSION="0.2.1"
$ URI="docker://ghcr.io/mbhall88/nohuman:${VERSION}"
```

#### `docker`

Prerequisite: [`docker`][docker]
```shell
$ docker pull ghcr.io/mbhall88/nohuman:latest
$ docker run ghcr.io/mbhall88/nohuman:latest nohuman --help
```

You can find all the available tags [here][ghcr].

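The `docker run` command above only prints the help text. To process files that live on the host, the container needs access to them. The following is a minimal sketch, assuming the reads sit in the current directory and that the database has already been downloaded to a host directory (taken here from a hypothetical `NOHUMAN_DB` variable, falling back to `$HOME/.nohuman/db`). The function name `nohuman_docker`, the `DRY_RUN` switch, and the `/data` and `/db` mount points are illustrative choices and not part of `nohuman`; whether the image already ships a database is not stated here, so the sketch passes `--db` explicitly.

```shell
# Hypothetical helper (not part of nohuman): run the containerised nohuman
# on files in the current directory.
# Set DRY_RUN=1 to print the docker command instead of executing it.
nohuman_docker() (
    image="ghcr.io/mbhall88/nohuman:latest"
    # Assumed host location of the database; override with NOHUMAN_DB
    db_dir="${NOHUMAN_DB:-$HOME/.nohuman/db}"

    # Mount the current directory at /data (and work from there) and the
    # database at /db, then forward all remaining arguments to nohuman.
    set -- docker run --rm \
        -v "$PWD":/data -w /data \
        -v "$db_dir":/db \
        "$image" nohuman --db /db "$@"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
)
```

With this defined, `DRY_RUN=1 nohuman_docker -t 4 in.fq` prints the full `docker run` command for inspection, and dropping `DRY_RUN=1` runs it. Input and output paths are relative to the current directory because that is what gets mounted.
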
### Build from source
> [!IMPORTANT]
> You will need to [install kraken2][kraken] yourself using this install method.

```shell
$ git clone https://github.com/mbhall88/nohuman.git
$ cd nohuman
$ cargo build --release
$ target/release/nohuman -h
```

## Usage

### Download the database
```
$ nohuman -d
```

By default, this will place the database in `$HOME/.nohuman/db`. If you want to download it somewhere else, use
the `--db` option.

### Check dependencies are available

```
$ nohuman -c
[2023-12-14T04:10:46Z INFO ] All dependencies are available
```

### Remove human reads

```
$ nohuman -t 4 in.fq
```

This will pass 4 threads to kraken2 and output the clean reads as `in.nohuman.fq`.
You can specify where to write the output file with `-o`
```
$ nohuman -t 4 -o clean.fq in.fq
```

If you have paired-end Illumina reads

```
$ nohuman -t 4 in_1.fq in_2.fq
```

or to specify a different path for the output

```
$ nohuman -t 4 --out1 clean_1.fq --out2 clean_2.fq in_1.fq in_2.fq
```

Set a [minimum confidence score][conf] for kraken2 classifications

```
$ nohuman --conf 0.5 in.fq
```

or write the kraken2 read classification output to a file

```
$ nohuman -k kraken.out in.fq
```

> [!TIP]
> Compressed output will be inferred from the specified output path(s). If no output path is provided, the same
> compression as the input will be used. To override the output compression format, use the `--output-type` option.
> Supported compression formats are gzip (`.gz`), zstandard (`.zst`), bzip2 (`.bz2`), and xz (`.xz`). If multiple threads are provided, these
> will be used for compression of the output (where possible).

### Keep human reads

You can invert the functionality of `nohuman` to keep only the human reads by using the `--human/-H` flag.
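
As an illustration, the sketch below runs `nohuman` twice on the same single-end file, once in the default mode and once with `--human`, so that both fractions are kept in separate files. The function name `split_human`, the `DRY_RUN` switch, and the output file names are hypothetical; only `-t`, `-o`, and `--human` come from the `nohuman` help text. The `.gz` extension on the output paths relies on the behaviour described in the tip above, where compression is inferred from the output path.

```shell
# Hypothetical helper (not part of nohuman): split one FASTQ into
# non-human and human reads.
# Usage: split_human <reads.fq> <out_prefix> [threads]
# Set DRY_RUN=1 to print the commands instead of running them.
split_human() (
    reads="$1"
    prefix="$2"
    threads="${3:-1}"

    # Either echo the command (dry run) or execute it
    run() {
        if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
    }

    # Default mode: keep the reads NOT classified as human
    run nohuman -t "$threads" -o "${prefix}.nohuman.fq.gz" "$reads" || exit 1
    # Inverted mode: keep only the reads classified as human
    run nohuman -t "$threads" --human -o "${prefix}.human.fq.gz" "$reads"
)
```

For example, `split_human in.fq sample 4` would be expected to produce `sample.nohuman.fq.gz` and `sample.human.fq.gz`. Each read is classified twice in this sketch, which keeps it simple at the cost of running kraken2 twice.
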
```
$ nohuman -h
Remove human reads from a sequencing run

Usage: nohuman [OPTIONS] [INPUT]...

Arguments:
  [INPUT]...  Input file(s) to remove human reads from

Options:
  -o, --out1            First output file.
  -O, --out2            Second output file.
  -c, --check           Check that all required dependencies are available and exit
  -d, --download        Download the database
  -D, --db              Path to the database [default: /home/michael/.nohuman/db]
  -F, --output-type     Output compression format. u: uncompressed; b: Bzip2; g: Gzip; x: Xz (Lzma); z: Zstd
  -t, --threads         Number of threads to use in kraken2 and optional output compression. Cannot be 0 [default: 1]
  -H, --human           Output human reads instead of removing them
  -C, --conf <[0, 1]>   Kraken2 minimum confidence score [default: 0.0]
  -k, --kraken-output   Write the Kraken2 read classification output to a file
  -v, --verbose         Set the logging level to verbose
  -h, --help            Print help (see more with '--help')
  -V, --version         Print version
```

### Full usage

```
$ nohuman --help
Remove human reads from a sequencing run

Usage: nohuman [OPTIONS] [INPUT]...

Arguments:
  [INPUT]...
          Input file(s) to remove human reads from

Options:
  -o, --out1
          First output file.

          Defaults to the name of the first input file with the suffix "nohuman" appended.
          e.g. "input_1.fastq" -> "input_1.nohuman.fq".
          Compression of the output file is determined by the file extension of the output file name.
          Or by using the `--output-type` option. If no output path is given, the same compression
          as the input file will be used.

  -O, --out2
          Second output file.

          Defaults to the name of the first input file with the suffix "nohuman" appended.
          e.g. "input_2.fastq" -> "input_2.nohuman.fq".
          Compression of the output file is determined by the file extension of the output file name.
          Or by using the `--output-type` option. If no output path is given, the same compression
          as the input file will be used.

  -c, --check
          Check that all required dependencies are available and exit

  -d, --download
          Download the database

  -D, --db
          Path to the database

          [default: ~/.nohuman/db]

  -F, --output-type
          Output compression format. u: uncompressed; b: Bzip2; g: Gzip; x: Xz (Lzma); z: Zstd

          If not provided, the format will be inferred from the given output file name(s), or the
          format of the input file(s) if no output file name(s) are given.

  -t, --threads
          Number of threads to use in kraken2 and optional output compression. Cannot be 0

          [default: 1]

  -H, --human
          Output human reads instead of removing them

  -C, --conf <[0, 1]>
          Kraken2 minimum confidence score

          [default: 0.0]

  -k, --kraken-output
          Write the Kraken2 read classification output to a file

  -v, --verbose
          Set the logging level to verbose

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version
```

## Alternates

[Hostile](https://github.com/bede/hostile) is an alignment-based approach that performs well. It takes longer and uses
more memory than the `nohuman` kraken approach, but has slightly better accuracy for Illumina data. See the [paper] for
more details and for other alternate approaches.

## Cite

[![DOI:10.1093/gigascience/giae010](https://img.shields.io/badge/citation-10.1093/gigascience/giae010-blue)][paper]
> Hall, Michael B., and Lachlan J. M. Coin. “Pangenome databases improve host removal and mycobacteria classification
> from clinical metagenomic data” GigaScience, April 4, 2024.

```bibtex
@article{hall_pangenome_2024,
	title = {Pangenome databases improve host removal and mycobacteria classification from clinical metagenomic data},
	volume = {13},
	issn = {2047-217X},
	url = {https://doi.org/10.1093/gigascience/giae010},
	doi = {10.1093/gigascience/giae010},
	urldate = {2024-04-07},
	journal = {GigaScience},
	author = {Hall, Michael B and Coin, Lachlan J M},
	month = jan,
	year = {2024},
	pages = {giae010},
}
```

[quay.io]: https://quay.io/repository/mbhall88/nohuman
[apptainer]: https://github.com/apptainer/apptainer
[docker]: https://docs.docker.com/v17.12/install/
[kraken]: https://github.com/DerrickWood/kraken2
[paper]: https://doi.org/10.1093/gigascience/giae010
[ghcr]: https://github.com/mbhall88/nohuman/pkgs/container/nohuman
[conf]: https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#confidence-scoring