https://github.com/mbhall88/nohuman
Remove human reads from a sequencing run
- Host: GitHub
- URL: https://github.com/mbhall88/nohuman
- Owner: mbhall88
- License: mit
- Created: 2023-11-21T06:39:11.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-01T11:44:27.000Z (about 1 year ago)
- Last Synced: 2025-05-05T08:59:23.945Z (6 months ago)
- Topics: bioinformatics, contamination, contamination-removal, fastq, human-contamination, human-read-removal
- Language: Rust
- Homepage: https://doi.org/10.1093/gigascience/giae010
- Size: 231 KB
- Stars: 40
- Watchers: 1
- Forks: 4
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# NoHuman
👤🧬🚫 **Remove human reads from a sequencing run** 👤🧬️🚫
`nohuman` removes human reads from sequencing data by classifying them with [kraken2][kraken] against a custom database
built from all of the genomes in the Human Pangenome Reference Consortium's (HPRC)
[first draft human pangenome reference](https://doi.org/10.1038/s41586-023-05896-x). It accepts reads from any
sequencing technology. Read more about the development of this method in the [paper][paper].
- [NoHuman](#nohuman)
  - [Install](#install)
    - [Conda (recommended)](#conda-recommended)
    - [Precompiled binary](#precompiled-binary)
    - [Cargo](#cargo)
    - [Container](#container)
      - [`apptainer`](#apptainer)
      - [`docker`](#docker)
    - [Build from source](#build-from-source)
  - [Usage](#usage)
    - [Download the database](#download-the-database)
    - [Check dependencies are available](#check-dependencies-are-available)
    - [Remove human reads](#remove-human-reads)
    - [Keep human reads](#keep-human-reads)
    - [Full usage](#full-usage)
  - [Alternates](#alternates)
  - [Cite](#cite)
## Install
### Conda (recommended)

```shell
$ conda install -c bioconda nohuman
```
### Precompiled binary

> [!IMPORTANT]
> You will need to [install kraken2][kraken] yourself using this install method.
```shell
curl -sSL nohuman.mbh.sh | sh
# or with wget
wget -nv -O - nohuman.mbh.sh | sh
```
You can also pass options to the script like so
```
$ curl -sSL nohuman.mbh.sh | sh -s -- --help
install.sh [option]

Fetch and install the latest version of nohuman, if nohuman is already
installed it will be updated to the latest version.

Options
    -V, --verbose
        Enable verbose output for the installer
    -f, -y, --force, --yes
        Skip the confirmation prompt during installation
    -p, --platform
        Override the platform identified by the installer [default: apple-darwin]
    -b, --bin-dir
        Override the bin installation directory [default: /usr/local/bin]
    -a, --arch
        Override the architecture identified by the installer [default: x86_64]
    -B, --base-url
        Override the base URL used for downloading releases [default: https://github.com/mbhall88/nohuman/releases]
    -h, --help
        Display this help message
```
### Cargo

> [!IMPORTANT]
> You will need to [install kraken2][kraken] yourself using this install method.
```shell
$ cargo install nohuman
```
### Container
Docker images are hosted on the GitHub Container Registry.
#### `apptainer`
Prerequisite: [`apptainer`][apptainer] (previously `singularity`)
```shell
$ URI="docker://ghcr.io/mbhall88/nohuman:latest"
$ apptainer exec "$URI" nohuman --help
```
The above uses the latest version. To pin a specific version, use a
[tag][ghcr] like so.
```shell
$ VERSION="0.2.1"
$ URI="docker://ghcr.io/mbhall88/nohuman:${VERSION}"
```
#### `docker`
Prerequisite: [`docker`][docker]
```shell
$ docker pull ghcr.io/mbhall88/nohuman:latest
$ docker run ghcr.io/mbhall88/nohuman:latest nohuman --help
```
You can find all the available tags [here][ghcr].
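
To work on files from the host, the container needs access to them. The following is a minimal sketch rather than a documented workflow: the `nohuman_docker` wrapper and the `/data` mount point are hypothetical names chosen here, and it assumes the database is not shipped inside the image, so it is downloaded into the mounted directory with the `--db` option.

```shell
# Image name taken from the pull command above.
IMAGE="ghcr.io/mbhall88/nohuman:latest"

# Hypothetical wrapper: run nohuman in the container with the current
# directory mounted at /data (an arbitrary mount point) as the working dir.
nohuman_docker() {
    docker run --rm -v "$PWD":/data -w /data "$IMAGE" nohuman "$@"
}

# Usage (assumes in.fq is in the current directory):
#   nohuman_docker --download --db /data/nohuman_db
#   nohuman_docker --db /data/nohuman_db -t 4 -o clean.fq in.fq
```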
### Build from source
> [!IMPORTANT]
> You will need to [install kraken2][kraken] yourself using this install method.
```shell
$ git clone https://github.com/mbhall88/nohuman.git
$ cd nohuman
$ cargo build --release
$ target/release/nohuman -h
```
## Usage
### Download the database
```
$ nohuman -d
```
By default, this places the database in `$HOME/.nohuman/db`. To download it somewhere else, use
the `--db` option.
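
As a sketch of using a non-default location, the helpers below download the database to a custom directory and then point later runs at it. The function names, the `NOHUMAN_DB` variable, and the path are hypothetical; only the `--download` and `--db` options come from `nohuman` itself.

```shell
# Hypothetical database location; any writable directory should do.
NOHUMAN_DB="${NOHUMAN_DB:-$HOME/databases/nohuman}"

# Download the database into that directory.
fetch_db() {
    nohuman --download --db "$NOHUMAN_DB"
}

# Run nohuman against that database, forwarding all other arguments.
nohuman_custom_db() {
    nohuman --db "$NOHUMAN_DB" "$@"
}

# Usage:
#   fetch_db
#   nohuman_custom_db -t 4 in.fq
```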
### Check dependencies are available
```
$ nohuman -c
[2023-12-14T04:10:46Z INFO ] All dependencies are available
```
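In a pipeline script, the check can act as an early guard. This sketch assumes that `nohuman --check` exits with a non-zero status when a dependency is missing, which is not stated above; `require_nohuman` is a hypothetical helper name.

```shell
# Hypothetical guard for the top of a pipeline script.
# Assumption: `nohuman --check` returns non-zero if a dependency is missing.
require_nohuman() {
    if ! nohuman --check; then
        echo "nohuman dependency check failed" >&2
        return 1
    fi
}

# Usage:
#   require_nohuman || exit 1
```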
### Remove human reads
```
$ nohuman -t 4 in.fq
```
This passes 4 threads to kraken2 and writes the clean reads to `in.nohuman.fq`.
You can specify where to write the output file with `-o`
```
$ nohuman -t 4 -o clean.fq in.fq
```
If you have paired-end Illumina reads
```
$ nohuman -t 4 in_1.fq in_2.fq
```
or to specify a different path for the output
```
$ nohuman -t 4 --out1 clean_1.fq --out2 clean_2.fq in_1.fq in_2.fq
```
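For many samples, the paired-end invocation can be wrapped in a loop. The sketch below assumes a hypothetical naming convention in which mates are called `<sample>_1.fq` and `<sample>_2.fq`; the `clean_pairs` helper and the `.clean.fq` output suffix are illustrative choices, not part of `nohuman`.

```shell
# Hypothetical helper: clean every read pair found in a directory.
# Assumes mates are named <sample>_1.fq and <sample>_2.fq.
clean_pairs() {
    dir=$1
    threads=${2:-1}
    for r1 in "$dir"/*_1.fq; do
        [ -e "$r1" ] || continue    # glob matched nothing
        sample=${r1%_1.fq}
        r2="${sample}_2.fq"
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: no mate file $r2" >&2
            continue
        fi
        nohuman -t "$threads" \
            --out1 "${sample}_1.clean.fq" --out2 "${sample}_2.clean.fq" \
            "$r1" "$r2"
    done
}

# Usage:
#   clean_pairs reads/ 4
```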
Set a [minimum confidence score][conf] for kraken2 classifications
```
$ nohuman --conf 0.5 in.fq
```
or write the kraken2 read classification output to a file
```
$ nohuman -k kraken.out in.fq
```
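The classification file can be summarised with standard tools. This sketch assumes the file is in kraken2's standard tab-separated output format, where the first column is `C` for classified and `U` for unclassified; `summarise_kraken` is a hypothetical helper. Because the database is built from human genomes, the classified reads are presumably the ones treated as human.

```shell
# Count classified (C) and unclassified (U) records in a kraken2 output file.
summarise_kraken() {
    awk -F '\t' '
        $1 == "C" { c++ }
        $1 == "U" { u++ }
        END { printf "classified\t%d\nunclassified\t%d\n", c, u }
    ' "$1"
}

# Usage:
#   summarise_kraken kraken.out
```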
> [!TIP]
> Compressed output will be inferred from the specified output path(s). If no output path is provided, the same
> compression as the input will be used. To override the output compression format, use the `--output-type` option.
> Supported compression formats are gzip (`.gz`), zstandard (`.zst`), bzip2 (`.bz2`), and xz (`.xz`). If multiple threads are provided, these
> will be used for compression of the output (where possible).
### Keep human reads
You can invert the functionality of `nohuman` to keep only the human reads by using the `--human`/`-H` flag. The short
help output lists it alongside the other options.
```
$ nohuman -h
Remove human reads from a sequencing run

Usage: nohuman [OPTIONS] [INPUT]...

Arguments:
  [INPUT]...  Input file(s) to remove human reads from

Options:
  -o, --out1           First output file.
  -O, --out2           Second output file.
  -c, --check          Check that all required dependencies are available and exit
  -d, --download       Download the database
  -D, --db             Path to the database [default: /home/michael/.nohuman/db]
  -F, --output-type    Output compression format. u: uncompressed; b: Bzip2; g: Gzip; x: Xz (Lzma); z: Zstd
  -t, --threads        Number of threads to use in kraken2 and optional output compression. Cannot be 0 [default: 1]
  -H, --human          Output human reads instead of removing them
  -C, --conf <[0, 1]>  Kraken2 minimum confidence score [default: 0.0]
  -k, --kraken-output  Write the Kraken2 read classification output to a file
  -v, --verbose        Set the logging level to verbose
  -h, --help           Print help (see more with '--help')
  -V, --version        Print version
```
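As a sketch of the `--human` flag described above, the helper below separates one input file into non-human and human reads by running `nohuman` twice, once normally and once inverted. The `split_reads` name and the output file names are hypothetical, and each call runs the kraken2 classification again.

```shell
# Hypothetical helper: write non-human and human reads to separate files.
split_reads() {
    input=$1
    prefix=$2
    nohuman -o "${prefix}.nonhuman.fq" "$input" &&
        nohuman --human -o "${prefix}.human.fq" "$input"
}

# Usage:
#   split_reads in.fq sample
```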
### Full usage
```
$ nohuman --help
Remove human reads from a sequencing run

Usage: nohuman [OPTIONS] [INPUT]...

Arguments:
  [INPUT]...
          Input file(s) to remove human reads from

Options:
  -o, --out1
          First output file.
          Defaults to the name of the first input file with the suffix "nohuman" appended.
          e.g. "input_1.fastq" -> "input_1.nohuman.fq".
          Compression of the output file is determined by the file extension of the output file name.
          Or by using the `--output-type` option. If no output path is given, the same compression
          as the input file will be used.

  -O, --out2
          Second output file.
          Defaults to the name of the first input file with the suffix "nohuman" appended.
          e.g. "input_2.fastq" -> "input_2.nohuman.fq".
          Compression of the output file is determined by the file extension of the output file name.
          Or by using the `--output-type` option. If no output path is given, the same compression
          as the input file will be used.

  -c, --check
          Check that all required dependencies are available and exit

  -d, --download
          Download the database

  -D, --db
          Path to the database
          [default: ~/.nohuman/db]

  -F, --output-type
          Output compression format. u: uncompressed; b: Bzip2; g: Gzip; x: Xz (Lzma); z: Zstd
          If not provided, the format will be inferred from the given output file name(s), or the
          format of the input file(s) if no output file name(s) are given.

  -t, --threads
          Number of threads to use in kraken2 and optional output compression. Cannot be 0
          [default: 1]

  -H, --human
          Output human reads instead of removing them

  -C, --conf <[0, 1]>
          Kraken2 minimum confidence score
          [default: 0.0]

  -k, --kraken-output
          Write the Kraken2 read classification output to a file

  -v, --verbose
          Set the logging level to verbose

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version
```
## Alternates
[Hostile](https://github.com/bede/hostile) is an alignment-based approach that performs well. It takes longer and uses
more memory than the `nohuman` kraken2 approach, but has slightly better accuracy for Illumina data. See the [paper] for
more details and for other alternative approaches.
## Cite
> Hall, Michael B., and Lachlan J. M. Coin. “Pangenome databases improve host removal and mycobacteria classification
> from clinical metagenomic data.” GigaScience, April 4, 2024.
```bibtex
@article{hall_pangenome_2024,
title = {Pangenome databases improve host removal and mycobacteria classification from clinical metagenomic data},
volume = {13},
issn = {2047-217X},
url = {https://doi.org/10.1093/gigascience/giae010},
doi = {10.1093/gigascience/giae010},
urldate = {2024-04-07},
journal = {GigaScience},
author = {Hall, Michael B and Coin, Lachlan J M},
month = apr,
year = {2024},
pages = {giae010},
}
```
[quay.io]: https://quay.io/repository/mbhall88/nohuman
[apptainer]: https://github.com/apptainer/apptainer
[docker]: https://docs.docker.com/v17.12/install/
[kraken]: https://github.com/DerrickWood/kraken2
[paper]: https://doi.org/10.1093/gigascience/giae010
[ghcr]: https://github.com/mbhall88/nohuman/pkgs/container/nohuman
[conf]: https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#confidence-scoring