https://github.com/seqeralabs/nf-chai

POC Nextflow pipeline to run Chai-1, a SOTA model for biomolecular structure prediction
nextflow pipeline protein-structure structure-prediction

Last synced: 10 months ago

Host: GitHub
URL: https://github.com/seqeralabs/nf-chai
Owner: seqeralabs
License: mpl-2.0
Created: 2024-11-20T10:11:45.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-01-16T18:26:57.000Z (over 1 year ago)
Last Synced: 2025-08-10T05:59:37.205Z (10 months ago)
Topics: nextflow, pipeline, protein-structure, structure-prediction
Language: Nextflow
Homepage:
Size: 12.6 MB
Stars: 11
Watchers: 5
Forks: 5
Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Citation: CITATIONS.md

# nf-chai

[![GitHub Actions CI Status](https://github.com/seqeralabs/nf-chai/actions/workflows/ci.yml/badge.svg)](https://github.com/seqeralabs/nf-chai/actions/workflows/ci.yml)

[![GitHub Actions Linting Status](https://github.com/seqeralabs/nf-chai/actions/workflows/linting.yml/badge.svg)](https://github.com/seqeralabs/nf-chai/actions/workflows/linting.yml)

[![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)

[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A524.04.2-23aa62.svg)](https://www.nextflow.io/)

[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)

[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)

[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://cloud.seqera.io/launch?pipeline=https://github.com/seqeralabs/nf-chai)

## POC implementation of Chai-1 in Nextflow

## Introduction

**nf-chai** is a simple, proof-of-concept bioinformatics pipeline for running the [Chai-1](https://github.com/chaidiscovery/chai-lab) biomolecular structure prediction model on an input set of sequences (proteins and other entities supported by Chai-1, such as ligands) in FASTA format. The pipeline is written in Nextflow so that results for downstream analysis are generated in a reproducible, scalable and portable way.

## Usage

> [!NOTE]
> If you are new to Nextflow, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.

First, prepare a FASTA file containing entities supported by Chai-1, using the format shown below. To process multiple FASTA files at once, pass a glob pattern to the `--input` parameter, for example: `--input "/path/to/fasta_files/*.fasta|*.fa"`.

`multiple_entities.fa`:

```txt
>protein|name=example-of-long-protein
AGSHSMRYFSTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASPRGEPRAPWVEQEGPEYWDRETQKYKRQAQTDRVSLRNLRGYYNQSEAGSHTLQWMFGCDLGPDGRLLRGYDQSAYDGKDYIALNEDLRSWTAADTAAQITQRKWEAAREAEQRRAYLEGTCVEWLRRYLENGKETLQRAEHPKTHVTHHPVSDHEATLRCWALGFYPAEITLTWQWDGEDQTQDTELVETRPAGDGTFQKWAAVVVPSGEEQRYTCHVQHEGLPEPLTLRWEP
>protein|name=example-of-short-protein
AIQRTPKIQVYSRHPAENGKSNFLNCYVSGFHPSDIEVDLLKNGERIEKVEHSDLSFSKDWSFYLLYYTEFTPTEKDEYACRVNHVTLSQPKIVKWDRDM
>protein|name=example-peptide
GAAL
>ligand|name=example-ligand-as-smiles
CCCCCCCCCCCCCC(=O)O
```
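Before launching a run, it can help to check that every header follows the `>TYPE|name=NAME` shape used in the example above. The function below is a minimal sketch and is not part of nf-chai: `check_chai_fasta` is a hypothetical name, it only checks the header shape and that each record has at least one sequence line, and it does not know which entity types or sequences Chai-1 actually accepts.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of nf-chai): sanity-check FASTA headers of the
# form ">TYPE|name=NAME" before launching the pipeline. Assumption: only the
# header shape from the README example is checked; sequences are not validated.
check_chai_fasta() {
  local status=0 f
  for f in "$@"; do
    awk -v file="$f" '
      # Header line: require ">TYPE|name=NAME" with non-empty TYPE and NAME.
      /^>/ {
        if (in_rec && !has_seq) {
          printf "%s:%d: record has no sequence\n", file, hdr_line > "/dev/stderr"; bad = 1
        }
        in_rec = 1; has_seq = 0; hdr_line = NR
        if ($0 ~ /^>[A-Za-z]+[|]name=[^| \t]+$/) {
          split(substr($0, 2), parts, "|")
          sub(/^name=/, "", parts[2])
          # Print file, entity type and entity name, tab-separated.
          printf "%s\t%s\t%s\n", file, parts[1], parts[2]
        } else {
          printf "%s:%d: malformed header: %s\n", file, NR, $0 > "/dev/stderr"; bad = 1
        }
        next
      }
      # Any other non-blank line counts as sequence for the current record.
      NF > 0 {
        if (!in_rec) {
          printf "%s:%d: sequence before first header\n", file, NR > "/dev/stderr"; bad = 1
        }
        has_seq = 1
      }
      END {
        if (in_rec && !has_seq) {
          printf "%s:%d: record has no sequence\n", file, hdr_line > "/dev/stderr"; bad = 1
        }
        if (!in_rec) {
          printf "%s: no FASTA records found\n", file > "/dev/stderr"; bad = 1
        }
        exit bad
      }
    ' "$f" || status=1
  done
  return "$status"
}
```

For the example file, `check_chai_fasta multiple_entities.fa` prints four tab-separated lines (three `protein` records and one `ligand` record) and returns status 0. Several files can be passed at once, and the function returns a non-zero status if any header is malformed, any record lacks a sequence, or a file cannot be read.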

Run the pipeline using CPUs with the command below:

```bash
nextflow run seqeralabs/nf-chai \
   --input multiple_entities.fa \
   --outdir <OUTDIR> \
   -profile <docker/singularity>
```

Run the pipeline using GPUs with the command below:

```bash
nextflow run seqeralabs/nf-chai \
   --input multiple_entities.fa \
   --outdir <OUTDIR> \
   --use_gpus \
   -profile <docker/singularity>
```
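The CPU and GPU commands differ only in the `--use_gpus` flag, so a launch script can choose between them. This is a sketch under stated assumptions: `build_nf_chai_cmd` and `has_nvidia_gpu` are hypothetical helpers, not part of nf-chai; the `auto` mode treats a working `nvidia-smi` on the launching machine as evidence of a GPU, which says nothing about remote executor nodes; and the script requires bash.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of nf-chai): build the nextflow command for a
# CPU or GPU run. Assumption: GPU presence is detected with nvidia-smi on the
# local machine only.

# Return 0 if nvidia-smi is on PATH and lists at least one GPU.
has_nvidia_gpu() {
  command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L 2>/dev/null | grep -q '^GPU'
}

# Print the command, one argument per line, so it can be inspected or tested.
# Usage: build_nf_chai_cmd INPUT OUTDIR PROFILE [gpu|cpu|auto]
build_nf_chai_cmd() {
  local input=$1 outdir=$2 profile=$3 mode=${4:-auto}
  local -a cmd=(nextflow run seqeralabs/nf-chai --input "$input" --outdir "$outdir")
  # Add --use_gpus when forced, or when auto-detection finds a GPU.
  if [[ $mode == gpu ]] || { [[ $mode == auto ]] && has_nvidia_gpu; }; then
    cmd+=(--use_gpus)
  fi
  cmd+=(-profile "$profile")
  printf '%s\n' "${cmd[@]}"
}
```

For example, `build_nf_chai_cmd multiple_entities.fa results docker gpu` prints the GPU variant of the command. On bash 4 or later, the printed arguments can be read into an array with `mapfile -t` and executed; printing rather than executing keeps the choice easy to inspect before anything is launched.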

Set the `--weights_dir` parameter to a location with the pre-downloaded weights required by Chai-1 to avoid having to download them every time you run the pipeline.

To further improve prediction quality with evolutionary information from pre-built multiple sequence alignments (MSAs), set the `--msa_dir` parameter to a directory containing alignments in the [`*.aligned.pqt`](https://github.com/chaidiscovery/chai-lab/tree/main/examples/msas#adding-msa-evolutionary-information) format required by Chai-1.
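Both `--weights_dir` and `--msa_dir` are optional, so a launch script may want to pass them only when the directories are actually populated. A minimal sketch, assuming that a non-empty directory means the weights are present and that at least one `*.aligned.pqt` file means MSAs are available; `optional_chai_args` is a hypothetical helper, not part of nf-chai, and neither check validates file contents.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of nf-chai): print --weights_dir / --msa_dir
# only when the directories look usable. Assumptions: a non-empty weights
# directory holds the Chai-1 weights; any *.aligned.pqt file means MSAs exist.
optional_chai_args() {
  local weights_dir=$1 msa_dir=$2 pqt
  # Reuse pre-downloaded weights if the directory exists and is non-empty.
  if [[ -d $weights_dir && -n $(ls -A "$weights_dir" 2>/dev/null) ]]; then
    printf '%s\n' --weights_dir "$weights_dir"
  fi
  # Pass the MSA directory only if it contains at least one *.aligned.pqt file.
  for pqt in "$msa_dir"/*.aligned.pqt; do
    if [[ -f $pqt ]]; then
      printf '%s\n' --msa_dir "$msa_dir"
      break
    fi
  done
}
```

The printed arguments, one per line, can be appended to either `nextflow run seqeralabs/nf-chai` command above. An empty or missing directory produces no output, so the pipeline falls back to its default behaviour instead of receiving an unusable path.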

## Credits

nf-chai was originally written by the Seqera Team.

## Contributions and Support

If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).

## Citations

An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.

This pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [MIT license](https://github.com/nf-core/tools/blob/main/LICENSE).

> **The nf-core framework for community-curated bioinformatics pipelines.**
>
> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
>
> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).
