https://github.com/cvigilv/residuefisher

Bioinformatics protocol that mines sequence- and structure-level information from a protein chain to detect possibly evolutionarily conserved residues.

bioinformatics-analysis bioinformatics-pipeline biology foldseek structural-bioin structural-biology

Last synced: 7 months ago


Host: GitHub
URL: https://github.com/cvigilv/residuefisher
Owner: cvigilv
License: mit
Created: 2022-12-25T14:42:14.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2023-05-09T21:02:25.000Z (over 2 years ago)
Last Synced: 2025-03-28T02:03:41.582Z (7 months ago)
Topics: bioinformatics-analysis, bioinformatics-pipeline, biology, foldseek, structural-bioin, structural-biology
Language: Python
Homepage:
Size: 33.3 MB
Stars: 6
Watchers: 1
Forks: 1
Open Issues: 9
Metadata Files:
- Readme: README.md
- License: LICENSE


# ResidueFisher

`ResidueFisher` is a bioinformatics protocol for protein homology search based on a three-step "search, detect, and enrich" model. It combines structural and sequence aligners working in tandem to filter and enrich conservation signals, resulting in "fished" residues that may help in understanding and studying a protein of interest.

![Protocol overview](./doc/figures/protocol.png)

## Table of Contents

- [Dependencies](#dependencies)
- [Installation](#installation)
- [Usage](#usage)
- [Support](#support)
- [Contributing](#contributing)
- [License](#license)

## Dependencies

`ResidueFisher` depends on:
- bash
- conda
- docker
- mafft
- tmux
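
Before installing, it can help to confirm that the tools listed above are reachable. The following is a minimal sketch, not part of `ResidueFisher` itself: the helper name `missing_tools` is hypothetical, and it only checks that an executable with each name is on the `PATH` (it does not check versions, and a `conda` that is only available as a shell function would be reported as missing).

```python
import shutil

# Tools listed in the Dependencies section of this README.
REQUIRED_TOOLS = ["bash", "conda", "docker", "mafft", "tmux"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All dependencies found on PATH.")
```

The `which` parameter exists only so the lookup can be swapped out when testing the helper.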

## Installation

To install `ResidueFisher`, run the following code snippet:
```sh
git clone https://github.com/cvigilv/ResidueFisher
cd ResidueFisher
make configure
conda activate ResidueFisher
```
To use `ResidueFisher`, the conda environment must be active (`conda activate ResidueFisher`).

## Usage
### Database preparation

To prepare a database using Foldseek, run the `bin/prep_database.sh` script as follows:

```sh
sh bin/prep_database.sh <dataset> <database_name>

# Example preparation of PDB dataset available in Foldseek
sh bin/prep_database.sh PDB mypdb
```
To see the available datasets, run `bin/prep_database.sh` without arguments.

To prepare a database from PDB files, please refer to the [Foldseek tutorial](https://github.com/steineggerlab/foldseek#databases). For `ResidueFisher` to work correctly, user-created databases must be inside a directory named `data/foldseek_dbs` in the project root and must contain FASTA files for the amino acid sequence and the 3Di sequence, which can be generated as follows:
```sh
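# <database> is the path prefix of your Foldseek database
foldseek convert2fasta <database> <database>.fasta
foldseek lndb <database>_h <database>_ss_h
foldseek convert2fasta <database>_ss <database>_ss.fasta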
```
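
A quick check that a user-created database has both FASTA files can catch setup mistakes early. This is a minimal sketch under stated assumptions: the helper `missing_database_files` is hypothetical, and the file names (`<name>.fasta` for amino acids, `<name>_ss.fasta` for 3Di) are assumed from the commands above rather than confirmed by the `ResidueFisher` source.

```python
from pathlib import Path


def missing_database_files(db_name, db_dir="data/foldseek_dbs"):
    """Return the expected FASTA files for `db_name` that are absent from `db_dir`."""
    root = Path(db_dir)
    # Assumed naming: <db_name>.fasta (amino acids) and <db_name>_ss.fasta (3Di).
    expected = [root / f"{db_name}.fasta", root / f"{db_name}_ss.fasta"]
    return [str(path) for path in expected if not path.is_file()]


if __name__ == "__main__":
    # "mypdb" is the example database name used elsewhere in this README.
    for path in missing_database_files("mypdb"):
        print("Missing:", path)
```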

### Query protein preparation
Unlike Foldseek, this protocol is intended to study a single protein chain; therefore, in order to use `ResidueFisher`, one must first extract a chain of interest from its original PDB file.

In the `src/scripts` folder, there is a script called `splitchains.py`, which extracts all the chains from a particular PDB file and saves them as separate files for use in `ResidueFisher`.

*Note*: the recommended way of preparing and storing the structure files is to create a new folder in `data` called `queries` (due to the nature of these files) and run the chain splitting script inside this folder. Here is an example of this procedure:

```sh
# Ensure we have the conda environment activated
conda activate ResidueFisher

# Assuming you are at the project root...
mkdir data/queries
cd data/queries
wget https://files.rcsb.org/download/3F3P.pdb
python ../../src/scripts/splitchains.py 3F3P.pdb
```
After running this example, a total of 13 files should be found inside the `data/queries` folder: 1 for the original structure and 12 corresponding to chains A through L of 3F3P.
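
To illustrate what chain splitting involves, here is a minimal sketch that groups `ATOM`/`HETATM` records by chain identifier using only the standard library. It is not the code of `splitchains.py`: the function name `split_chains` and the `<name>_<chain>.pdb` output naming (chosen to match `3F3P_C.pdb` in the usage example) are assumptions, and the sketch ignores `TER` records and does not separate models in multi-model files.

```python
from collections import defaultdict
from pathlib import Path


def split_chains(pdb_path):
    """Write one PDB file per chain next to `pdb_path` and return the paths written."""
    pdb_path = Path(pdb_path)
    chains = defaultdict(list)
    with pdb_path.open() as handle:
        for line in handle:
            # In the PDB format, column 22 (index 21) of ATOM/HETATM
            # records holds the chain identifier.
            if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
                chains[line[21]].append(line)

    written = []
    for chain_id, records in sorted(chains.items()):
        out_path = pdb_path.with_name(f"{pdb_path.stem}_{chain_id}.pdb")
        out_path.write_text("".join(records) + "END\n")
        written.append(out_path)
    return written


if __name__ == "__main__":
    import sys

    # Usage: python split_sketch.py 3F3P.pdb  (script name is hypothetical)
    for path in split_chains(sys.argv[1]) if len(sys.argv) > 1 else []:
        print(path)
```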

### Foldseek-fishing Usage

To use `ResidueFisher`, run the `bin/ResidueFisher` script as follows:
```sh
sh bin/ResidueFisher <query.pdb> <database_name>

# Example using the previously prepared protein and dataset
sh bin/ResidueFisher data/queries/3F3P_C.pdb mypdb
```

This will generate a folder in `results` with the following structure:
```
results/3F3PC_pdb/
├── foldseek/
├── msa/
├── moma/
└── tree/
```

Each subdirectory contains the log files and results needed to analyse and study the protein used in the protocol.
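
To confirm that a run produced the layout shown above, a short script can report which subdirectories exist and how many entries each holds. This is a minimal sketch: the helper `summarize_results` is hypothetical, and it assumes only the four subdirectory names listed in the tree, not anything about the files inside them.

```python
from pathlib import Path

# Subdirectory names taken from the results tree shown in this README.
EXPECTED_SUBDIRS = ("foldseek", "msa", "moma", "tree")


def summarize_results(result_dir):
    """Map each expected subdirectory to its number of entries (None if absent)."""
    root = Path(result_dir)
    summary = {}
    for name in EXPECTED_SUBDIRS:
        sub = root / name
        summary[name] = len(list(sub.iterdir())) if sub.is_dir() else None
    return summary


if __name__ == "__main__":
    import sys

    # Pass the results folder of a run as the first argument.
    if len(sys.argv) > 1:
        for name, count in summarize_results(sys.argv[1]).items():
            status = "missing" if count is None else f"{count} entries"
            print(f"{name}: {status}")
```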

## Support

Please [open an issue](https://github.com/cvigilv/ResidueFisher/issues/new) for
support.

## Contributing

Please contribute using [GitHub Flow](https://guides.github.com/introduction/flow/). Create a branch, add
commits, and [open a pull request](https://github.com/cvigilv/ResidueFisher/compare/).

## License

MIT
