https://github.com/cvigilv/residuefisher
Bioinformatics protocol that mines sequence- and structure-level information from a protein chain to detect possibly evolutionarily conserved residues.
- Host: GitHub
- URL: https://github.com/cvigilv/residuefisher
- Owner: cvigilv
- License: mit
- Created: 2022-12-25T14:42:14.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-05-09T21:02:25.000Z (over 2 years ago)
- Last Synced: 2025-03-28T02:03:41.582Z (7 months ago)
- Topics: bioinformatics-analysis, bioinformatics-pipeline, biology, foldseek, structural-bioin, structural-biology
- Language: Python
- Size: 33.3 MB
- Stars: 6
- Watchers: 1
- Forks: 1
- Open Issues: 9
Metadata Files:
- Readme: README.md
- License: LICENSE
# ResidueFisher
`ResidueFisher` is a bioinformatics protocol for protein homology search. It follows a three-step "search, detect, and enrich" model in which structural and sequence aligners work in tandem to filter and enrich conservation signals. The result is a set of "fished" residues that may help in understanding and studying a protein of interest.

## Table of Contents
- [Dependencies](#dependencies)
- [Installation](#installation)
- [Usage](#usage)
- [Support](#support)
- [Contributing](#contributing)
## Dependencies
`ResidueFisher` depends on:
- bash
- conda
- docker
- mafft
- tmux
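Before installing, you may want to confirm that the command-line tools listed above can be found on your `PATH`. The following is a minimal sketch and is not part of the repository. `REQUIRED_TOOLS` and `missing_tools` are hypothetical names, and the list only mirrors this README, not anything the Makefile checks. Depending on how the tools were installed, some may only be on `PATH` while the conda environment is active.

```python
import shutil

# Hypothetical list mirroring the "Dependencies" section of this README;
# it is not read from the project's Makefile or environment files.
REQUIRED_TOOLS = ("bash", "conda", "docker", "mafft", "tmux")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that shutil.which cannot locate on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```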
## Installation
To install `ResidueFisher`, run the following code snippet:
```sh
git clone https://github.com/cvigilv/ResidueFisher
cd ResidueFisher
make configure
conda activate ResidueFisher
```
To use `ResidueFisher`, the conda environment must be active (`conda activate ResidueFisher`).
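Helper scripts that depend on this environment can check for it up front. This sketch assumes the environment was activated by name. In that case `conda activate` sets the `CONDA_DEFAULT_ENV` variable to `ResidueFisher`. The function name is hypothetical.

```python
import os


def residuefisher_env_active(environ=None):
    """Return True if the active conda environment is named ResidueFisher."""
    # `conda activate <name>` exports CONDA_DEFAULT_ENV=<name>; activating by
    # path stores the full prefix instead, which this check does not recognise.
    env = os.environ if environ is None else environ
    return env.get("CONDA_DEFAULT_ENV") == "ResidueFisher"


if __name__ == "__main__":
    if residuefisher_env_active():
        print("ResidueFisher environment is active.")
    else:
        print("Run `conda activate ResidueFisher` first.")
```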
## Usage
### Database preparation
To prepare a database using Foldseek, run the `bin/prep_database.sh` script as follows:
```sh
# Run without arguments to list the datasets available in Foldseek
sh bin/prep_database.sh
# Example preparation of PDB dataset available in Foldseek
sh bin/prep_database.sh PDB mypdb
```
To see the available datasets, run `bin/prep_database.sh` without arguments.
To prepare a database from your own PDB files, please refer to the [Foldseek documentation](https://github.com/steineggerlab/foldseek#databases). For `ResidueFisher` to work correctly, user-created databases must be stored in a directory named `data/foldseek_dbs` in the project root. That directory must also contain FASTA files with the amino acid and 3Di sequences of the database, which can be generated as follows:
```sh
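# DB is a placeholder prefix; point it at your database inside data/foldseek_dbs
DB=data/foldseek_dbs/mydb
foldseek convert2fasta "$DB" "$DB.fasta"
foldseek lndb "${DB}_h" "${DB}_ss_h"
foldseek convert2fasta "${DB}_ss" "${DB}_ss.fasta"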
```
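Foldseek's 3Di alphabet assigns one state to each residue, and the header database is linked rather than regenerated. Every entry should therefore appear with the same identifier and the same length in both FASTA files. Below is a minimal sketch for checking this. It assumes the files are named `<db>.fasta` and `<db>_ss.fasta` and sit next to the database, as produced by the commands above. `parse_fasta`, `compare_records` and `check_database` are hypothetical helpers, not part of `ResidueFisher`.

```python
from pathlib import Path


def parse_fasta(text):
    """Parse FASTA text into an insertion-ordered {identifier: sequence} dict."""
    records = {}
    identifier = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first whitespace-delimited token of the header.
            identifier = (line[1:].split() or [""])[0]
            records[identifier] = ""
        elif identifier is not None:
            records[identifier] += line
    return records


def compare_records(aa, ss):
    """List mismatches between amino-acid and 3Di records (empty list = consistent)."""
    problems = [f"{name}: missing from 3Di FASTA" for name in aa.keys() - ss.keys()]
    problems += [f"{name}: missing from amino-acid FASTA" for name in ss.keys() - aa.keys()]
    for name in aa.keys() & ss.keys():
        if len(aa[name]) != len(ss[name]):
            problems.append(f"{name}: {len(aa[name])} residues vs {len(ss[name])} 3Di states")
    return sorted(problems)


def check_database(prefix):
    """Read <prefix>.fasta and <prefix>_ss.fasta and compare their records."""
    aa = parse_fasta(Path(f"{prefix}.fasta").read_text())
    ss = parse_fasta(Path(f"{prefix}_ss.fasta").read_text())
    return compare_records(aa, ss)
```

For example, `check_database("data/foldseek_dbs/mydb")` would return an empty list for a consistent database. Here `mydb` is a placeholder name.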
### Query protein preparation
Unlike Foldseek, this protocol is intended to study a single protein chain; therefore, in order to use `ResidueFisher`, one must first extract a chain of interest from its original PDB file.
In the `src/scripts` folder, there is a script called `splitchains.py`, which extracts all the chains from a particular PDB file and saves them as separate files for use in `ResidueFisher`.
*Note*: the recommended way to prepare and store the structure files is to create a new folder in `data` called `queries` (since these files serve as the protocol's queries) and run the chain-splitting script inside it. Here is an example of this procedure:
```sh
# Ensure we have the conda environment activated
conda activate ResidueFisher
# Assuming you are at the project root...
mkdir data/queries
cd data/queries
wget https://files.rcsb.org/download/3F3P.pdb
python ../../src/scripts/splitchains.py 3F3P.pdb
```
In this example, a total of 13 files should be found inside the `data/queries` folder: 1 for the original structure and 12 corresponding to chains A through L of 3F3P.
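The sketch below shows what chain splitting involves. It groups coordinate records by the chain identifier stored in column 22 of the PDB format. It is not the repository's `splitchains.py`, and it makes several simplifying assumptions:

- It keeps only `ATOM`/`HETATM` lines, so ligands and waters stay with their chain.
- It does not separate multiple models.
- The `{stem}_{chain}.pdb` naming is assumed from the `3F3P_C.pdb` file name used later in this README.

`split_chains` and `write_chains` are hypothetical names.

```python
from collections import defaultdict
from pathlib import Path


def split_chains(pdb_text):
    """Group ATOM/HETATM records by chain identifier (column 22 of the PDB format)."""
    chains = defaultdict(list)
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")) and len(line) > 21:
            chains[line[21]].append(line)  # index 21 == column 22 (1-based)
    return dict(chains)


def write_chains(pdb_path):
    """Write one <stem>_<chain>.pdb file per chain next to the input file."""
    pdb_path = Path(pdb_path)
    written = []
    for chain_id, records in split_chains(pdb_path.read_text()).items():
        out_path = pdb_path.with_name(f"{pdb_path.stem}_{chain_id}.pdb")
        out_path.write_text("\n".join(records) + "\nEND\n")
        written.append(out_path)
    return written
```

Calling `write_chains("3F3P.pdb")` inside `data/queries` should write one file per chain identifier found in the structure.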
### Foldseek-fishing Usage
To use `ResidueFisher`, run the `bin/ResidueFisher` script as follows:
```sh
sh bin/ResidueFisher
# Example using the previously prepared protein and dataset
sh bin/ResidueFisher data/queries/3F3P_C.pdb mypdb
```
This will generate a folder in `results` with the following structure:
```
results/3F3PC_pdb/
├── foldseek/
├── msa/
├── moma/
└── tree/
```
Each subdirectory contains log files and results that can be used to analyse and study the query protein.
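The command above processes one chain per invocation. The sketch below wraps that documented command so several chains can be processed in turn, for example the twelve extracted from 3F3P. It also adds a helper that lists the files in each results subdirectory. It rests on these assumptions:

- It is run from the project root with the conda environment active.
- `bin/ResidueFisher` returns only once its run has finished. If it hands work off to tmux sessions, runs may overlap.
- The run folder passed to `summarize_results` must be given explicitly, since this README does not state how folder names are derived from query file names.

All function names are hypothetical.

```python
import subprocess
from pathlib import Path

# Subdirectories shown in the results layout of this README.
EXPECTED_SUBDIRS = ("foldseek", "msa", "moma", "tree")


def chain_files(query_dir, pdb_id):
    """List per-chain files such as 3F3P_A.pdb ... 3F3P_L.pdb, sorted by name."""
    query_dir = Path(query_dir)
    if not query_dir.is_dir():
        return []
    return sorted(query_dir.glob(f"{pdb_id}_*.pdb"))


def run_residuefisher(query, database, dry_run=False):
    """Build the documented command and, unless dry_run is set, execute it."""
    cmd = ["sh", "bin/ResidueFisher", str(query), database]
    if not dry_run:
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(cmd, check=True)
    return cmd


def summarize_results(run_dir):
    """Map each expected subdirectory to its sorted file names, or None if absent."""
    run_dir = Path(run_dir)
    summary = {}
    for name in EXPECTED_SUBDIRS:
        subdir = run_dir / name
        if subdir.is_dir():
            summary[name] = sorted(p.name for p in subdir.iterdir() if p.is_file())
        else:
            summary[name] = None
    return summary


if __name__ == "__main__":
    # Dry run: print the command for every extracted 3F3P chain without running it.
    for query in chain_files("data/queries", "3F3P"):
        print(" ".join(run_residuefisher(query, "mypdb", dry_run=True)))
    # Report what the example run produced, if its folder exists.
    for name, files in summarize_results("results/3F3PC_pdb").items():
        status = "missing" if files is None else f"{len(files)} file(s)"
        print(f"{name}: {status}")
```

Dropping `dry_run=True` executes each run in turn. Using a dry run first makes it easy to confirm which chain files would be processed.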
## Support
Please [open an issue](https://github.com/cvigilv/ResidueFisher/issues/new) for
support.
## Contributing
Please contribute using [GitHub Flow](https://guides.github.com/introduction/flow/). Create a branch, add
commits, and [open a pull request](https://github.com/cvigilv/ResidueFisher/compare/).
## License
MIT