🧬 EffectorFisher links pangenome isoforms to disease traits to predict phenotype-associated effectors. 🌾🦠
https://github.com/ccdmb/effectorfisher-core


# EffectorFisher-core Module (Python Library)

The EffectorFisher module is a Python library for comparing pangenome-derived protein isoform profiles with host virulence/disease phenotype data, to predict candidate effectors with strong phenotypic association. EffectorFisher can be used to refine the output of [Predector](https://github.com/ccdmb/predector), which combines multiple methods to predict proteins with effector-like properties.

EffectorFisher was developed at the Centre for Crop and Disease Management ([CCDM](https://www.ccdm.com.au/)) by Mohitul Hossain within the [RTP](https://www.education.gov.au/research-block-grants/research-training-program)/[GRDC](https://grdc.com.au/)-funded Ph.D. project [CUR2301-006RSX](https://grdc.com.au/grdc-investments/investments/investment?code=CUR2301-006RSX), with additional support from the Western Australian Agricultural Research Collaboration ([WAARC](https://waarc.org.au/)), under the supervision of [Dr James Hane](https://scholar.google.com/citations?hl=en&user=T4W70sEAAAAJ&view_op=list_works&sortby=pubdate) and co-supervision of Drs Huyen Phan and Kristina Gagalova. Assistance with code development was provided by Dr Kristina Gagalova and Mr Pavel Misiun, with testing performed by Ms Naomi Gray.

A manuscript describing this method is currently under review. If you use EffectorFisher, please check this space for citation details.

## Installation
EffectorFisher-core is a command-line tool written in Python.

### Requirements
* Python 3.6 or newer. More details can be found [here](https://www.python.org/downloads/).
* `pip` installed, with `pip >= 21.0` recommended. More details about `pip` installation can be found [here](https://pip.pypa.io/en/stable/).
* Internet connection to install from GitHub

### Quick installation from GitHub
```
pip install git+https://github.com/ccdmb/EffectorFisher-core.git
```
This will:
* Download the latest version of the tool
* Install all required dependencies
* Register the command-line tool `effectorfisher_core.py`

### Manual Installation (From Cloned Repository)
If you prefer to work with the source:
```
git clone https://github.com/ccdmb/EffectorFisher-core.git
cd EffectorFisher-core
pip install .
```
To install in development mode (reflects source code changes automatically):
```
pip install -e .
```

## Input Files
To run this module, you need to provide the following input files:

1. `Effector_variants_PAV_output.txt`: This file is a required input for EffectorFisher-core and must be generated by running the **EffectorFisher** tool. The file can be found in the `Final_PAV_result` directory upon successful execution of the EffectorFisher pipeline. This file contains the presence–absence variation (PAV) matrix of predicted effector candidates across isolates. Detailed instructions for generating this file can be found in the "EffectorFisher" repo: https://github.com/muhitulh/EffectorFisher/tree/main. Note: Both the EffectorFisher and EffectorFisher-core modules are components of the associated manuscript.

2. `phenotype_data_quantitative.txt` or `phenotype_data_qualitative.txt`:

   - `phenotype_data_quantitative.txt`: This file should contain numeric disease scores. Prepare it as shown in the example.
   - `phenotype_data_qualitative.txt`: This file should contain disease severity levels (high or low). Prepare it as shown in the example.

3. `predector_results.txt`: This file is a required input for **EffectorFisher-core** and must be generated by running the **Predector** tool. Predector, published in _Scientific Reports_ ([link](https://www.nature.com/articles/s41598-021-99363-0)), prioritizes candidate effector proteins based on a range of effector-like features. Installation and usage instructions are available in the [Predector GitHub repository](https://github.com/ccdmb/predector).

4. `known_effector.txt` (optional): You can provide known effector IDs and names in this file, as shown in the example. If this file is not provided, the module will not include known effector ranking in the final output.

**Important:** Make sure your input file names match those listed above and that the files are located in the `00_input_files` subdirectory of your working directory (or in the directory passed with `--input-dir`). Support for providing individual input file paths as command-line arguments is still in development.

## Directory Structure
Here's an example of the directory structure for running the EffectorFisher module:

```plaintext
working_directory/
├── 00_input_files/
│   ├── Effector_variants_PAV_output.txt
│   ├── phenotype_data_quantitative.txt (or phenotype_data_qualitative.txt)
│   ├── predector_results.txt
│   └── known_effector.txt (optional)
├── effectorfisher_core.py
└── ...
```

Make sure to place the input files in the `00_input_files` directory within your working directory.
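Before starting a run, it can help to confirm that the expected files are in place. The following is a minimal sketch, not part of EffectorFisher-core itself: the function name `check_input_dir` is hypothetical, and only the file names are taken from the list above.

```python
from pathlib import Path

# File names as documented above; the check itself is illustrative only.
REQUIRED = ["Effector_variants_PAV_output.txt", "predector_results.txt"]
PHENOTYPE = {
    "quantitative": "phenotype_data_quantitative.txt",
    "qualitative": "phenotype_data_qualitative.txt",
}
OPTIONAL = ["known_effector.txt"]


def check_input_dir(input_dir, data_type):
    """Return (missing_required, missing_optional) file names for input_dir."""
    if data_type not in PHENOTYPE:
        raise ValueError("data_type must be 'quantitative' or 'qualitative'")
    folder = Path(input_dir)
    required = REQUIRED + [PHENOTYPE[data_type]]
    missing_required = [n for n in required if not (folder / n).is_file()]
    missing_optional = [n for n in OPTIONAL if not (folder / n).is_file()]
    return missing_required, missing_optional


if __name__ == "__main__":
    missing, optional = check_input_dir("00_input_files", "quantitative")
    for name in missing:
        print(f"missing required input: {name}")
    for name in optional:
        print(f"optional input not provided: {name}")
```

The check only looks at file names and locations; it does not validate the contents of any file.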

## Usage

Run the pipeline with:
```
effectorfisher_core.py --data-type {quantitative,qualitative} [options]
```

### Basic example

```
effectorfisher_core.py --data-type quantitative --input-dir 00_input_files/ --save
```
This will:
* Process input files
* Apply default filters
* Save both intermediate and final output files

### Final Output Only (No --save)
```
effectorfisher_core.py --data-type quantitative --input-dir 00_input_files/
```
## Options

```
effectorfisher_core.py --help

usage: effectorfisher_core.py [-h] [--data-type {quantitative,qualitative}]
                              [--input-dir INPUT_DIR] [--output-dir OUTPUT_DIR]
                              [--min-variant MIN_VARIANT] [--save]
                              [--cyst CYST] [--total-aa TOTAL_AA]
                              [--pred-score PRED_SCORE] [--p-value P_VALUE]

Process phenotype and variant data for EffectorFisher

optional arguments:
  -h, --help      Show help message and exit
  --data-type     Required. Either `quantitative` or `qualitative`
  --input-dir     Directory containing input files (default: `00_input_files`)
  --output-dir    Directory for output files (default: `output/`)
  --min-variant   Minimum isoform count (default: 5)
  --save          Save all intermediate and final results
  --cyst          Minimum cysteine count (default: 2)
  --total-aa      Maximum amino acid length (default: 300)
  --pred-score    Minimum prediction score (default: 2)
  --p-value       P-value threshold (default: 0.05)
```

**Must include:**
- `--data-type`: Specify the type of phenotypic data you have. Choose either `qualitative` or `quantitative`. See the examples in the `input_files` directory.

**Important:**
- `--min-variant`: Specify the minimum isoform count (default = 5).

**Optional:**
- `--cyst`: Specify the minimum cysteine count (default = 2).
- `--pred-score`: Specify the minimum prediction score (default = 2).
- `--total-aa`: Specify the maximum amino acid length (default = 300).
- `--p-value`: Specify the p-value threshold (default = 0.05).
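To make the defaults concrete, here is a rough reconstruction of the command-line interface using `argparse`, built only from the `--help` output above. It is a sketch, not the tool's actual source: the argument types (`int`, `float`) and the function name `build_parser` are assumptions.

```python
import argparse


def build_parser():
    """Rebuild the documented CLI from the help text (not the tool's source)."""
    parser = argparse.ArgumentParser(
        prog="effectorfisher_core.py",
        description="Process phenotype and variant data for EffectorFisher",
    )
    parser.add_argument("--data-type", choices=["quantitative", "qualitative"],
                        required=True)
    parser.add_argument("--input-dir", default="00_input_files")
    parser.add_argument("--output-dir", default="output/")
    parser.add_argument("--min-variant", type=int, default=5)    # type assumed
    parser.add_argument("--save", action="store_true")
    parser.add_argument("--cyst", type=int, default=2)           # type assumed
    parser.add_argument("--total-aa", type=int, default=300)     # type assumed
    parser.add_argument("--pred-score", type=float, default=2)   # type assumed
    parser.add_argument("--p-value", type=float, default=0.05)   # type assumed
    return parser


# Only --data-type is given, so every threshold falls back to its default.
args = build_parser().parse_args(["--data-type", "quantitative"])
print(args.min_variant, args.cyst, args.total_aa, args.p_value)
```

With a parser defined this way, options must be spelled with hyphens exactly as in the `--help` output; an underscore spelling such as `--data_type` would be rejected as an unrecognized argument.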

## Example
```
effectorfisher_core.py --data-type quantitative --min-variant 5 --cyst 2 --pred-score 2 --total-aa 300 --p-value 0.05
```
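If you want to launch runs from a script (for example, to try several thresholds), the command can be assembled in Python. This is a minimal sketch: `build_command` and `run` are hypothetical helper names, and the flag spellings follow the `--help` output in the Options section.

```python
import subprocess


def build_command(data_type, input_dir="00_input_files", save=False, **thresholds):
    """Return the argument list for one effectorfisher_core.py run."""
    cmd = ["effectorfisher_core.py", "--data-type", data_type,
           "--input-dir", input_dir]
    for name, value in thresholds.items():
        # A Python keyword such as min_variant becomes the flag --min-variant.
        cmd += ["--" + name.replace("_", "-"), str(value)]
    if save:
        cmd.append("--save")
    return cmd


def run(cmd):
    """Run the command; requires effectorfisher_core.py to be installed and on PATH."""
    return subprocess.run(cmd, check=True)


cmd = build_command("quantitative", save=True, min_variant=5, p_value=0.05)
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems; call `run(cmd)` only once the tool is installed.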

## Output

### Main Output
| File Name | Description |
|----------------------------|-----------------------------------------------------|
| `complete_isoform_list.txt` | Complete list of isoforms processed by the module. |
| `complete_loci_list.txt` | Complete list of loci processed by the module. |

### Additional Output
| File Name | Description |
|--------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `filtered_loci_list.txt` | List of loci that pass the default or user-specified filters. Alternatively, you can apply your own filters to `complete_loci_list.txt` as required. |
| `known_effectors_ranking.txt` | Contains the ranking of known effectors if you provide a known effector input file. |

Additional results: if a known effector input file is provided, the known effectors are also ranked after filtering.
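For custom filtering of the complete loci list, something like the following could serve as a starting point. It is a sketch under several assumptions: the tab-delimited layout, the column names (`locus`, `cysteine_count`, `aa_length`, `predector_score`, `p_value`), the inclusive comparisons, and the example rows are all made up for illustration. Check the real header of `complete_loci_list.txt` before adapting it.

```python
import csv
import io

# Hypothetical column names and made-up rows, for illustration only.
EXAMPLE = (
    "locus\tcysteine_count\taa_length\tpredector_score\tp_value\n"
    "locus_A\t4\t180\t2.6\t0.010\n"
    "locus_B\t1\t150\t3.1\t0.020\n"
    "locus_C\t6\t420\t2.9\t0.001\n"
    "locus_D\t3\t250\t2.2\t0.300\n"
)


def filter_loci(handle, cyst=2, total_aa=300, pred_score=2.0, p_value=0.05):
    """Keep loci passing all four thresholds (defaults as in --help)."""
    kept = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if (
            int(row["cysteine_count"]) >= cyst                # minimum cysteines
            and int(row["aa_length"]) <= total_aa             # maximum length
            and float(row["predector_score"]) >= pred_score   # minimum score
            and float(row["p_value"]) <= p_value              # p-value cut-off
        ):
            kept.append(row["locus"])
    return kept


print(filter_loci(io.StringIO(EXAMPLE)))  # ['locus_A']
```

To use a real file, pass an open file handle instead of the `io.StringIO` example and rename the columns to match its header.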