https://github.com/wanchunnie/PB-LKS

A python package for phage-host interaction prediction

Host: GitHub
URL: https://github.com/wanchunnie/PB-LKS
Owner: wanchunnie
License: apache-2.0
Created: 2023-11-26T06:43:43.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2023-12-18T16:58:41.000Z (over 1 year ago)
Last Synced: 2025-02-09T20:36:34.781Z (3 months ago)
Language: Python
Size: 9.29 MB
Stars: 2
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-virome - PB-LKS - K-mer profiles for phage-bacteria prediction. [Python] (Host Prediction / RNA Virus Identification)

README

# PB-LKS

This is the official code for our paper "PB-LKS: a python package for predicting **P**hage-**B**acteria interaction through **L**ocal **K**-mer **S**trategy".

```
 +--------------------------------------------+
 | |---\  |----\       |       |  /   /----\  |
 | |    | |     |      |       | /   |        |
 | |___/  |____/   --  |       |/\    \----\  |
 | |      |     \      |       |  \         | |
 | |      |_____/      |_____  |   \  \____/  |
 +--------------------------------------------+
```

## Local environment setup

1. Install conda (to manage the environment).

2. Change directory to the path of this project:

   ```bash
   cd {your_path_to_PBLKS}
   ```

3. Run the following commands in your terminal:

   ```shell
   conda create -n PB-LKS python=3.10
   # only needed if the base environment is activated
   conda deactivate
   conda activate PB-LKS
   pip install -r requirements.txt
   ```

## How to use

Both a command line tool and a Python package are provided in this project. After setting up the local environment, developers and researchers can choose whichever fits their needs.

### Command line interface

A command line interface is implemented and can be used by running the following commands in your terminal.

   ```bash
   cd {your_path_to_PBLKS}

   # display help message of PB-LKS CLI
   python PB-LKS.py -h

   # run and display result with example input
   python PB-LKS.py -e
   ```

The other arguments and their usage are listed below:

```shell
usage: python PB-LKS.py [-h] [-p PHAGE] [-b BAC] [-e] [-xgb] [-fea] [-ba] [-o OUTPUT] [-d]

options:
  -h, --help            show this help message and exit
  -p PHAGE, --phage PHAGE
                        path/folder to your phage sequence file(in fasta format)
  -b BAC, --bac BAC     path/folder to your bacteria sequence file(in fasta format)
  -e, --example         run model with example input and quit
  -xgb, --xgboost       run prediction with xgboost(default model is based on RandomForest)
  -fea, --feature       show 10 features of most importance and exit
  -ba, --batch          run prediction in batch, -p,-b should be folder if set True
  -o OUTPUT, --output OUTPUT
                        the folder you want to save batch prediction results (results will be printed into terminal by default)
  -d, --detail          if set true, print detailed prediction results(single prediction), including important features and decision path.
```
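For scripted runs, the flags listed above can be assembled programmatically. The following is a minimal sketch, assuming the script is invoked as `PB-LKS.py` from inside the project directory. `build_pblks_command` is a hypothetical helper that is not part of PB-LKS; it only uses flags that appear in the help output above, and the folder names are placeholders.

```python
import sys


def build_pblks_command(phage, bac, batch=False, xgboost=False,
                        output=None, detail=False):
    """Assemble an argument list for PB-LKS.py (hypothetical helper)."""
    cmd = [sys.executable, "PB-LKS.py", "-p", str(phage), "-b", str(bac)]
    if batch:
        cmd.append("-ba")           # -p and -b should then be folders
    if xgboost:
        cmd.append("-xgb")          # XGBoost model instead of the default
    if output is not None:
        cmd += ["-o", str(output)]  # folder for batch prediction results
    if detail:
        cmd.append("-d")            # detailed output for a single prediction
    return cmd


if __name__ == "__main__":
    # Placeholder folder names; replace them with your own paths.
    cmd = build_pblks_command("phages", "bacteria", batch=True, output="results")
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from within the PB-LKS project directory.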

### Python package

To make it easier for researchers and developers to use our model in custom tasks, a Python package is also implemented.

The following functions are available, as shown in the example code:

```python
import PBLKS

bacteria_path = 'example1.fasta'
phage_path = 'example2.fasta'

# directly gives the result of the prediction
result, prob = PBLKS.predict(bacteria_path, phage_path)

# gets descriptors from files
# can be used to reconstruct or train other types of models
feas = PBLKS.get_descriptor(bacteria_path, phage_path)

# predicts the interaction of each bacteria-phage pair in both folders
bac_folder = "your_folder_path"
phage_folder = "your_folder_path"
out_put_dir = "your_dest_folder"

# output is written to the destination folder, creating 2 result files
PBLKS.predict_in_batch(bac_folder, phage_folder, output_dir=out_put_dir)

# print results to the terminal instead
PBLKS.predict_in_batch(bac_folder, phage_folder, output_dir=None)

# get the decision path
decision_path = PBLKS.show_decision_path(bacteria_path, phage_path)

# get the top 10 important features
# the result is organized in a dict like: {kmer: importance}
features = PBLKS.show_feature_importance(top_cnt=10)
```
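Since `show_feature_importance` is described as returning a dict of the form `{kmer: importance}`, its output can be ranked with plain Python. The sketch below assumes the keys are k-mer strings and the values are numeric; `rank_features` is a hypothetical helper, and the example values are made up for illustration and are not real model output.

```python
def rank_features(importances):
    """Return (kmer, importance) pairs sorted from most to least important."""
    return sorted(importances.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up values for illustration only; real values would come from
    # PBLKS.show_feature_importance(top_cnt=10).
    example = {"ACGT": 0.12, "TTGA": 0.31, "GGCC": 0.07}
    for kmer, score in rank_features(example):
        print(f"{kmer}\t{score:.3f}")
```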

## Decision path visualization

Interpretability is a key feature of decision-tree-based algorithms, so several interfaces are implemented to visualize how our model makes decisions.

Besides using `-d` or `PBLKS.show_decision_path` to show the decision path as 0/1 values, we implemented a script to visualize the decision trees in our model.

Users can run the following command in a terminal:

```bash
python make_tree_plots.py
```

This creates a detailed visualization of every decision tree in `/PBLKS/Trees`, providing a more direct way to inspect how decisions are made.

## Other models

An XGBoost-based model is also included in `/models`.

It can be used by adding the `-xgb` argument in the CLI or by passing `use_xgboost=True` when calling methods in our package.
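To see how the two models differ on the same pair, both can be called side by side. This is a minimal sketch, assuming `PBLKS.predict` accepts `use_xgboost` as a keyword argument and returns `(result, prob)` for either model; `compare_models` is a hypothetical helper, not part of the package.

```python
def compare_models(predict, bacteria_path, phage_path):
    """Call `predict` with and without use_xgboost and collect both outputs."""
    return {
        "random_forest": predict(bacteria_path, phage_path),
        "xgboost": predict(bacteria_path, phage_path, use_xgboost=True),
    }


# Intended usage (requires the PB-LKS environment):
#   import PBLKS
#   outputs = compare_models(PBLKS.predict, "example1.fasta", "example2.fasta")
#   for name, (result, prob) in outputs.items():
#       print(name, result, prob)
```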

## Source code

Visit this git repository to get the source code of PB-LKS:

	https://github.com/wanchunnie/PB-LKS
