BMF: Bipartite Motif Finder
https://github.com/soedinglab/bipartite_motif_finder


# BMF: Thermodynamic model for de novo bipartite RNA motif discovery

[![License](https://img.shields.io/github/license/soedinglab/bipartite_motif_finder.svg)](https://choosealicense.com/licenses/gpl-3.0/)
[![Issues](https://img.shields.io/github/issues/soedinglab/bipartite_motif_finder.svg)](https://github.com/soedinglab/bipartite_motif_finder/issues)

BMF (Bipartite Motif Finder) is an open source tool for finding co-occurrences of sequence motifs in genomic sequences.

BMF is also available as a webserver:

* Link: [bmf.soedinglab.org](https://bmf.soedinglab.org)
* Web server repository: [soedinglab/bmf-webserver](https://github.com/soedinglab/bmf-webserver)

## Publication

[Sohrabi-Jahromi S. and Söding J. Thermodynamic model reveals most RNA-binding proteins prefer simple and repetitive motifs, bioRxiv 2021](https://www.biorxiv.org/content/10.1101/2021.01.30.428941v1).

Notebooks used to generate the analyses in the manuscript are available at [soedinglab/bmf-paper](https://github.com/soedinglab/bmf-paper).

## Documentation
A more comprehensive BMF user guide is available in our [GitHub Wiki](https://github.com/soedinglab/bipartite_motif_finder/wiki). For questions, please open an issue on [GitHub](https://github.com/soedinglab/bipartite_motif_finder/issues).

## Installation

### Requirements
* `python>=3.6`
* `numpy`
* `cython`

#### Installing requirements with Conda:

Create a new conda environment with `python`, `numpy`, and `cython`:

    conda create -n bmf python=3.6 numpy cython
    conda activate bmf


#### Installing requirements on Ubuntu without Conda:

    sudo apt-get update
    sudo apt-get install python3.6 python3-pip
    pip3 install numpy cython

#### Installing requirements on macOS with brew:

    brew install python3
    pip install numpy cython

### BMF installation:

1. **Optional:** BMF can be compiled as a faster version for processors that support the AVX2 instruction set extension. To check whether AVX2 is supported, run `cat /proc/cpuinfo | grep avx2` on Linux or `sysctl -a | grep machdep.cpu.leaf7_features | grep AVX2` on macOS. If your processor supports AVX2, run the following command before installing to compile the faster version of BMF:

       export USE_AVX=1

2. Install BMF with pip:

       pip install https://github.com/soedinglab/bipartite_motif_finder/releases/download/v1.0.0a/bmf_tool-1.0.0.tar.gz

To see the BMF help page, run:

    bmf --help
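
The AVX2 check from the optional installation step can also be scripted. The following is a minimal sketch, not part of BMF: `cpuinfo_has_avx2` and `machine_has_avx2` are hypothetical helper names, and the detection is best-effort (it assumes `/proc/cpuinfo` on Linux and the `machdep.cpu.leaf7_features` sysctl key on macOS, and reports no AVX2 on any other platform).

```python
import platform
import subprocess
from pathlib import Path


def cpuinfo_has_avx2(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line of /proc/cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "avx2" in value.split():
            return True
    return False


def machine_has_avx2() -> bool:
    """Best-effort AVX2 detection; returns False when it cannot tell."""
    system = platform.system()
    try:
        if system == "Linux":
            return cpuinfo_has_avx2(Path("/proc/cpuinfo").read_text())
        if system == "Darwin":
            result = subprocess.run(
                ["sysctl", "-n", "machdep.cpu.leaf7_features"],
                stdout=subprocess.PIPE,
                stderr=subprocess.DEVNULL,
                universal_newlines=True,
                check=False,
            )
            return "AVX2" in result.stdout.split()
    except OSError:
        # Missing file or missing sysctl binary: treat as "not detected".
        return False
    return False


if __name__ == "__main__":
    if machine_has_avx2():
        print("AVX2 detected: run `export USE_AVX=1` before `pip install`")
    else:
        print("AVX2 not detected: install BMF without USE_AVX")
```

The script only reports the result. `USE_AVX` still has to be exported in the shell that runs `pip install`, because a child process cannot change its parent's environment.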

## Usage

Please refer to our [GitHub Wiki](https://github.com/soedinglab/bipartite_motif_finder/wiki) for a more detailed description of BMF and all its input parameters. In the following we provide an example workflow.

    bmf [-h] [--BGsequences BGSEQUENCES | --predict]
        [--input_type {fasta,fastq,seq}]
        [--model_parameters MODEL_PARAMETERS] [--motif_length MOTIF_LENGTH]
        [--no_tries NO_TRIES] [--output_prefix OUTPUT_PREFIX]
        [--var_thr VAR_THR] [--batch_size BATCH_SIZE]
        [--max_iterations MAX_ITERATIONS] [--no_cores NO_CORES]
        sequences

## Example workflow
You can find the FASTA files needed to run this example in the `data` directory. Here we run BMF with a single random parameter initialization. You can increase `--no_tries` to repeat the BMF run with new initial parameter values; in that case, the solution with the best likelihood is used to plot the BMF logo and to predict binding to new sequences.

### Motif discovery
You can use `bmf` in training mode for *de novo* motif discovery. By default, BMF runs for at most 1000 iterations (see the `--max_iterations` option).

    bmf positives_AAA_CCC.fasta --BGsequences negatives_AAA_CCC.fasta --input_type fasta --output_prefix AAA_CCC --motif_length 3 --no_tries 1
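
Before training, it can help to confirm that the positive and background FASTA files parse cleanly. The following is a minimal sketch, not part of BMF: `read_fasta` and `invalid_records` are hypothetical helpers, and the allowed alphabet `ACGTUN` is an assumption, since the characters BMF accepts are not stated here.

```python
from typing import Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records: List[Tuple[str, str]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            records.append((line[1:], ""))
        elif not records:
            raise ValueError("sequence data found before the first '>' header")
        else:
            header, seq = records[-1]
            records[-1] = (header, seq + line.upper())
    return records


def invalid_records(records: List[Tuple[str, str]],
                    alphabet: str = "ACGTUN") -> List[str]:
    """Return headers whose sequence is empty or has characters outside alphabet.

    The default alphabet is an assumption made for this sketch.
    """
    allowed = set(alphabet)
    return [name for name, seq in records if not seq or set(seq) - allowed]


if __name__ == "__main__":
    import sys

    # Usage: python check_fasta.py positives_AAA_CCC.fasta negatives_AAA_CCC.fasta
    for path in sys.argv[1:]:
        with open(path) as handle:
            recs = read_fasta(handle)
        print(path, len(recs), "records,", len(invalid_records(recs)), "suspicious")
```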

### Getting sequence logo
You can use `bmf_logo` to plot the best likelihood motif model generated by BMF. Specify the `output_prefix` from the previous step to allow `bmf_logo` to find all associated parameter files. Here we use `AAA_CCC` to specify the outputs from the previous run:

    bmf_logo AAA_CCC --motif_length 3

### Predicting binding to new sequences
You can use the trained BMF model parameters to predict binding scores for new sequences. To specify `--model_parameters`, use the `output_prefix` from the first step (here `AAA_CCC`).

    bmf test_sequences.fasta --predict --input_type fasta --model_parameters AAA_CCC --output_prefix predict_test_sequences
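
The three steps above can be chained from Python. The following is a minimal sketch that builds exactly the command lines shown in this example and, by default, only prints them. The helper names (`train_cmd`, `logo_cmd`, `predict_cmd`, `render`, `run_workflow`) are hypothetical and not part of BMF; actually running the commands assumes `bmf` and `bmf_logo` are on your `PATH` and that the example FASTA files are in the working directory.

```python
import shlex
import subprocess
from typing import List


def train_cmd(positives: str, negatives: str, prefix: str,
              motif_length: int = 3, no_tries: int = 1) -> List[str]:
    """Command line for de novo motif discovery (training mode)."""
    return ["bmf", positives, "--BGsequences", negatives,
            "--input_type", "fasta", "--output_prefix", prefix,
            "--motif_length", str(motif_length), "--no_tries", str(no_tries)]


def logo_cmd(prefix: str, motif_length: int = 3) -> List[str]:
    """Command line for plotting the logo of the trained model."""
    return ["bmf_logo", prefix, "--motif_length", str(motif_length)]


def predict_cmd(sequences: str, model_prefix: str, output_prefix: str) -> List[str]:
    """Command line for scoring new sequences with a trained model."""
    return ["bmf", sequences, "--predict", "--input_type", "fasta",
            "--model_parameters", model_prefix, "--output_prefix", output_prefix]


def render(cmd: List[str]) -> str:
    """Shell-quoted, printable form of a command."""
    return " ".join(shlex.quote(arg) for arg in cmd)


def run_workflow(dry_run: bool = True) -> List[List[str]]:
    """Print the three commands in order; execute them unless dry_run is set."""
    commands = [
        train_cmd("positives_AAA_CCC.fasta", "negatives_AAA_CCC.fasta", "AAA_CCC"),
        logo_cmd("AAA_CCC"),
        predict_cmd("test_sequences.fasta", "AAA_CCC", "predict_test_sequences"),
    ]
    for cmd in commands:
        print(render(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands


if __name__ == "__main__":
    run_workflow(dry_run=True)
```

Passing `dry_run=False` executes each step and stops at the first non-zero exit code, so the logo and prediction steps never run against a failed training run.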