https://github.com/mbelmadani/motifgp

Motif discovery for DNA sequences using multiobjective optimization and genetic programming.

Host: GitHub
URL: https://github.com/mbelmadani/motifgp
Owner: mbelmadani
License: lgpl-3.0
Created: 2016-08-09T07:10:20.000Z (almost 9 years ago)
Default Branch: master
Last Pushed: 2018-07-24T21:20:28.000Z (almost 7 years ago)
Last Synced: 2025-05-07T09:14:57.392Z (11 days ago)
Topics: bioinformatics, chip-seq, deap, dna, dna-sequences, genetic-programming, jaspar, motif, motif-discovery, multiobjective-optimization, network-expressions, nsga-ii, pareto-front, python, regular-expressions, sequences, strongly-typed, transcription-factor-binding, transcription-factors
Language: Python
Homepage: https://mbelmadani.github.io/motifgp/
Size: 204 KB
Stars: 6
Watchers: 2
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.txt
- Changelog: CHANGELOG.txt
- License: LICENSE.txt


===============
= MotifGP 0.2 =
===============

MotifGP is a de novo motif discovery tool for discriminatory network expression identification in ChIP-seq datasets.
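
To make the term "network expression" concrete, here is a minimal sketch (not MotifGP's own code). It assumes a network expression can be read as a regular expression built from literal nucleotides and character classes, and uses the standard IUPAC nucleotide codes. The function names and the toy sequences are invented for illustration.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def iupac_to_regex(motif):
    """Translate an IUPAC string into a regex of character classes."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]
        # Single bases stay literal; ambiguous codes become a class.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)

def count_matching(pattern, sequences):
    """Count how many sequences contain at least one match."""
    compiled = re.compile(pattern)
    return sum(1 for seq in sequences if compiled.search(seq))

# Toy data, invented for illustration only.
training = ["TTCACGTGAA", "GGCACATGCC", "AAAAAAAAAA"]
background = ["TTTTTTTTTT", "CCCACGTGCC", "GGGGGGGGGG"]

pattern = iupac_to_regex("CACRTG")
print(pattern, count_matching(pattern, training), count_matching(pattern, background))
```

A pattern that matches many training sequences and few background sequences is the kind of discriminatory candidate the tool searches for; how MotifGP actually scores candidates is defined by its objectives, not by this sketch.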

Original author: Manuel Belmadani

	[email protected]

The project is documented by the following publications.

Manuel Belmadani and Marcel Turcotte. MotifGP: Using multi-objective evolutionary computing for mining network expressions
in DNA sequences. In IEEE International Conference on Computational Intelligence in Bioinformatics and Computational Biology
(CIBCB 2016), Chiang Mai, Thailand, October 5-7, 2016.
https://doi.org/10.1109/CIBCB.2016.7758133

Manuel Belmadani. MotifGP: DNA motif discovery using multiobjective evolution. Master of computer science,
University of Ottawa, School of Electrical Engineering and Computer Science, 2016.
Available from University of Ottawa Research under: http://www.ruor.uottawa.ca/handle/10393/34213

Acknowledgements:

MotifGP uses source code from these tools:

-hypergeometric.py from the MEME Suite (License and copyright in source file).

-altschulEriksonDinuclShuffle.py from Peter Clote - CLOTE Computational Biology LAB, http://clavius.bc.edu/~clotelab/RNAdinucleotideShuffle/

This software was also built with DEAP: Fortin, F.-A., De Rainville, F.-M., Gardner, M.-A. G.,
Parizeau, M. & Gagné, C. DEAP: Evolutionary Algorithms Made Easy. J. Mach. Learn. Res. 13,
2171–2175 (2012).

=======================================================================================

License: (see LICENSE.txt)

=======================================================================================

Installation: (see INSTALL.txt)

=======================================================================================

Examples: (see EXAMPLES.txt)

=======================================================================================

Usage: motifgp.py [options]

Options:

  -h, --help            show this help message and exit

  -p TRAINING_PATH, --training=TRAINING_PATH

                        Fasta file to use for training (input) sequence data

  -b BACKGROUND_PATH, --background=BACKGROUND_PATH

                        [Optional] Fasta file to use for background (control)

                        sequence data. If not provided, the generated

                        control sequences will be written to runtime_tmp/

  -m MOO, --moo=MOO     Multi-objective optimization [SPEA2, NSGA2, NSGAR,

                        MOEAD]. NSGAR is the NSGA-II_R (NSGA-II Revised)

                        algorithm, an improvement of NSGA2.

  -f FITNESS, --fitness=FITNESS

                        Objective fitness function. Available objectives:

                        D=Discrimination, F=Fisher, I=ScipyFisher,

                        O=OddsRatio, Q=FalseDiscoveryRate, S=Support,

                        R=ScipyOddsRatio. Each single character in the string

                        represents an objective.

                        Objectives are mapped by the configuration file at

                        config/objectives. Default is 'DF' for

                        [Discrimination,Fisher] (2-objectives).

  --cxpb=CXPB           Probability [0.0 to 1.0] for a crossover during

                        variation. Requires --mutpb to be set to (1.0-cxpb).

                        Default is 0.7.

  --mutpb=MUTPB         Probability [0.0 to 1.0] for a mutation during

                        variation. Requires --cxpb to be set to (1.0-mutpb).

                        Default is 0.3.

  --short=SHORT         Stops reading in after SHORT input sequences.

  --popsize=POPSIZE     Size of the population.

  --revcomp             Compile regex with reverse complement

  --random-seed=RANDOM_SEED

                        Random seed value to set for execution

  -n NGEN, --num-gen=NGEN

                        Generation where runtime stops (even in the case of

                        resumed checkpoints)

  --timelimit=TIMELIMIT

                        Time limit on the GP loop execution.

  --matcher=MATCHER     Use a different matcher. Options: 'grep', 'python'.

                        'grep' is faster on large datasets, while 'python' is

                        a pure python version in case the system doesn't

                        support grep.

  -o OUTPUT_PATH, --output=OUTPUT_PATH

                        Output directory. Default is ./OUT/

  -t TAG, --tag=TAG     A tag for the output subdirectory. Use it to describe

                        the run; results are saved in the tag's subdirectory of

                        the output directory. Default is 'default'.

  -i, --inspector       Don't print any files. Can be useful with python -i

                        (interactive mode).

  --hardmask            Replace tandem repeats (lower-case typed nucleotides)

                        by N

  -g GRAMMAR, --grammar=GRAMMAR

                        Grammar for the STGP [min, iupac, full, ne]. Default

                        is iupac. 'min' only uses nucleotides. 'iupac' is a

                        network expression grammar. 'full' is a network

                        expression grammar with additional regular expression

                        tokens. 'ne' is like iupac, but built with string

                        primitives instead of booleans.

  -e ERASE, --erase=ERASE

                        Input .nef(t) file to delete from the dataset prior to

                        execution. Used for sequential coverage.

  --backpad             Pads background sequences with consecutive nucleotides

                        (i.e. AAAAAAAA,CCCCCCCC,GGGGGGGG,TTTTTTTT) of length 8

                        every set of 4 sequences.

  --bg-algo=BG_ALGO     Shuffling algorithm for background. Default is

                        'dinuclShuffle', if no background dataset is provided.

                        Currently, dinuclShuffle is the only implemented

                        method.

  --ncpu=NCPU           Number of CPUs to use when mapping evaluation of

                        solutions. Use an integer, or "auto" to automatically

                        determine the maximum number. Default is no

                        parallelism.

  --termination=TERMINATION

                        Use automatic termination algorithm. Use 'auto' to

                        use the automatic termination algorithm for MOEAs.

  --hamming             [Experimental] Generates statistics on the hamming

                        distance from a template regex and hof candidates.

  --seeded-population   [Experimental] Use population seeds

  -c CHECKPOINT_PATH, --checkpoint=CHECKPOINT_PATH

                        [Temporarily disabled] Load a checkpoint at path.

  -q, --quiet           [Unimplemented] don't print status messages to stdout
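
The --fitness option above takes one character per objective. As a rough illustration of how such a string could be expanded, the sketch below hard-codes the letter table from the help text. This is an assumption made for the example: MotifGP itself reads the mapping from config/objectives, whose format is not shown here, and decode_fitness is a hypothetical helper.

```python
# Letter-to-objective table copied from the --fitness help text.
# MotifGP reads its mapping from config/objectives; hard-coding it
# here is an assumption made for this example only.
OBJECTIVES = {
    "D": "Discrimination",
    "F": "Fisher",
    "I": "ScipyFisher",
    "O": "OddsRatio",
    "Q": "FalseDiscoveryRate",
    "S": "Support",
    "R": "ScipyOddsRatio",
}

def decode_fitness(fitness="DF"):
    """Expand a fitness string into objective names, one per character."""
    unknown = [c for c in fitness if c not in OBJECTIVES]
    if unknown:
        raise ValueError("unknown objective letters: " + "".join(unknown))
    return [OBJECTIVES[c] for c in fitness]

print(decode_fitness("DF"))   # the documented default, 2 objectives
print(decode_fitness("DFS"))  # a 3-objective combination
```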

Also consider looking at EXAMPLES.txt for basic examples of MotifGP usage.
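
As a complement to EXAMPLES.txt, the following sketch assembles a command line from the options documented above. It only uses flags listed in the help text. The helper name, the "python" interpreter name, and the input file name are assumptions for illustration, and the sketch does not run MotifGP itself.

```python
def build_motifgp_command(training, background=None, fitness="DF",
                          cxpb=0.7, mutpb=0.3, ngen=None,
                          output=None, tag=None, interpreter="python"):
    """Assemble an argument list for motifgp.py from the documented options."""
    # The help text asks for --cxpb and --mutpb to be complementary.
    if abs(cxpb + mutpb - 1.0) > 1e-9:
        raise ValueError("cxpb and mutpb must sum to 1.0")

    cmd = [interpreter, "motifgp.py",
           "--training=" + training,
           "--fitness=" + fitness,
           "--cxpb=" + str(cxpb),
           "--mutpb=" + str(mutpb)]
    # Optional flags are added only when a value was supplied.
    if background is not None:
        cmd.append("--background=" + background)
    if ngen is not None:
        cmd.append("--num-gen=" + str(ngen))
    if output is not None:
        cmd.append("--output=" + output)
    if tag is not None:
        cmd.append("--tag=" + tag)
    return cmd

# "peaks.fasta" and "demo" are placeholder values, not files shipped with MotifGP.
cmd = build_motifgp_command("peaks.fasta", ngen=100, tag="demo")
print(" ".join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the directory that contains motifgp.py. Building a list rather than a single string avoids shell quoting problems with file paths.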
