https://github.com/jespb/python-m3gp
An easy-to-use scikit-learn inspired implementation of the Multidimensional Multiclass Genetic Programming with Multidimensional Populations (M3GP) algorithm
- Host: GitHub
- URL: https://github.com/jespb/python-m3gp
- Owner: jespb
- Created: 2019-11-15T10:39:02.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2025-04-08T05:42:56.000Z (8 months ago)
- Last Synced: 2025-08-24T23:34:59.618Z (4 months ago)
- Topics: binary-classification, classification, clustering, evolutionary-algorithm, evolutionary-algorithms, evolutionary-computation, genetic-programming, machine-learning, multiclass-classification
- Language: Python
- Homepage:
- Size: 140 KB
- Stars: 9
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.txt
README
This is an easy-to-use, scikit-learn-inspired version of the M3GP algorithm.
By using this file, you are agreeing to this product's EULA
This product can be obtained at https://github.com/jespb/Python-M3GP
Copyright ©2019-2025 J. E. Batista
This file describes the command and flags used in the stand-alone version of this implementation, and explains how to import, use, and edit it.
This implementation of M3GP can be used in a stand-alone fashion using the following command and flags:
$ python Main_M3GP_standalone.py
[-d datasets]
- This flag expects a set of csv dataset names separated by ";" (e.g., "a.csv;b.csv")
- By default, the heart.csv dataset is used
[-dsdir dir]
- States the dataset directory.
- By default "datasets/" is used
- Use "-dsdir ./" for the root directory
[-es elite_size]
- This flag expects an integer with the elite size;
- By default, the elite has size 1.
[-md max_depth]
- This flag expects an integer with the maximum initial depth for the trees;
- By default, this value is set to 6.
[-mg max_generation]
- This flag expects an integer with the maximum number of generations;
- By default, this value is set to 100.
[-odir dir]
- States the output directory.
- By default "results/" is used
- Use "-odir ./" for the root directory
[-op operators]
- This flag expects a set of operators and their number of arguments, separated by ";"
- Allowed operators: +,2 ; -,2 ; *,2 ; /,2
- By default, the used operators are the sum, subtraction, multiplication and protected division: "+,2;-,2;*,2;/,2"
[-ps population_size]
- This flag expects an integer with the size of the population;
- By default, this value is set to 500.
[-runs number_of_runs]
- This flag expects an integer with the number of runs to be made;
- By default, this value is set to 30
[-tf train_fraction]
- This flag expects a float [0;1] with the fraction of the dataset to be used in training;
- By default, this value is set to 0.70
[-ts tournament_size]
- This flag expects an integer with the tournament size;
- By default, this value is set to 10.
[-t number_of_threads]
- This flag expects an integer with the number of threads to use while evaluating the population;
- If the value is set to 1, the multiprocessing library will not be used
- By default, this value is set to 1.
[-di minimum_number_of_dimensions]
- This flag expects an integer with the minimum number of dimensions in each individual;
- This flag affects the number of dimensions in the initial individuals;
- By default, this value is set to 1
[-dm maximum_number_of_dimensions]
- This flag expects an integer with the maximum number of dimensions in each individual;
- By default, this value is set to 9999
[-rs random_state]
- This flag expects an integer with the seed to be used by the M3GP algorithm;
- By default, this value is set to 42
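Several of the flags above (-es, -ts, -op) correspond to standard GP machinery: elitist survival, tournament selection, and protected division. As a rough sketch of what these mechanisms do, with hypothetical helper names (the real implementation keeps this logic inside its own classes, and its exact conventions may differ):

```python
import random

def pdiv(a, b):
    # Protected division: fall back to the numerator when the denominator
    # is (near) zero, so evolved trees never raise ZeroDivisionError.
    # (The fallback convention is an assumption; some GP systems return 1.)
    return a if abs(b) < 1e-12 else a / b

def tournament(population, fitness, tournament_size=10, rng=random):
    # Sample tournament_size individuals uniformly at random and return the
    # fittest one; larger tournaments mean stronger selection pressure.
    contenders = rng.sample(range(len(population)), tournament_size)
    return population[max(contenders, key=lambda i: fitness[i])]

def elite(population, fitness, elite_size=1):
    # The elite_size fittest individuals survive unchanged into the
    # next generation.
    ranked = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    return [population[i] for i in ranked[:elite_size]]
```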
How to import this implementation to your project:
- Download this repository;
- Copy the "m3gp/" directory to your project directory;
- import the M3GP class using "from m3gp.M3GP import M3GP".
Alternatively, m3gp can be installed using:
- pip install m3gp
How to use this implementation:
$ from m3gp.M3GP import M3GP
$ model = M3GP()
$ model.fit( training_x, training_y, test_x (optional), test_y (optional) )
Arguments for M3GP():
operators -> Operators used by the individual (default: [("+",2),("-",2),("*",2),("/",2)] )
max_depth -> Max initial depths of the individuals (default: 6)
population_size -> Population size (default: 500)
max_generation -> Maximum number of generations (default: 100)
tournament_size -> Tournament size (default: 5)
elitism_size -> Elitism selection size (default: 1)
limit_depth -> Maximum individual depth (default: 17)
threads -> Number of CPU threads to be used (default: 1)
random_state -> Random state (default: 42)
model_class -> Model to be used as the inner classifier/regressor (default: MahalanobisDistanceClassifier() )
fitnessType -> Fitness to be used (Accuracy, WAF, 2FOLD - Classification, MSE - Regression) (default: "Accuracy") # "2FOLD" means 2-folds on the training data, using WAF
dim_min -> Minimum number of dimensions (default: 1)
dim_max -> Maximum number of dimensions (default: 9999) #The algorithm will not reach this value
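The default inner model is a Mahalanobis-distance classifier. As a hypothetical minimal sketch of that idea (the actual MahalanobisDistanceClassifier in m3gp may be implemented differently), each class is summarised by its centroid and covariance, and a sample is assigned to the class at the smallest Mahalanobis distance:

```python
import numpy as np

class MinimalMahalanobisClassifier:
    # Toy illustration only, not the m3gp implementation.

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.stats_ = {}
        for c in self.classes_:
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            # The pseudo-inverse guards against singular covariance matrices.
            cov_inv = np.linalg.pinv(np.cov(Xc, rowvar=False))
            self.stats_[c] = (mu, cov_inv)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            dists = []
            for c in self.classes_:
                mu, cov_inv = self.stats_[c]
                d = x - mu
                dists.append(d @ cov_inv @ d)  # squared Mahalanobis distance
            preds.append(self.classes_[int(np.argmin(dists))])
        return np.array(preds)
```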
Arguments for model.fit():
Tr_X -> Training samples
Tr_Y -> Training labels
Te_X -> Test samples, used in the standalone version (default: None)
Te_Y -> Test labels, used in the standalone version (default: None)
Useful methods:
$ model = M3GP() -> starts the model; by default, it is optimized for the Mahalanobis distance classifier, a cluster-based algorithm;
$ model = M3GP(model_class = RandomForestClassifier(max_depth=6), fitnessType="2FOLD")
-> the model will be optimized for the RF classifier
-> Since RF models with no depth limit can memorize the training data, they obtain perfect accuracy in the early generations;
-> to prevent this, either limit the depth of the RF trees (e.g., max_depth=6) or use "2FOLD" as the fitnessType;
$ model.fit(X, Y) -> fits the model to the dataset;
$ model.predict(X) -> returns a list with the predictions for the given dataset.
How to edit this implementation:
Fitness Function ( m3gp.Individual ):
- Change the getFitness() method to use your own fitness function;
- This implementation assumes that a higher fitness is always better. To change this, edit the __gt__ method in this class;
- Warning: since M3GP is a slow method, a fitness function that scales well with the number of features is recommended.
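To illustrate the comparison contract described above, here is a hypothetical stripped-down individual (not the real m3gp.Individual class) whose __gt__ delegates to getFitness(), so that higher fitness always wins:

```python
class Individual:
    def __init__(self, fitness):
        self.fitness = fitness

    def getFitness(self):
        return self.fitness

    def __gt__(self, other):
        # Higher fitness is better; to minimise an error measure (e.g. MSE)
        # instead, flip this comparison or negate the fitness.
        return self.getFitness() > other.getFitness()

# max() compares elements with ">", so it picks the fittest individual.
population = [Individual(f) for f in (0.4, 0.9, 0.7)]
best = max(population)
```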
Citation:
If you use this implementation, please cite one of the works below, where the implementation is also used:
@inproceedings{Batista2022,
doi = {10.1109/cec55065.2022.9870343},
url = {https://doi.org/10.1109/cec55065.2022.9870343},
year = {2022},
month = jul,
publisher = {{IEEE}},
author = {João E. Batista and Sara Silva},
title = {Comparative study of classifier performance using automatic feature construction by M3GP},
booktitle = {2022 {IEEE} Congress on Evolutionary Computation ({CEC})}
}
@Article{rs13091623,
AUTHOR = {Batista, João E. and Cabral, Ana I. R. and Vasconcelos, Maria J. P. and Vanneschi, Leonardo and Silva, Sara},
TITLE = {Improving Land Cover Classification Using Genetic Programming for Feature Construction},
JOURNAL = {Remote Sensing},
VOLUME = {13},
YEAR = {2021},
NUMBER = {9},
ARTICLE-NUMBER = {1623},
URL = {https://www.mdpi.com/2072-4292/13/9/1623},
ISSN = {2072-4292},
DOI = {10.3390/rs13091623}
}
@INPROCEEDINGS{9185630,
author={Batista, João E. and Silva, Sara},
booktitle={2020 IEEE Congress on Evolutionary Computation (CEC)},
title={Improving the Detection of Burnt Areas in Remote Sensing using Hyper-features Evolved by M3GP},
year={2020},
pages={1-8},
doi={10.1109/CEC48606.2020.9185630}
}
Reference:
Muñoz, L., Trujillo, L., & Silva, S. (2015). M3GP – multiclass classification with GP. In Genetic Programming - 18th European Conference, EuroGP 2015, Proceedings (Vol. 9025, pp. 78-91). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9025). Springer-Verlag. https://doi.org/10.1007/978-3-319-16501-1_7