{"id":30548495,"url":"https://github.com/jespb/python-m6gp","last_synced_at":"2026-02-14T15:02:36.787Z","repository":{"id":228831859,"uuid":"774457744","full_name":"jespb/Python-M6GP","owner":"jespb","description":"M6GP, a multi-objective feature engineering algorithm that aims to produce interpretable feature engineering models","archived":false,"fork":false,"pushed_at":"2025-12-03T03:42:15.000Z","size":93,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-26T19:48:41.661Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jespb.png","metadata":{"files":{"readme":"README.txt","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":"citation.bib","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-03-19T15:28:47.000Z","updated_at":"2025-12-03T03:42:20.000Z","dependencies_parsed_at":"2024-03-20T18:01:27.919Z","dependency_job_id":"6999393d-edbd-4e7d-bc69-9c53adfa087d","html_url":"https://github.com/jespb/Python-M6GP","commit_stats":null,"previous_names":["jespb/python-m6gp"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/jespb/Python-M6GP","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jespb%2FPython-M6GP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jespb%2FPython-M6GP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jespb%2FPython-M6GP/releases","manifests_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jespb%2FPython-M6GP/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jespb","download_url":"https://codeload.github.com/jespb/Python-M6GP/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jespb%2FPython-M6GP/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29447768,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-14T14:10:32.461Z","status":"ssl_error","status_checked_at":"2026-02-14T14:09:49.945Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-28T03:08:07.304Z","updated_at":"2026-02-14T15:02:36.778Z","avatar_url":"https://github.com/jespb.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"This is an easy-to-use, scikit-learn inspired version of the M6GP algorithm.\n\nAlso available through: `pip install m6gp`\n\nBy using this file, you are agreeing to this product's EULA\nThis product can be obtained at https://github.com/jespb/Python-M6GP\nCopyright ©2023-2025 J. E. 
Batista\n\n\nThis file contains information about the command and flags used in the stand-alone version of this implementation and an explanation of how to import, use, and edit this implementation.\n\n\n\n\nThis implementation of M6GP can be used in a stand-alone fashion using the following command and flags:\n\n$ python Main_M6GP_standalone.py\n\t\n    [-d datasets] \n        - This flag expects a set of csv dataset names separated by \";\" (e.g., \"a.csv;b.csv\")\n        - By default, the heart.csv dataset is used\t\t\n\n    [-dsdir dir] \n        - States the dataset directory. \n        - By default \"datasets/\" is used \n        - Use \"-dsdir ./\" for the root directory\t\n\n    [-md max_depth]\n        - This flag expects an integer with the maximum initial depth for the trees;\n        - By default, this value is set to 6.\t\t\n\n    [-mg max_generation]\n        - This flag expects an integer with the maximum number of generations;\n        - By default, this value is set to 100.\n\n    [-odir dir] \n        - States the output directory. 
\n        - By default, \"results/\" is used \n        - Use \"-odir ./\" for the root directory\n\t\n    [-op operators]\n        - This flag expects a set of operators and their number of arguments, separated by \";\"\n        - Allowed operators: +,2 ; -,2 ; *,2 ; /,2\n        - By default, the used operators are the sum, subtraction, multiplication, and protected division: \"+,2;-,2;*,2;/,2\"\t\n\n    [-ps population_size]\n        - This flag expects an integer with the size of the population;\n        - By default, this value is set to 500.\n\n    [-runs number_of_runs] \n        - This flag expects an integer with the number of runs to be made;\n        - By default, this value is set to 30\n\t\n    [-tf train_fraction]\n        - This flag expects a float [0;1] with the fraction of the dataset to be used in training;\n        - By default, this value is set to 0.70\n\t\n    [-ts tournament_size]\n        - This flag expects an integer with the tournament size;\n        - By default, this value is set to 10.\n\n    [-t number_of_threads]\n        - This flag expects an integer with the number of threads to use while evaluating the population;\n        - If the value is set to 1, the multiprocessing library will not be used \n        - By default, this value is set to 1.\n\t\n    [-di minimum_number_of_dimension]\n        - This flag expects an integer with the minimum number of dimensions in each individual;\n        - This flag affects the number of dimensions in the initial individuals;\n        - By default, this value is set to 1\n\n    [-dm maximum_number_of_dimension]\n        - This flag expects an integer with the maximum number of dimensions in each individual;\n        - By default, this value is set to 9999\n\n    [-rs random state]\n        - This flag expects an integer with the seed to be used by the M6GP algorithm;\n        - By default, this value is set to 42\n\n\n\t\n\nHow to import this implementation to your project:\n    - Download this 
repository;\n    - Copy the \"m6gp/\" directory to your project directory;\n    - import the M6GP class using \"from m6gp.M6GP import M6GP\".\n\nHow to use this implementation:\n    $ from m6gp.M6GP import M6GP\n    $ model = M6GP()\n    $ model.fit( training_x, training_y, test_x (optional), test_y (optional) )\n\nArguments for M6GP():\n    operators\t\t-\u003e Operators used by the individual (default: [(\"+\",2),(\"-\",2),(\"*\",2),(\"/\",2)] )\n    max_depth\t\t-\u003e Max initial depths of the individuals (default: 6)\n    population_size\t-\u003e Population size (default: 500)\n    max_generation\t-\u003e Maximum number of generations (default: 100)\n    tournament_size\t-\u003e Tournament size (default: 5)\n    limit_depth\t\t-\u003e Maximum individual depth (default: 17)\n    threads \t\t-\u003e Number of CPU threads to be used (default: 1)\n    random_state\t-\u003e Random state (default: 42)\n    dim_min\t\t-\u003e Minimum number of dimensions (default: 1)\n    dim_max\t\t-\u003e Maximum number of dimensions (default: 9999) #The algorithm will not reach this value\n\nArguments for model.fit():\n    Tr_X \t\t-\u003e Training samples\n    Tr_Y \t\t-\u003e Training labels\n    Te_X \t\t-\u003e Test samples, used in the standalone version (default: None)\n    Te_Y \t\t-\u003e Test labels, used in the standalone version (default: None)\n\nUseful methods:\n    $ model = M6GP()\t-\u003e starts the model;\n    $ model.fit(X, Y)\t-\u003e fits the model to the dataset;\n    $ model.predict(X)\t-\u003e Returns a list with the prediction of the given dataset.\n\n\n\n\nHow to edit this implementation:\n    Fitness Function ( m6gp.Individual ):\n        - Change the getFitness() method to use your own fitness function;\n        - This implementation assumes that a higher fitness is always better. 
To change this, edit the __gt__ method in this class;\n        - Warning: Since M6GP is a slow method, a fitness function that scales well with the number of features is recommended. \n\n    Classification/regression algorithm ( m6gp.Individual ):\n        - Change the createModel() method to use your own classifier;\n        - Assuming it is a scikit-learn implementation, you may only need to change one line in this method;\n        - Warning: Since M6GP is a slow method, a learning algorithm that scales well with the number of features is recommended.\n\n   \n\n\nReference: \n\t\n\t@inproceedings{m6gp,\n\tdoi = {10.1109/CEC60901.2024.10612107},\n\turl = {https://ieeexplore.ieee.org/abstract/document/10612107},\n\tyear = {2024},\n\tmonth = jun,\n\tpublisher = {{IEEE}},\n\tauthor = {Joao E. Batista and Nuno M. Rodrigues and Leonardo Vanneschi},\n\ttitle = {{M6GP: Multiobjective Feature Engineering}},\n\tbooktitle = {2024 {IEEE} Congress on Evolutionary Computation ({CEC})}\n\t}\n\n\nYou may also be interested in other works related to measuring the complexity of feature engineering models:\n - https://ieeexplore.ieee.org/abstract/document/10611989\n - https://www.sciencedirect.com/science/article/pii/S2210650224002992\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjespb%2Fpython-m6gp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjespb%2Fpython-m6gp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjespb%2Fpython-m6gp/lists"}