{"id":15753565,"url":"https://github.com/marella/evaluate","last_synced_at":"2026-04-27T23:33:43.711Z","repository":{"id":66207399,"uuid":"228059852","full_name":"marella/evaluate","owner":"marella","description":"A tool to evaluate the performance of various machine learning algorithms and preprocessing steps to find a good baseline for a given task.","archived":false,"fork":false,"pushed_at":"2019-12-15T15:39:32.000Z","size":31,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-31T07:42:07.733Z","etag":null,"topics":["lightgbm","machine-learning","python","scikit-learn","xgboost"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/marella.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-12-14T17:15:47.000Z","updated_at":"2024-04-19T20:13:25.000Z","dependencies_parsed_at":"2023-06-19T16:56:03.473Z","dependency_job_id":null,"html_url":"https://github.com/marella/evaluate","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/marella/evaluate","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marella%2Fevaluate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marella%2Fevaluate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marella%2Fevaluate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marella%2Fevaluate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/marella","download_url":"https://codeload.github.com/marella/evaluate/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marella%2Fevaluate/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32360110,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-27T20:07:02.737Z","status":"ssl_error","status_checked_at":"2026-04-27T20:07:00.910Z","response_time":128,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["lightgbm","machine-learning","python","scikit-learn","xgboost"],"created_at":"2024-10-04T07:41:11.388Z","updated_at":"2026-04-27T23:33:43.685Z","avatar_url":"https://github.com/marella.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"A tool to evaluate the performance of various machine learning algorithms and preprocessing steps to find a good baseline for a given task.\n\n## Installation\n\n```sh\npip install evaluate\n```\n\n## Example\n\n```py\nfrom evaluate import evaluate\nfrom sklearn import datasets\n\ndata = datasets.load_iris()\nx, y = data.data, data.target\n\nresults = evaluate(task='classification', data=(x, y))\nresults['test_score'].plot.bar()\n```\n\n![](results.png)\n\n## Documentation\n\nThis tool performs common 
preprocessing steps, such as feature scaling and one-hot encoding, and runs various ML algorithms, such as random forests and SVMs. It then scores every combination of preprocessing step and ML algorithm. These scores can be used to quickly identify the preprocessing steps and ML algorithms that perform well, forming a good baseline for developing better models.\n\n```py\nevaluate(task,\n         data,\n         test_data=.2,\n         columns=None,\n         preprocessors=None,\n         estimators=None)\n```\n\n###### Args\n\n-   `task`: `'classification'` or `'regression'`\n-   `data`: Tuple `(x, y)` used to train the models\n-   `test_data`: Tuple `(x, y)` used to score the models, or a number giving the proportion of `data` to hold out for scoring\n-   `columns`: Dictionary mapping column types to lists of column names. If not specified, numeric and categorical columns are identified automatically\n-   `preprocessors`: List of names of available preprocessors, or a custom `Preprocessors` object\n-   `estimators`: List of names of available estimators, or a custom `Estimators` object\n\n###### Returns\n\nDictionary of pandas DataFrames with the following keys. In each DataFrame, the index holds the estimator names and the columns hold the preprocessor names:\n\n```py\n{\n    'test_score': ...,\n    'train_score': ...,\n    'fit_time': ...,\n    'score_time': ...,\n}\n```\n\n```py\nimport pandas\nfrom evaluate import evaluate\n\nresults = evaluate(...)\nassert isinstance(results, dict)\nscores = results['test_score']\nassert isinstance(scores, pandas.DataFrame)\nscores.plot.bar()\n```\n\n### Preprocessors\n\n#### Available Preprocessors\n\n| Name  | Column Type | Description                                              |\n| ----- | ----------- | -------------------------------------------------------- |\n| n     | numeric     | Handle missing data                                      |\n| n:s   | numeric     | Standardize features                                     |\n| c     | categorical | Handle 
missing data and perform one-hot encoding         |\n| o     | ordinal     | Handle missing data and perform ordinal encoding         |\n| t:c   | text        | Convert to a matrix of token counts                      |\n| t:c=2 | text        | Convert to a matrix of token counts including bigrams    |\n| t:t   | text        | Convert to a matrix of TF-IDF features                   |\n| t:t=2 | text        | Convert to a matrix of TF-IDF features including bigrams |\n\nMultiple preprocessors can be combined into one by joining their names with commas. For example, `'n,c,o'` applies the `n`, `c` and `o` preprocessors to the numeric, categorical and ordinal columns respectively:\n\n```py\nresults = evaluate(..., preprocessors=['n,c,o', 'n:s,c,o'])\n```\n\n#### Custom Preprocessors\n\nCustom preprocessors can be added as follows:\n\n```py\nfrom evaluate import evaluate, Preprocessors\n\npreprocessors = Preprocessors()\n# CustomPreprocessor is a placeholder for your own preprocessor\npreprocessors.add('custom_preprocessor', CustomPreprocessor())\nresults = evaluate(..., preprocessors=preprocessors)\n```\n\nThe name of each custom preprocessor must be unique.\n\n### Estimators\n\n#### Available Estimators\n\n| Classification             | Regression                |\n| -------------------------- | ------------------------- |\n| XGBClassifier              | XGBRegressor              |\n| LGBMClassifier             | LGBMRegressor             |\n| RandomForestClassifier     | RandomForestRegressor     |\n| SVC                        | SVR                       |\n| LogisticRegression         | LinearRegression          |\n| KNeighborsClassifier       | KNeighborsRegressor       |\n| AdaBoostClassifier         | AdaBoostRegressor         |\n| ExtraTreesClassifier       | ExtraTreesRegressor       |\n| GradientBoostingClassifier | GradientBoostingRegressor |\n| DecisionTreeClassifier     | DecisionTreeRegressor     |\n| DummyClassifier            | DummyRegressor            |\n\n#### Custom Estimators\n\nCustom estimators can be added as follows:\n\n```py\nfrom evaluate import evaluate, Estimators\n\nestimators = Estimators(task='classification')\n# CustomEstimator is a placeholder for your own estimator\nestimators.add('custom_estimator', 
CustomEstimator())\nresults = evaluate(..., estimators=estimators)\n```\n\nThe name of each custom estimator must be unique.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarella%2Fevaluate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmarella%2Fevaluate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarella%2Fevaluate/lists"}