{"id":14958403,"url":"https://github.com/bartekpog/modelcreator","last_synced_at":"2025-10-24T14:31:55.038Z","repository":{"id":56138636,"uuid":"252663584","full_name":"BartekPog/modelcreator","owner":"BartekPog","description":"Simple python package for creating predictive models","archived":false,"fork":false,"pushed_at":"2024-06-17T23:31:48.000Z","size":366,"stargazers_count":6,"open_issues_count":7,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-31T02:21:15.957Z","etag":null,"topics":["automl","estimator","machine-learning","model-selection","package","predictive-modeling","python","sklearn"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/modelcreator/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BartekPog.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-04-03T07:37:39.000Z","updated_at":"2023-01-13T12:24:03.000Z","dependencies_parsed_at":"2024-09-22T07:00:25.025Z","dependency_job_id":null,"html_url":"https://github.com/BartekPog/modelcreator","commit_stats":{"total_commits":27,"total_committers":2,"mean_commits":13.5,"dds":0.03703703703703709,"last_synced_commit":"902bca229352c6fc10c395b1606510b9bea04b1a"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BartekPog%2Fmodelcreator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BartekPog%2Fmodelcreator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/BartekPog%2Fmodelcreator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BartekPog%2Fmodelcreator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BartekPog","download_url":"https://codeload.github.com/BartekPog/modelcreator/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":237990576,"owners_count":19398453,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automl","estimator","machine-learning","model-selection","package","predictive-modeling","python","sklearn"],"created_at":"2024-09-24T13:16:58.499Z","updated_at":"2025-10-24T14:31:50.088Z","avatar_url":"https://github.com/BartekPog.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# modelcreator - AutoML package\n\nThis package contains a **Machine** which is meant to do the **learning** for you. 
It can automatically create a fitting predictive model for given data.\n\n###### Sample output\n\n```\nTesting:  Gradient Boosting Classifier\n[########################################] | 100% Completed |  3.9s\nScore: 0.9667\n\nTesting:  Ada Boost Classifier\n[########################################] | 100% Completed |  1.3s\nScore: 0.9600\n\nTesting:  Random Forest Classifier\n[########################################] | 100% Completed |  5.0s\nScore: 0.9600\n\nTesting:  Balanced Random Forest Classifier\n[########################################] | 100% Completed |  3.5s\nScore: 0.9600\n\nTesting:  SVC\n[########################################] | 100% Completed |  1.2s\nScore: 0.9667\n\nChosen model:  Gradient Boosting Classifier 0.9667\n\nParams:\n        min_samples_split: 2\n        n_estimators: 100\n\nResults saved to  output.csv\n```\n\n# Table of Contents\n\n1. [Installation](#installation)\n1. [Usage](#usage)\n   - [CSV input](#csv-path-input)\n   - [Pandas input](#pandas-input)\n1. [Saving model](#saving-the-model)\n1. [Parameters](#parameters)\n   - [Machine](#machine)\n   - [learn](#learn)\n   - [learnFromDf](#learnfromdf)\n   - [predict](#predict)\n   - [predictFromDf](#predictfromdf)\n   - [saveMachine](#savemachine)\n1. [Development](#development)\n\n## Installation\n\nTo install the package, run:\n\n```bash\npip install modelcreator\n```\n\n## Usage\n\nThe input may be either a path to a **csv** file or a **pandas DataFrame** object.\n\n#### CSV path input\n\nThe library assumes that the last column of the training dataset contains the expected results. The dataset (both training and predictive) must be provided as a **csv** file.\n\nIf the results column contains text, the _Machine_ will do its best to learn to _classify_ the data correctly. 
If it contains numbers, _regression_ is performed instead.\n\nIf the file has a header row, pass `header_in_csv=True` to the method.\n\n###### Example 1 _Iris_\n\n```python\nfrom modelcreator import Machine\n\n# Create automl machine instance\nmachine = Machine()\n\n# Train machine learning model\nmachine.learn('example-data/iris.csv')\n\n# Predict the outcomes and save them to output.csv\nmachine.predict('example-data/iris-pred.csv', output_file='output.csv')\n```\n\nThis example is also available in the `example.py` file. Consider trying it on your own.\n\n#### Pandas input\n\nBut what if the result column is not the last one in the csv? It may be inconvenient to rewrite the whole csv just to swap the columns. For this reason, _Machine_ provides the `learnFromDf` and `predictFromDf` methods. The _Df_ in the method names stands for _DataFrame_ from the _pandas_ module. This way you can handle reading the file yourself.\n\n###### Example 2 _Titanic_\n\n```python\nfrom modelcreator import Machine\nimport pandas as pd\n\n# Create DataFrame object from file\ntrain = pd.read_csv(\"train.csv\")\n\n# Get feature columns from DataFrame\nX_train = train.drop(['Survived'], axis=1)\n\n# And labels (results) column\ny_train = train[\"Survived\"].astype(str)\n\n# Create the instance of Machine\nmachine = Machine()\n\n# Train machine learning model\nmachine.learnFromDf(X_train, y_train, computation_level='advanced')\n\n# Show parameters of the model\nmachine.showParams()\n\n# Load test set from file\nX_test = pd.read_csv(\"test.csv\")\n\n# Predict the labels\nresults = machine.predictFromDf(X_test)\n\n# Save results to a new file\nresults.to_csv(\"results.csv\")\n```\n\nSimple? That's right! 
Just note that we used `astype(str)` in order to treat data as **classes**, not **numbers** because the [Titanic dataset](https://www.kaggle.com/c/titanic) used in the example above has values _0_ and _1_ in `\"Survived\"` column to indicate whether a person made it through the disaster.\n\n#### Saving the model\n\nIf you want your model to avoid re-learning on the whole dataset just to make a simple prediction you can save the state of _Machine_ to a file.\n\n```python\n# Save Machine with a trained model to \"machine.pkl\"\nmachine.saveMachine('machine.pkl')\n\n# Create a new machine based on a schema file\nmachine2 = Machine('machine.pkl')\n```\n\n#### Parameters\n\nThe **Machine** can be customized according to the use case. Check the parameters table:\n\n###### Machine\n\n| Param  | Type            | Default | Description                                                                                                                                           |\n| ------ | --------------- | ------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |\n| schema | _None_ or _str_ | `None`  | A Machine may be created based on a saved, pre-trained machine instance. You may specify the path to the saved instance in this param to recreate it. 
|\n\n###### learn\n\n| Param             | Type                        | Default                                         | Description                                                                                                                                                                                                                               |\n| ----------------- | --------------------------- | ----------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| dataset_file      | _str_                       |                                                 | Path to a csv file which contains training dataset.                                                                                                                                                                                       |\n| header_in_csv     | _bool_                      | `False`                                         | Whether the csv file contains _headers_ in the first row.                                                                                                                                                                                 |\n| metrics           | _None_, _str_ or _Callable_ | `'accuracy'` or `'neg_root_mean_squared_error'` | Metrics used for scoring estimators. Supports popular scoring names (such as _f1_, _roc_auc_, _neg_mean_gamma_deviance_). See [here](https://scikit-learn.org/stable/modules/model_evaluation.html) how to make custom scoring functions. |\n| verbose           | _bool_                      | `True`                                          | Whether to print learning logs.                                                                                                                                                       
                                                    |\n| cv                | _int_                       | `3`                                             | A number of cross-validation subsets. Higher values may increase computation time.                                                                                                                                                        |\n| computation_level | _str_                       | `'medium'`                                      | Can be either `'basic'`, `'medium'` or `'advanced'`. With higher computation level more models and parameters are being tested.                                                                                                           |\n\n###### learnFromDf\n\n| Param             | Type                        | Default                                         | Description                                                                                                                                                                                                                               |\n| ----------------- | --------------------------- | ----------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| X                 | _pandas.DataFrame_          |                                                 | DataFrame containing the feature columns.                                                                                                                                                                                                 |\n| y                 | _pandas.Series_             |                                                 | Label column of the training data.                                                                                           
                                                                                                              |\n| metrics           | _None_, _str_ or _Callable_ | `'accuracy'` or `'neg_root_mean_squared_error'` | Metrics used for scoring estimators. Supports popular scoring names (such as _f1_, _roc_auc_, _neg_mean_gamma_deviance_). See [here](https://scikit-learn.org/stable/modules/model_evaluation.html) how to make custom scoring functions. |\n| verbose           | _bool_                      | `True`                                          | Whether to print learning logs.                                                                                                                                                                                                           |\n| cv                | _int_                       | `3`                                             | A number of cross-validation subsets. Higher values may increase computation time.                                                                                                                                                        |\n| computation_level | _str_                       | `'medium'`                                      | Can be either `'basic'`, `'medium'` or `'advanced'`. With higher computation level more models and parameters are being tested.                                                                                                           |\n\n###### predict\n\n| Param         | Type   | Default        | Description                                                                   |\n| ------------- | ------ | -------------- | ----------------------------------------------------------------------------- |\n| features_file | _str_  |                | Path to the features **csv** of the data to generate predictions on.          |\n| header_in_csv | _bool_ | `False`        | Whether the csv file contains _headers_ in the first row.                     
|\n| output_file   | _str_  | `'output.csv'` | Path to the output **csv** file. In this file, the predictions will be saved. |\n| verbose       | _bool_ | `True`         | Whether to print logs.                                                        |\n\n###### predictFromDf\n\n| Param         | Type               | Default | Description                                                                                                                                                                                                                          |\n| ------------- | ------------------ | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |\n| X_predictions | _pandas.DataFrame_ |         | Features columns to generate predictions on.                                                                                                                                                                                         |\n| output_file   | _str_              | `None`  | Predict method returns _pandas.Series_ of the results. Additionally, it can also save the results to a **csv** file. It can be specified here. If the path is other than `None` it will be interpreted as a path to the output file. |\n| verbose       | _bool_             | `True`  | Whether to print logs.                                                                                                                                                                                                               
|\n\n###### saveMachine\n\n| Param            | Type  | Default         | Description                                        |\n| ---------------- | ----- | --------------- | -------------------------------------------------- |\n| output_file_name | _str_ | `'machine.pkl'` | Path where the Machine instance will be saved.     |\n\n### Development\n\nHave a feature idea or just want to help? Take a look at the [issues tab](https://github.com/BartekPog/modelcreator/issues)!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbartekpog%2Fmodelcreator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbartekpog%2Fmodelcreator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbartekpog%2Fmodelcreator/lists"}