{"id":33158631,"url":"https://github.com/auto-flow/auto-flow","last_synced_at":"2026-04-04T14:05:07.438Z","repository":{"id":57412759,"uuid":"256188087","full_name":"auto-flow/auto-flow","owner":"auto-flow","description":"AutoFlow : Automatic machine learning workflow modeling platform","archived":false,"fork":false,"pushed_at":"2022-01-22T08:39:13.000Z","size":12877,"stargazers_count":68,"open_issues_count":0,"forks_count":7,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-11-20T14:10:39.808Z","etag":null,"topics":["automl","catboost","data-minig","data-sicence","lightgbm","machine-learning","workflow"],"latest_commit_sha":null,"homepage":"https://auto-flow.github.io/auto-flow/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/auto-flow.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-04-16T10:56:00.000Z","updated_at":"2025-08-07T12:25:09.000Z","dependencies_parsed_at":"2022-09-26T17:11:19.740Z","dependency_job_id":null,"html_url":"https://github.com/auto-flow/auto-flow","commit_stats":null,"previous_names":["auto-flow/autoflow"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/auto-flow/auto-flow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/auto-flow%2Fauto-flow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/auto-flow%2Fauto-flow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/auto-flow%2Fauto-flow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/auto-flow%2Fauto-flow/manifests","owner_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/owners/auto-flow","download_url":"https://codeload.github.com/auto-flow/auto-flow/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/auto-flow%2Fauto-flow/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31402277,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T10:20:44.708Z","status":"ssl_error","status_checked_at":"2026-04-04T10:20:06.846Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automl","catboost","data-minig","data-sicence","lightgbm","machine-learning","workflow"],"created_at":"2025-11-15T21:00:27.012Z","updated_at":"2026-04-04T14:05:07.429Z","avatar_url":"https://github.com/auto-flow.png","language":"Python","funding_links":[],"categories":["Libraries"],"sub_categories":[],"readme":"==========\nAutoFlow\n==========\n\n``AutoFlow`` : **automatic machine learning workflow modeling platform**\n\n\nIntroduction\n--------------\n\nIn the problem of data mining and machine learning of tabular data,\ndata scientists usually group the features, construct a directed acyclic graph (DAG),\nand form a machine learning workflow.\n\nIn each directed edge of this directed acyclic graph, \nthe tail node represents the feature group before preprocessing, \nand the head node represents the feature group after preprocessing. 
\nEach edge represents a data processing or feature engineering algorithm, \nand algorithm selection and hyper-parameter optimization are performed on each edge.\n\nUnfortunately, manually selecting algorithms and \nhyper-parameters for such a workflow \nis a very tedious task. To solve this problem, \nwe developed ``AutoFlow``,\nwhich can automatically select algorithms and optimize the parameters of a \nmachine learning workflow. \nIn other words, it implements AutoML for tabular data.\n\n.. image:: docs/images/workflow_space.png\n\n\nDocumentation\n--------------\n\nThe documentation can be found `here \u003chttps://auto-flow.github.io/auto-flow/\u003e`_.\n\nInstallation\n--------------\n\nRequirements\n~~~~~~~~~~~~~~\n\nThis project is built and tested on Linux, so a Linux platform is required. \nIf you are using Windows, `WSL \u003chttps://docs.microsoft.com/en-us/windows/wsl/install-win10\u003e`_ is worth considering.\n\nBesides the listed requirements (see requirements.txt), the `random forest \u003chttps://github.com/automl/random_forest_run\u003e`_ \nused in `SMAC3 \u003chttps://github.com/automl/SMAC3\u003e`_ requires \n`SWIG \u003chttp://www.swig.org/\u003e`_ (\u003e= 3.0, \u003c4.0) as a build dependency. 
\nIf you are using Ubuntu or another Debian-based Linux distribution, you can run the following command:\n\n::\n\n    apt-get install swig\n\nOn Arch Linux (or any distribution with swig4 as the default implementation):\n\n::\n\n    pacman -Syu swig3\n    ln -s /usr/bin/swig-3 /usr/bin/swig\n\nAutoFlow requires `Python \u003chttps://www.python.org/\u003e`_ 3.6 or higher.\n\nInstallation via pip\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n::\n\n    pip install auto-flow\n\n\nManual Installation\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n::\n\n    git clone https://github.com/auto-flow/autoflow.git \u0026\u0026 cd autoflow\n    python setup.py install\n\nQuick Start\n--------------\n\n`Titanic \u003chttps://www.kaggle.com/c/titanic\u003e`_ is perhaps the most familiar machine learning task for data scientists. \nFor tutorial purposes, you can find the Titanic dataset in ``examples/data/train_classification.csv`` and\n``examples/data/test_classification.csv``. \nYou can use AutoFlow to finish this ML task instead of manually exploring all the features of the dataset. DO IT!\n\n.. code-block:: bash\n\n   $ cd examples/classification\n\n.. code-block:: python\n\n    import os\n\n    import joblib\n    import pandas as pd\n    from sklearn.model_selection import KFold\n\n    from autoflow import AutoFlowClassifier\n\n    # load data from csv files\n    train_df = pd.read_csv(\"../data/train_classification.csv\")\n    test_df = pd.read_csv(\"../data/test_classification.csv\")\n    # initial_runs  -- the initial runs are pure random search, which provides experience for the SMAC algorithm.\n    # run_limit     -- the maximum number of runs.\n    # n_jobs        -- how many search processes are started.\n    # included_classifiers -- restricts the search space; here lightgbm is the only classifier that can be selected.\n    # per_run_time_limit -- restricts the run time of each trial. 
A trial that runs longer than 60 seconds expires and is killed.\n    trained_pipeline = AutoFlowClassifier(initial_runs=5, run_limit=10, n_jobs=1, included_classifiers=[\"lightgbm\"],\n                                        per_run_time_limit=60)\n    # describe the meaning of the columns; `id`, `target` and `ignore` each have a specific meaning:\n    # `id` is the name of the column that uniquely identifies each row,\n    # `target` is the column your model will learn to predict,\n    # `ignore` names the columns that contain irrelevant information\n    column_descriptions = {\n        \"id\": \"PassengerId\",\n        \"target\": \"Survived\",\n        \"ignore\": \"Name\"\n    }\n    if not os.path.exists(\"autoflow_classification.bz2\"):\n        # pass `train_df`, `test_df` and `column_descriptions` to the classifier;\n        # if `fit_ensemble_params` is set to \"auto\", Stack Ensemble will be used;\n        # `splitter` is the train/validation splitter, set here to 3-fold cross-validation\n        trained_pipeline.fit(\n            X_train=train_df, X_test=test_df, column_descriptions=column_descriptions,\n            fit_ensemble_params=False,\n            splitter=KFold(n_splits=3, shuffle=True, random_state=42),\n        )\n        # finally, the best model is serialized and stored on the local file system for later use\n        joblib.dump(trained_pipeline, \"autoflow_classification.bz2\")\n        # to see the workflow space AutoFlow is searching, visualize it with `draw_workflow_space`\n        hdl_constructor = trained_pipeline.hdl_constructors[0]\n        hdl_constructor.draw_workflow_space()\n    # for prediction, first load the serialized model from the file system\n    predict_pipeline = joblib.load(\"autoflow_classification.bz2\")\n    # then use the loaded model to predict\n    result = predict_pipeline.predict(test_df)
\n    print(result)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fauto-flow%2Fauto-flow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fauto-flow%2Fauto-flow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fauto-flow%2Fauto-flow/lists"}