{"id":20313487,"url":"https://github.com/autogluon/autogluon-bench","last_synced_at":"2025-04-11T17:10:26.149Z","repository":{"id":161947799,"uuid":"618097646","full_name":"autogluon/autogluon-bench","owner":"autogluon","description":null,"archived":false,"fork":false,"pushed_at":"2024-12-23T14:20:38.000Z","size":410,"stargazers_count":7,"open_issues_count":11,"forks_count":13,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-01-19T22:13:50.093Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/autogluon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-23T18:44:05.000Z","updated_at":"2024-12-19T23:44:13.000Z","dependencies_parsed_at":"2023-12-20T11:25:40.473Z","dependency_job_id":"8433eda0-fd23-4e42-b9ff-db8ae294b611","html_url":"https://github.com/autogluon/autogluon-bench","commit_stats":null,"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autogluon%2Fautogluon-bench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autogluon%2Fautogluon-bench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autogluon%2Fautogluon-bench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autogluon%2Fautogluon-bench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/autogluon","download_url":"https://codeload.gith
ub.com/autogluon/autogluon-bench/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":235524155,"owners_count":19003815,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-14T18:11:16.618Z","updated_at":"2025-04-11T17:10:26.143Z","avatar_url":"https://github.com/autogluon.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"left\"\u003e\n  \u003cimg src=\"https://user-images.githubusercontent.com/16392542/77208906-224aa500-6aba-11ea-96bd-e81806074030.png\" width=\"350\"\u003e\n\u003c/div\u003e\n\n# AutoGluon-Bench\n\nWelcome to AutoGluon-Bench, a suite for benchmarking your AutoML frameworks.\n\n## Setup\n\nFollow the steps below to set up autogluon-bench:\n\n```bash\n# create virtual env and update pip\npython3 -m venv .venv_agbench\nsource .venv_agbench/bin/activate\npython3 -m pip install --upgrade pip\n```\n\nInstall `autogluon-bench` from PyPI:\n\n```bash\npython3 -m pip install autogluon.bench\n```\n\nInstall `autogluon-bench` from source for development:\n\n```bash\ngit clone https://github.com/autogluon/autogluon-bench.git\ncd autogluon-bench\n\n# install from source in editable mode\npip install -e \".[tests]\"\n```\n\n## Run benchmarks locally\n\nTo run the benchmarks on your local machine, use the following command:\n\n```\nagbench run path/to/local_config_file\n```\n\nCheck out our [sample local configuration files](https://github.com/autogluon/autogluon-bench/blob/master/sample_configs) for local runs.\n\nThe results are stored 
in the following directory: `{WORKING_DIR}/{root_dir}/{module}/{benchmark_name}_{timestamp}`.\n\n### Tabular and Timeseries Benchmark\n\nTo perform tabular or timeseries benchmarking, set the module to 'tabular' or 'timeseries'. You must set both the Benchmark Configurations and the Tabular/Timeseries Specific configurations, and each key should have a single value. Refer to the [sample configuration file](https://github.com/autogluon/autogluon-bench/blob/master/sample_configs/tabular_local_configs.yaml) for more details.\n\nThe tabular/timeseries module leverages the [AMLB](https://github.com/openml/automlbenchmark) benchmarking framework. Required and optional AMLB arguments are specified via the configuration file mentioned above.\n\nCustom configuration is supported by pointing `amlb_user_dir` in the config to a local directory, through which custom frameworks, constraints, and datasets can be overridden. We provide a minimal working [custom config](https://github.com/autogluon/autogluon-bench/blob/master/sample_configs/amlb_configs) setup for benchmarking a custom framework (an `AutoGluon` dev branch). To use it, change the following fields in the [sample configuration file](https://github.com/autogluon/autogluon-bench/blob/master/sample_configs/tabular_local_configs.yaml):\n\n```yaml\nframework: AutoGluon_dev:example\namlb_user_dir: path_to/sample_configs/amlb_configs\n```\n\nFor further customization, follow the [example custom configuration folder](https://github.com/openml/automlbenchmark/tree/master/examples/custom) provided by AMLB and the AMLB [documentation](https://openml.github.io/automlbenchmark/docs/using/configuration/#custom-configurations).\n\n### Multimodal Benchmark\n\nFor multimodal benchmarking, set the module to 'multimodal'. Note that multimodal benchmarking calls `MultiModalPredictor` directly, bypassing the extra layer of [AMLB](https://github.com/openml/automlbenchmark). Therefore, the required arguments differ from those for tabular or timeseries. 
Please refer to the [sample multimodal local run configuration file](https://github.com/autogluon/autogluon-bench/blob/master/sample_configs/multimodal_local_configs.yaml).\n\nWe also support customizing benchmarking frameworks, datasets, and metrics through `custom_resource_dir`, `custom_dataloader`, and `custom_metrics`.\n\nTo define custom frameworks, you can follow the [examples](https://github.com/autogluon/autogluon-bench/tree/master/sample_configs/resources/multimodal_frameworks.yaml):\n\n1. Create a folder under the working directory, e.g. `custom_resources/`.\n2. In that folder, create a yaml file named `multimodal_frameworks.yaml`.\n3. Add an entry to the file with `repo` as the GitHub URL, `version` as the branch or tag name, and `params` as the parameters passed to `MultiModalPredictor`.\n4. Add `custom_resource_dir: custom_resources/` to the run configuration file.\n\nTo add more datasets to your benchmarking jobs, use custom datasets with custom-defined data loaders. Follow these steps:\n\n1. Create a folder under the working directory, e.g. `custom_dataloader/`.\n2. Create a dataset yaml file, `custom_dataloader/datasets.yaml`, which includes all required properties for your problem type. For the list of required properties, refer to this [function](https://github.com/autogluon/autogluon-bench/blob/52eee491018f6281236416f4b1bece14b88610e8/src/autogluon/bench/frameworks/multimodal/exec.py#L100-L201).\n3. Create a dataset loader class, `custom_dataloader/dataloader.py`, which downloads the dataset and loads it as a dataframe. Set the required properties mentioned above.\n4. Add `custom_dataloader` to the `agbench run` configuration; `dataloader_file`, `class_name`, and `dataset_config_file` are required.\n5. Make sure you have permission to download the dataset. 
If running in AWS mode, the dataset can be downloaded from the S3 bucket specified as `DATA_BUCKET` in the `agbench run` configuration, as long as the bucket is in the same AWS account as the AWS Batch deployment.\n\nSee the [sample data loaders](https://github.com/autogluon/autogluon-bench/tree/master/sample_configs/dataloaders) for more examples.\n\nAdding custom metrics is similar to adding data loaders. Internally, we convert the custom metrics into an [AutoGluon Scorer](https://auto.gluon.ai/stable/tutorials/tabular/advanced/tabular-custom-metric.html) using the `autogluon.core.metrics.make_scorer` function. Follow these steps:\n\n1. Create a folder under the working directory, e.g. `custom_metrics/`.\n2. Create a metrics script, `custom_metrics/metrics.py`, that defines a function returning a metric score.\n3. Add `custom_metrics` to the `agbench run` configuration; `metrics_path` and `function_name` are required. Additional arguments for the [make_scorer](https://github.com/autogluon/autogluon/blob/a33cc0e084c82cb207c6b98b13b49c1a377f3f0d/core/src/autogluon/core/metrics/__init__.py#L333-L335) function can also be added.\n\nSee the [sample custom metrics](https://github.com/autogluon/autogluon-bench/tree/master/sample_configs/custom_metrics) for more examples.\n\n## Run benchmarks on AWS\n\nAutoGluon-Bench uses the AWS CDK to build an AWS Batch compute environment for benchmarking.\n\nTo get started, install [Node.js](https://nodejs.org/) and [AWS CDK](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html#getting_started_install) as follows:\n\n1. Install [Node Version Manager](https://github.com/nvm-sh/nvm#installing-and-updating).\n2. Source your shell profile or restart the terminal.\n3. 
Follow the `Prerequisites` section of the [AWS CDK Guide](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html) and install an appropriate `Node.js` version for your system:\n\n```bash\nnvm install $VERSION  # install Node.js\nnpm install -g aws-cdk  # install aws-cdk\ncdk --version  # verify the installation; depending on the output, you might need to update Node.js\n```\n\n4. Follow the [AWS CLI Installation Guide](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) to install `awscliv2`.\n\nIf this is your first time using CDK to deploy to an AWS environment (a combination of an AWS account and Region), run the following:\n\n```bash\ncdk bootstrap aws://CDK_DEPLOY_ACCOUNT/CDK_DEPLOY_REGION\n```\n\nYou will need a cloud configuration file to run the benchmarks. You can edit the provided [sample cloud config files](https://github.com/autogluon/autogluon-bench/blob/master/sample_configs), or use the CLI tool to generate the cloud config files locally.\n\nFor multimodal:\n\n```\nagbench generate-cloud-config --module multimodal --cdk-deploy-account \u003cAWS_ACCOUNT_ID\u003e --cdk-deploy-region \u003cAWS_ACCOUNT_REGION\u003e --prefix \u003cPREFIX\u003e --metrics-bucket \u003cMETRICS_BUCKET\u003e --data-bucket \u003cDATA_BUCKET\u003e --dataset-names DATASET_1,DATASET_2 --custom-resource-dir \u003cCUSTOM_RESOURCE_DIR\u003e --custom-dataloader \"dataloader_file:value1;class_name:value2;dataset_config_file:value3\"\n```\n\nFor tabular or timeseries:\n\n```\nagbench generate-cloud-config --module \u003cMODULE\u003e --cdk-deploy-account \u003cAWS_ACCOUNT_ID\u003e --cdk-deploy-region \u003cAWS_ACCOUNT_REGION\u003e --prefix \u003cPREFIX\u003e --metrics-bucket \u003cMETRICS_BUCKET\u003e --git-uri-branch \u003cAMLB_GIT_URI_BRANCH\u003e --framework \u003cAMLB_FRAMEWORK\u003e --amlb-benchmark \u003cBENCHMARK1\u003e,\u003cBENCHMARK2\u003e --amlb-task \"BENCHMARK1:DATASET1,DATASET2;BENCHMARK2:DATASET3\" 
--amlb-constraint \u003cCONSTRAINT\u003e --amlb-fold-to-run \"BENCHMARK1:DATASET1:fold1/fold2,DATASET2:fold1/fold2;BENCHMARK2:DATASET3:fold1/fold2\" --amlb-user-dir \u003cAMLB_USER_DIR\u003e\n```\n\nFor more details, run:\n\n```\nagbench generate-cloud-config --help\n```\n\nOnce the configuration file is ready, use the command below to start benchmark runs in the cloud:\n\n```\nagbench run /path/to/cloud_config_file\n```\n\nThis command automatically sets up an AWS Batch environment using the instance specifications defined in the [cloud config files](https://github.com/autogluon/autogluon-bench/tree/master/sample_configs). It also creates a Lambda function named using your chosen `LAMBDA_FUNCTION_NAME`. This Lambda function is invoked automatically with the cloud config file you provided and submits either a single AWS Batch job or a parent job for [Array jobs](https://docs.aws.amazon.com/batch/latest/userguide/array_jobs.html) to the job queue (named with the `PREFIX` you provided).\n\nFor the Lambda function to submit multiple Array child jobs simultaneously, specify a list of values for each module-specific key. Each combination of configurations is saved and uploaded to your specified `METRICS_BUCKET` in S3, under `s3://{METRICS_BUCKET}/configs/{module}/{BENCHMARK_NAME}_{timestamp}/{BENCHMARK_NAME}_split_{UID}.yaml`, where `UID` is a unique ID assigned to the split.\n\nThe AWS infrastructure configurations and the submitted job ID are saved locally at `{WORKING_DIR}/{root_dir}/{module}/{benchmark_name}_{timestamp}/aws_configs.yaml`. You can use this file to check the job status at any time:\n\n```bash\nagbench get-job-status --config-file /path/to/aws_configs.yaml\n```\n\nYou can also check the job status using job IDs:\n\n```bash\nagbench get-job-status --job-ids JOB_ID_1 --job-ids JOB_ID_2 --cdk-deploy-region AWS_REGION\n```\n\nJob logs can be viewed in the AWS console. 
Each job has a `UID` attached to its name, which you can use to identify the corresponding config split. After the jobs complete and reach the `SUCCEEDED` status in the job queue, you'll find metrics saved under `s3://{METRICS_BUCKET}/{module}/{benchmark_name}_{timestamp}/{benchmark_name}_{timestamp}_{UID}`.\n\nA cloud configuration file with the time-stamped `benchmark_name` is also saved under `{WORKING_DIR}/{root_dir}/{module}/{benchmark_name}_{timestamp}/{module}_cloud_configs.yaml`.\n\nBy default, the created infrastructure is retained for future use. To remove resources automatically after the run, use the `--remove-resources` option:\n\n```bash\nagbench run path/to/cloud_config_file --remove-resources\n```\n\nThis checks the job status every 2 minutes and removes the resources after all jobs succeed. If any job fails, the resources are kept.\n\nTo remove resources manually later, use:\n\n```bash\nagbench destroy-stack --config-file {WORKING_DIR}/{root_dir}/{module}/{benchmark_name}_{timestamp}/aws_configs.yaml\n```\n\nAlternatively, you can remove specific stacks by running:\n\n```bash\nagbench destroy-stack --static-resource-stack STATIC_RESOURCE_STACK_NAME --batch-stack BATCH_STACK_NAME --cdk-deploy-account CDK_DEPLOY_ACCOUNT --cdk-deploy-region CDK_DEPLOY_REGION\n```\n\nAll argument values can be found in `{WORKING_DIR}/{root_dir}/{module}/{benchmark_name}_{timestamp}/aws_configs.yaml`.\n\n### Configure the AWS infrastructure\n\nThe default infrastructure configuration is defined in [`default_config.yaml`](https://github.com/autogluon/autogluon-bench/blob/master/src/autogluon/bench/cloud/aws/default_config.yaml):\n\n```yaml\nCDK_DEPLOY_ACCOUNT: dummy\nCDK_DEPLOY_REGION: dummy\nPREFIX: ag-bench\nMAX_MACHINE_NUM: 20\nBLOCK_DEVICE_VOLUME: 100\nTIME_LIMIT: 3600\nRESERVED_MEMORY_SIZE: 15000\nINSTANCE: g4dn.2xlarge\nLAMBDA_FUNCTION_NAME: ag-bench-job\n```\n\nwhere:\n\n- `CDK_DEPLOY_ACCOUNT` and `CDK_DEPLOY_REGION` should be overridden with your AWS account ID and desired region to create 
the stack.\n- `PREFIX` is used as an identifier for the stack and the resources created.\n- `MAX_MACHINE_NUM` is the maximum number of EC2 instances that can be started for AWS Batch.\n- `BLOCK_DEVICE_VOLUME` is the size of the storage device attached to each instance.\n- `TIME_LIMIT` is the timeout of an AWS Batch job in seconds, i.e. the maximum time the instance will run. A buffer of 3600s is added on top of it to account for instance startup and dataset download time.\n- `RESERVED_MEMORY_SIZE` is used together with the instance memory size to calculate the container `shm_size`.\n- `INSTANCE` is the EC2 instance type.\n- `LAMBDA_FUNCTION_NAME` is the name prefix of the Lambda function that submits jobs to AWS Batch.\n\nTo override these configurations, use the `cdk_context` key in your custom config file. See our [sample cloud config](https://github.com/autogluon/autogluon-bench/blob/master/sample_configs/tabular_cloud_configs.yaml) for reference.\n\nFor the `multimodal` module, these values are also overridden by a `constraint` defined in the [default multimodal constraints](https://github.com/autogluon/autogluon-bench/tree/master/src/autogluon/bench/resources/multimodal_constraints.yaml) or by a custom constraint specified in `multimodal_constraints.yaml` under `custom_resource_dir`. See the [sample custom constraints file](https://github.com/autogluon/autogluon-bench/tree/master/sample_configs/resources/multimodal_constraints.yaml).\n\n### Monitoring metrics for your instances on AWS\n\nA variety of metrics are available for the EC2 instances launched during benchmarking. These can be accessed in the AWS Console via the navigation path `CloudWatch` -\u003e `All metrics` -\u003e `AWS namespaces` -\u003e `EC2`. For a comprehensive list of these metrics, refer to the [official AWS documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/viewing_metrics_with_cloudwatch.html).\n\nIn addition to the standard metrics, we also provide a custom metric for `GPUUtilization`. 
This can be found in `CloudWatch` under `All metrics` -\u003e `Custom namespaces` -\u003e `EC2`. Note that the `GPUUtilization` metric is also updated every five minutes.\n\nTo save aggregated (average) custom hardware metrics (`GPUUtilization` and `CPUUtilization`, logged at 5s intervals) to the benchmark directory in the provided S3 bucket, add the `--save-hardware-metrics` option when running the benchmark:\n\n```\nagbench run path/to/cloud_config_file --save-hardware-metrics\n```\n\nNote that this command currently waits for all jobs to succeed before pulling the hardware metrics.\n\n## Evaluating benchmark runs\n\nBenchmark results can be evaluated using the tools in `src/autogluon/bench/eval/`. The evaluation logic aggregates and cleans the results of runs stored in S3 and produces evaluation results.\nIn a future release, we intend to add evaluation support for multimodal benchmark results.\n\n### Evaluation Steps\n\nBegin by configuring AWS credentials for the default profile of the AWS account that holds the benchmark results in S3.\n\nStep 1: Aggregate AMLB results on S3. 
After running the benchmark in [AWS mode](#run-benchmarks-on-aws), take note of the time-stamped `benchmark_name` in `{WORKING_DIR}/{root_dir}/{module}/{benchmark_name}_{timestamp}/{module}_cloud_configs.yaml` and run the command below:\n\n```\nagbench aggregate-amlb-results {METRICS_BUCKET} {module} {benchmark_name} --constraint {constraint}\n```\n\nThis creates a new file on S3 at the following path:\n\n```\ns3://{METRICS_BUCKET}/aggregated/{module}/{benchmark_name}/results_automlbenchmark_{constraint}_{benchmark_name}.csv\n```\n\nAggregation is also supported for multimodal benchmark results; in that case, omit the `--constraint` option.\n\nFor more details, run:\n\n```\nagbench aggregate-amlb-results --help\n```\n\nStep 2: Further clean the aggregated results.\n\nIf the file is still on S3 from the previous step, run:\n\n```\nagbench clean-amlb-results {benchmark_name} --results-dir-input s3://{METRICS_BUCKET}/aggregated/{module}/{benchmark_name}/ --benchmark-name-in-input-path --results-dir-output {results_dir_output} --out-path-prefix {out_path_prefix} --out-path-suffix {out_path_suffix}\n```\n\nThe `--results-dir-input` value can also be a local directory. 
This creates a local file `{results_dir_output}/{out_path_prefix}{benchmark_name}{out_path_suffix}`.\n\nFor more details, run:\n\n```\nagbench clean-amlb-results --help\n```\n\nStep 3: Run the evaluation on multiple cleaned files from Step 2:\n\n```\nagbench evaluate-amlb-results --frameworks-run framework_1 --frameworks-run framework_2 --results-dir-input data/results/input/prepared/openml/ --paths file_name_1.csv --paths file_name_2.csv --output-suffix \"{module}_{preset}_{constraint}_{date}\" --no-clean-data --no-use-tid-as-dataset-name\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fautogluon%2Fautogluon-bench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fautogluon%2Fautogluon-bench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fautogluon%2Fautogluon-bench/lists"}