{"id":42573488,"url":"https://github.com/martinlschumann/safeguard-system","last_synced_at":"2026-01-28T21:30:30.394Z","repository":{"id":271616850,"uuid":"914019259","full_name":"martinlschumann/safeguard-system","owner":"martinlschumann","description":null,"archived":false,"fork":false,"pushed_at":"2025-01-21T17:23:17.000Z","size":35107,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-21T18:30:35.086Z","etag":null,"topics":["autogluon","automl","automl-python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/martinlschumann.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-08T19:43:34.000Z","updated_at":"2025-01-21T17:23:21.000Z","dependencies_parsed_at":null,"dependency_job_id":"ec3b0bea-7fd0-4410-a8e6-cd15a042f1a3","html_url":"https://github.com/martinlschumann/safeguard-system","commit_stats":null,"previous_names":["martinlschumann/safeguard-system"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/martinlschumann/safeguard-system","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/martinlschumann%2Fsafeguard-system","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/martinlschumann%2Fsafeguard-system/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/martinlschumann%2Fsafeguard-system/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/martinlschumann%
2Fsafeguard-system/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/martinlschumann","download_url":"https://codeload.github.com/martinlschumann/safeguard-system/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/martinlschumann%2Fsafeguard-system/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28852708,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-28T15:15:36.453Z","status":"ssl_error","status_checked_at":"2026-01-28T15:15:13.020Z","response_time":57,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["autogluon","automl","automl-python"],"created_at":"2026-01-28T21:30:29.582Z","updated_at":"2026-01-28T21:30:30.387Z","avatar_url":"https://github.com/martinlschumann.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Description of Code Files\n\nFiles with `_parallel` are the same as the normal, but use `ProcessPoolExecutor` to run the code in parallel.\n\n- autogluon_standalone_linear_ensemble_read_out: It reads data from a slurm output file and only change the is_plus item using the plus_or_minus functions. Prints the output of this as a json file. 
This allows for much quicker full runs, as only the plus_or_minus functions need to be run.\n- autogluon_semi-supervised: Pipeline utilizing Autogluon's built-in support for Semi-Supervised Learning.\n- autogluon_standalone_calc_pymfe: Trains an Autogluon-based Safeguard System Predictor.\n- autogluon_standalone_load_model_pymfe: Loads a TabularPredictor and runs it for testing.\n- autogluon_standalone_pymfe_full_study_*: Runs through the full dataset(s) mentioned with the full linear-ensembling pipeline. Uses the Autogluon-based Safeguard System.\n- custom_datasets: The custom datasets as curated by Fusi et al. in \"Probabilistic Matrix Factorization for Automated Machine Learning\".\n- pymfe-runner: Creates a CSV file that can be used to train a safeguard system predictor from a given slurm_file.\n- read_out_compare: Given 2 slurm_output runs, counts which run won for each dataset to determine which performed better.\n- read_out_txt: Reads the output from linear-ensembling runs and converts it into a defined data structure (OpenMLDatasetResult).\n- safeguard_perf: Calculates the mean accuracy of runs with and without the safeguard system. Also draws a scatter plot of the runs with and without the safeguard system.\n- sklearn_linear_ensemble_read_out: Reads data from a slurm output file and only changes the is_plus item using the plus_or_minus functions. Prints the result as a json file. This allows for much quicker full runs, as only the plus_or_minus functions need to be run.\n- sklearn_tree_calc_pymfe: Trains a sklearn-based RandomForestClassifier Safeguard System Predictor.\n- sklearn_tree_debug_pymfe: Graphs and explains the output from the RandomForestClassifier. Can also tune the hyperparameters of the RandomForestClassifier.\n- sklearn_tree_pymfe_full_study: Runs through the full dataset(s) selected with the full linear-ensembling pipeline. 
Uses the sklearn-based Safeguard System.\n- test_linear_ensemble: A test file for the linear ensembling process.\n\n# Running meta feature analysis (v2, using pymfe and autogluon as the predictor)\n\n1. Run `pymfe-runner.py -i \u003cslurm_output file\u003e -o \u003ccsv_file\u003e`. Decide which metric for meta-features (`MFE`) should be used, e.g. landmarking, model-based or a combination.\n2. Run `autogluon_standalone_calc_pymfe.py -f \u003ccsv file\u003e` to create the Autogluon model, which will be saved in \"AutogluonModels\". To change the evaluation metric, import the script and pass a different metric when calling its `main` function. The default is `recall`.\n3. Use this AutoGluon model in e.g. `autogluon_standalone_pymfe_full_study_100.py`. Edit `slurm_full_study_pymfe-diff-cluster.sh` to include the file name of the model output in step 2. Don't forget to edit MFE to use the same metrics as in `pymfe-runner.py`.\n\n## Examples\n`./autogluon_standalone_calc_pymfe.py -f csv_files/fullpymfe-model-based-cc18+100.csv`\n\nUsage:\n`autogluon_standalone_pymfe_full_study.py -f \u003csaved model from autogluon\u003e`.\n\n# Running meta feature analysis (v2, using pymfe and sklearn as the predictor)\n\n1. Run `pymfe-runner.py -i \u003cslurm_output file\u003e -o \u003ccsv_file\u003e`. Decide which metric for meta-features (`MFE`) should be used, e.g. landmarking, model-based or a combination.\n2. Run `sklearn_tree_calc_pymfe.py -f \u003ccsv file\u003e -o \u003coutput file\u003e.joblib` to create the sklearn model, which will be saved in \"SavedSklearnModels\".\n3. Use this sklearn model in `sklearn_tree_pymfe_full_study.py`. Edit `slurm_full_study_sklearn_pymfe-diff-cluster.sh` to include the file name of the model output in step 2. 
Don't forget to edit MFE to use the same metrics as in `pymfe-runner.py`.\n\n## Examples\n`./sklearn_tree_calc_pymfe.py -f csv_files/fullpymfe-model-based-cc18+100.csv`\n\nUsage:\n`sklearn_tree_pymfe_full_study.py -f \u003csaved model from sklearn\u003e.joblib` or `slurm_full_study_sklearn_pymfe-diff-cluster.sh` for a slurm version.\n\n# Faster Version of v2, Step 3\n\nIf a slurm output file of the selected dataset and an AutoGluon/sklearn model already exist, and only the `_full_study` (or similar) step needs to be rerun to calculate what the model will predict (\"plus\" or \"minus\"), then `co_ensemble_read_out.py` (sklearn) and `autogluon_co_ensemble_read_out.py` (autogluon) can be used. The syntax is as follows: `python3.10 co_ensemble_read_out.py -f \u003csklearn model\u003e -s \u003cslurm_output\u003e -o \u003coutput file.json\u003e` for sklearn, and similar for autogluon. Instead of a regularly formatted slurm_output file, this code returns a json version, as described in \"Output Format.md\".\n\n# Even Faster Version of Step 3\n\n`co_ensemble_read_out.py` and `autogluon_co_ensemble_read_out.py` also have parallel versions, which run the same code in parallel with the help of `ProcessPoolExecutor`, which means that the GIL is not a [problem](https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmartinlschumann%2Fsafeguard-system","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmartinlschumann%2Fsafeguard-system","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmartinlschumann%2Fsafeguard-system/lists"}