# flaml-analyze

Extract and analyze best configurations from [FLAML AutoML](https://github.com/microsoft/FLAML) optimization logs.

## Overview

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.17987939.svg)](https://doi.org/10.5281/zenodo.17987939)
[![Testing](https://github.com/filipsPL/flaml-log-analyze/actions/workflows/test.yml/badge.svg)](https://github.com/filipsPL/flaml-log-analyze/actions/workflows/test.yml)

This tool processes FLAML optimization logs to:
- Extract the best configurations for each learner (algorithm)
- Generate two types of warm start configurations:
  - **Absolute**: Pure top-N by performance
  - **Representative**: Diverse best-per-cluster selections
- Visualize the search space exploration
- Provide detailed optimization statistics

## Sample outputs

#### Optimization summary

![sample analysis](sample_output/optimization_analysis.png)

#### Search space visualization

![sample exploration](sample_output/search_space_2d.png)

#### Performance report

```
================================================================================
FLAML OPTIMIZATION SUMMARY
================================================================================

OVERALL STATISTICS
--------------------------------------------------------------------------------
Total trials: 12777
Best validation loss: 0.118131
Worst validation loss: 0.668284
Mean validation loss: 0.338847
Std validation loss: 0.075695
Total wall clock time: 21599.62 seconds (359.99 minutes)
Mean trial time: 1.69 seconds

LEARNER STATISTICS
--------------------------------------------------------------------------------
Learner         Trials     Best Loss       Mean Loss
--------------------------------------------------------------------------------
catboost        576        0.315344        0.387085
extra_tree      909        0.233602        0.425164
lgbm            477        0.248275        0.419497
rf              1740       0.180855        0.348641
xgb_limitdepth  321        0.245129        0.411951
xgboost         8754       0.118131        0.317687
...
```

#### Warm-start parameters for the next FLAML round

```python
warm_start_configs = {
    # Top 5 configurations for catboost
    'catboost': [
        {"early_stopping_rounds":10,"learning_rate":0.14189952377559728,"n_estimators":8192,"FLAML_sample_size":49659},  # Rank 1: metric=0.315344
        {"early_stopping_rounds":11,"learning_rate":0.1530902242854414,"n_estimators":8192,"FLAML_sample_size":10000},  # Rank 2: metric=0.316097
        {"early_stopping_rounds":12,"learning_rate":0.09541333025917802,"n_estimators":8192,"FLAML_sample_size":49659},  # Rank 3: metric=0.320287
        {"early_stopping_rounds":10,"learning_rate":0.09544104526717777,"n_estimators":8192,"FLAML_sample_size":49659},  # Rank 4: metric=0.320287
        {"early_stopping_rounds":10,"learning_rate":0.09541180730499482,"n_estimators":8192,"FLAML_sample_size":49659},  # Rank 5: metric=0.320287
    ]
}
```

## Requirements

### Python Version
- Python 3.7+

### Dependencies
```bash
pip install numpy scikit-learn matplotlib
```

All dependencies are widely used packages available from PyPI; no special installation steps are needed.

## Quick Start

### Basic Usage
```bash
./flaml-analyze.py path/to/optimization.log
```

This will:
1. Parse the FLAML log file
2. Extract the top 5 configs per learner (absolute + representative)
3. Generate visualizations and summaries
4. Save warm start configurations

### Common Options
```bash
# Extract top 10 configs per learner
./flaml-analyze.py optimization.log --warm-start-per-method 10

# Adjust performance filtering (keep top 30% before clustering)
./flaml-analyze.py optimization.log --performance-percentile 30

# Save to specific directory
./flaml-analyze.py optimization.log -o results/

# Show the top 3 configs in the analysis summary
./flaml-analyze.py optimization.log -n 3
```

## Output Files

The script generates:

1. **`warm_start_configs_absolute.py`**
   - Pure top-N configurations ranked by performance
   - Use for maximum performance and fast convergence

2. **`warm_start_configs_representative.py`**
   - Diverse configurations (K-Means + best per cluster)
   - Use for exploration and robustness

3. **`search_space_2d.png`**
   - PCA 2D projection showing search space exploration
   - Visualizes where FLAML searched and which configs were selected

4. **`optimization_analysis.png`**
   - Temporal progress plots
   - Shows optimization convergence over time

5. **`optimization_summary.txt`**
   - Detailed statistics and best configurations
   - Text report of the optimization run

## Command-Line Arguments

```
positional arguments:
  log_file              Path to FLAML log file (JSON lines format)

options:
  -n N_BEST             Number of best configs for analysis summary (default: 1)
  -o OUTPUT_DIR         Output directory (default: same as log file)
  --warm-start-per-method N
                        Number of configs per method for warm start (default: 5)
  --warm-start-overall N
                        Number of best overall configs (default: 5, currently unused)
  --performance-percentile X
                        Keep top X% before clustering (default: 20.0)
```

## Configuration Selection Strategies

### Absolute Top-N
- **Method**: Select the N configurations with the lowest validation loss
- **Pros**: Starts from the best configurations found so far, which usually gives the fastest convergence
- **Cons**: Configurations may be very similar (redundant)
- **Use when**: Short optimization time, known good region

### Representative (K-Means + Best per Cluster)
- **Method**:
  1. Filter to the top 20% by performance (adjustable with `--performance-percentile`)
  2. Cluster into K groups using K-Means (K = `--warm-start-per-method`)
  3. Select the best performer from each cluster
- **Pros**: Diverse exploration, best-in-region configs, robust
- **Cons**: Slightly lower initial performance than absolute
- **Use when**: Medium/long optimization time, exploration needed

## Using Warm Start Configs

### Load and Use in FLAML
```python
import runpy

from flaml import AutoML

# Load the generated file; run_path returns the file's global variables as a dict
warm_start_configs = runpy.run_path('warm_start_configs_representative.py')['warm_start_configs']

# Extract configs for a specific learner
catboost_configs = warm_start_configs['catboost']

# Use in FLAML (check the starting_points format for your FLAML version)
# Option 1: Direct warm start; X_train and y_train are your training data
automl = AutoML()
automl.fit(
    X_train, y_train,
    task='classification',
    starting_points={'catboost': catboost_configs},  # a list of config dicts per learner
    time_budget=3600
)

# Option 2: As initial points, flattened into (config, learner) pairs
points_to_evaluate = [
    (config, learner)
    for learner, configs in warm_start_configs.items()
    for config in configs
]
```

## Examples

### Example 1: Basic extraction
```bash
./flaml-analyze.py logs/dataset1__descriptors_flaml.log
```
Output: 5 absolute + 5 representative configs per learner

### Example 2: More diverse configs
```bash
./flaml-analyze.py logs/optimization.log \
    --warm-start-per-method 10 \
    --performance-percentile 30
```
Output: 10 configs per learner, selected from the top 30%

### Example 3: Aggressive filtering
```bash
./flaml-analyze.py logs/optimization.log \
    --warm-start-per-method 5 \
    --performance-percentile 10
```
Output: 5 configs per learner, selected from the top 10% (best quality)

### Example 4: Analysis with custom output
```bash
./flaml-analyze.py logs/optimization.log \
    -n 10 \
    -o analysis_results/
```
Output: the analysis shows the top 10, warm start saves 5 (default), and all files go to analysis_results/

## Tips

### Choosing the Right Settings

**For short FLAML runs (<100 trials per learner):**
```bash
--warm-start-per-method 3 --performance-percentile 30
```
Keep a larger share of trials before clustering (less aggressive filtering)

**For long FLAML runs (>500 trials per learner):**
```bash
--warm-start-per-method 5 --performance-percentile 10
```
Be more selective: with many trials, the top 10% still contains plenty of high-quality configs

**For exploration:**
```bash
--warm-start-per-method 10 --performance-percentile 20
```
More diverse starting points

**For exploitation:**
```bash
--warm-start-per-method 3 --performance-percentile 5
```
Focus on the best-known region

## Advanced Usage

### Disable warm start generation
```bash
./flaml-analyze.py log.txt \
    --warm-start-overall 0 \
    --warm-start-per-method 0
```

### Analyze only (minimal configs)
```bash
./flaml-analyze.py log.txt -n 10 \
    --warm-start-per-method 1
```

## Algorithm Details

### Representative Selection
1. **Performance filtering**: Keep the top X% (default: 20%)
2. **K-Means clustering**: Group into K clusters
3. **Best per cluster**: Select the champion from each cluster
4. **Result**: K diverse, high-performing configurations

### Known issues

- Categorical hyperparameters are encoded with `LabelEncoder` before PCA and clustering. `LabelEncoder` assigns arbitrary integer codes (0, 1, 2, ...), and K-Means treats the differences between these codes as continuous distances, so PCA projections and clusters involving categorical hyperparameters may be slightly distorted.

## Version History

- **v0.1.2**: K-Means + best per cluster approach
- **v0.1.0**: Added PCA 2D visualization
- **v0.0.1**: Initial release with absolute/representative selection
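The two selection strategies described in Configuration Selection Strategies and Algorithm Details can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not flaml-analyze's actual implementation: each trial is assumed to be a dict with a `loss` and a `config` of numeric hyperparameters only (categorical ones, see Known issues, are left out), hyperparameters are min-max scaled before clustering, at least K trials are kept after filtering, and K-Means is a small hand-written Lloyd's loop so the sketch runs on the standard library alone (in practice `sklearn.cluster.KMeans` would be the natural choice). The names `select_absolute`, `select_representative`, and `_kmeans` are hypothetical.

```python
import math
import random

# Hypothetical helpers illustrating the two strategies; not flaml-analyze's API.
# Each trial is assumed to look like {"loss": float, "config": {name: number}}.


def select_absolute(trials, n):
    """Absolute top-N: the n trials with the lowest validation loss."""
    return sorted(trials, key=lambda t: t["loss"])[:n]


def _kmeans(points, k, iters=100, seed=0):
    """Minimal Lloyd's K-Means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # init from random points
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_representative(trials, k, percentile=20.0, seed=0):
    """Keep the top `percentile`% by loss, cluster with K-Means,
    and return the best trial of each cluster (at most k trials)."""
    ranked = sorted(trials, key=lambda t: t["loss"])
    # Assumption: keep at least k trials so that k clusters can be formed
    keep = max(k, math.ceil(len(ranked) * percentile / 100))
    top = ranked[:keep]
    if len(top) <= k:
        return top
    # Min-max scale each hyperparameter so large-valued ones do not dominate
    names = sorted(top[0]["config"])
    lows = {n: min(t["config"][n] for t in top) for n in names}
    highs = {n: max(t["config"][n] for t in top) for n in names}
    points = [
        [
            (t["config"][n] - lows[n]) / (highs[n] - lows[n]) if highs[n] > lows[n] else 0.0
            for n in names
        ]
        for t in top
    ]
    labels = _kmeans(points, k, seed=seed)
    # Keep the lowest-loss trial in each cluster
    best = {}
    for trial, label in zip(top, labels):
        if label not in best or trial["loss"] < best[label]["loss"]:
            best[label] = trial
    return sorted(best.values(), key=lambda t: t["loss"])


# Usage on synthetic trials (illustrative data, not from a real FLAML log)
trials = [
    {
        "loss": 0.1 + 0.01 * i,
        "config": {"learning_rate": 0.01 * (i % 7 + 1), "n_estimators": 50 * (i % 5 + 1)},
    }
    for i in range(50)
]
absolute = select_absolute(trials, 3)
representative = select_representative(trials, k=3, percentile=20)
print([round(t["loss"], 2) for t in absolute])
print([round(t["loss"], 2) for t in representative])
```

Because the overall best trial always wins its own cluster, the representative set still starts with the best configuration; the remaining slots go to the best trial of other regions rather than to near-duplicates, which is the trade-off described under Pros and Cons.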