{"id":13640777,"url":"https://github.com/logicalclocks/maggy","last_synced_at":"2025-04-13T06:41:40.730Z","repository":{"id":34761744,"uuid":"171302877","full_name":"logicalclocks/maggy","owner":"logicalclocks","description":"Distribution transparent Machine Learning experiments on Apache Spark","archived":false,"fork":false,"pushed_at":"2024-02-21T15:57:45.000Z","size":6018,"stargazers_count":90,"open_issues_count":8,"forks_count":14,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-03-26T23:08:31.783Z","etag":null,"topics":["ablation","ablation-studies","ablation-study","automl","blackbox-optimization","hyperparameter-optimization","hyperparameter-search","hyperparameter-tuning","spark"],"latest_commit_sha":null,"homepage":"https://maggy.ai","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/logicalclocks.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-02-18T14:53:13.000Z","updated_at":"2024-10-03T10:13:19.000Z","dependencies_parsed_at":"2022-09-11T07:00:26.485Z","dependency_job_id":"283b89b1-864f-4e9a-8242-3688cdfa1fe6","html_url":"https://github.com/logicalclocks/maggy","commit_stats":{"total_commits":92,"total_committers":11,"mean_commits":8.363636363636363,"dds":0.4565217391304348,"last_synced_commit":"909e98edae1777911a9b3ac616977b0c216be9e7"},"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/logicalclocks%2Fmaggy","tags_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/logicalclocks%2Fmaggy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/logicalclocks%2Fmaggy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/logicalclocks%2Fmaggy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/logicalclocks","download_url":"https://codeload.github.com/logicalclocks/maggy/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248675434,"owners_count":21143763,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ablation","ablation-studies","ablation-study","automl","blackbox-optimization","hyperparameter-optimization","hyperparameter-search","hyperparameter-tuning","spark"],"created_at":"2024-08-02T01:01:14.348Z","updated_at":"2025-04-13T06:41:40.705Z","avatar_url":"https://github.com/logicalclocks.png","language":"Python","funding_links":[],"categories":["Neural Architecture Search","AutoML","Libraries"],"sub_categories":["Distributed Frameworks"],"readme":"\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/logicalclocks/maggy\"\u003e\n    \u003cimg src=\"https://raw.githubusercontent.com/moritzmeister/maggy/mkdocs/docs/assets/images/maggy.png\" width=\"320\" alt=\"Maggy\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://community.hopsworks.ai\"\u003e\u003cimg\n    src=\"https://img.shields.io/discourse/users?label=Hopsworks%20Community\u0026server=https%3A%2F%2Fcommunity.hopsworks.ai\"\n    alt=\"Hopsworks 
Community\"\n  /\u003e\u003c/a\u003e\n    \u003ca href=\"https://maggy.ai\"\u003e\u003cimg\n    src=\"https://img.shields.io/badge/docs-MAGGY-orange\"\n    alt=\"Maggy Documentation\"\n  /\u003e\u003c/a\u003e\n  \u003ca href=\"https://pypi.org/project/maggy/\"\u003e\u003cimg\n    src=\"https://img.shields.io/pypi/v/maggy?color=blue\"\n    alt=\"PyPiStatus\"\n  /\u003e\u003c/a\u003e\n  \u003ca href=\"https://pepy.tech/project/maggy/month\"\u003e\u003cimg\n    src=\"https://pepy.tech/badge/maggy/month\"\n    alt=\"Downloads\"\n  /\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/psf/black\"\u003e\u003cimg\n    src=\"https://img.shields.io/badge/code%20style-black-000000.svg\"\n    alt=\"CodeStyle\"\n  /\u003e\u003c/a\u003e\n  \u003ca\u003e\u003cimg\n    src=\"https://img.shields.io/pypi/l/maggy?color=green\"\n    alt=\"License\"\n  /\u003e\u003c/a\u003e\n\u003c/p\u003e\n\nMaggy is a framework for **distribution transparent** machine learning experiments on [Apache Spark](https://spark.apache.org/).\nIt provides a unified framework for writing core ML training logic as **oblivious training functions**.\nMaggy lets you reuse the same training code whether you are training small models on your laptop or scaling out hyperparameter tuning or distributed deep learning on a cluster.\nThis replaces the current waterfall development process for distributed ML applications, where code is rewritten at every stage to account for the different distribution context.\n\n\u003cp align=\"center\"\u003e\n  \u003cfigure\u003e\n    \u003ca href=\"https://github.com/logicalclocks/maggy\"\u003e\n      \u003cimg src=\"https://raw.githubusercontent.com/moritzmeister/maggy/mkdocs/docs/assets/images/firstgraph.png\" alt=\"Maggy\"\u003e\n    \u003c/a\u003e\n    \u003cfigcaption\u003eMaggy uses the same distribution transparent training function in all steps of the machine learning development process.\u003c/figcaption\u003e\n  
\u003c/figure\u003e\n\u003c/p\u003e\n\n## Quick Start\n\nMaggy uses PySpark as an engine to distribute the training processes. To get started, install Maggy in the Python environment used by your Spark cluster, or install Maggy in your local Python environment with the `'spark'` extra to run on Spark in local mode:\n\n```bash\npip install maggy             # Python environment used by your Spark cluster\npip install \"maggy[spark]\"    # local environment, with the 'spark' extra\n```\n\nThe programming model consists of wrapping the code containing the model training\ninside a function. Inside that wrapper function, provide all imports and\nparts that make up your experiment.\n\nSingle-run experiment:\n\n```python\ndef train_fn():\n    # This is your training iteration loop\n    for i in range(number_iterations):\n        ...\n        # Use the Maggy reporter to report the metric to be optimized\n        reporter.broadcast(metric=accuracy)\n        ...\n    # Return the metric to be optimized or any metric to be logged\n    return accuracy\n\nfrom maggy import experiment\nresult = experiment.lagom(train_fn=train_fn, name='MNIST')\n```\n\n**lagom** is a Swedish word meaning \"just the right amount\". 
This is how Maggy\nuses your resources.\n\n\n## Documentation\n\nFull documentation is available at [maggy.ai](https://maggy.ai/).\n\n## Contributing\n\nThere are various ways to contribute, and any contribution is welcome. Please follow the\nCONTRIBUTING guide to get started.\n\n## Issues\n\nIssues can be reported on the official [GitHub repo](https://github.com/logicalclocks/maggy/issues) of Maggy.\n\n## Citation\n\nPlease see our publications on [maggy.ai](https://maggy.ai/publications) to find out how to cite our work.\n\n## Acknowledgements\n\nThe development of Maggy is supported by the \u003ca href=\"https://deepcube-h2020.eu/\"\u003eEU H2020 Deep Cube Project\u003c/a\u003e (Grant agreement ID: 101004188).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flogicalclocks%2Fmaggy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flogicalclocks%2Fmaggy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flogicalclocks%2Fmaggy/lists"}