{"id":13647288,"url":"https://github.com/LLNL/merlin","last_synced_at":"2025-04-22T02:31:22.840Z","repository":{"id":37784833,"uuid":"222809980","full_name":"LLNL/merlin","owner":"LLNL","description":"Machine Learning for HPC Workflows","archived":false,"fork":false,"pushed_at":"2024-10-28T21:33:46.000Z","size":6608,"stargazers_count":122,"open_issues_count":9,"forks_count":26,"subscribers_count":13,"default_branch":"develop","last_synced_at":"2024-10-28T21:35:07.440Z","etag":null,"topics":["big-data","celery-workers","hpc","machine-learning","radiuss","redis-server","simulation","workflow","workflows"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LLNL.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-11-19T23:31:13.000Z","updated_at":"2024-10-28T21:15:04.000Z","dependencies_parsed_at":"2024-01-14T10:03:38.963Z","dependency_job_id":"62217ab9-0f77-454b-80a3-938d8251965d","html_url":"https://github.com/LLNL/merlin","commit_stats":{"total_commits":659,"total_committers":17,"mean_commits":38.76470588235294,"dds":0.6646433990895295,"last_synced_commit":"89093ddb8794580fa76667f4dd0d66e20480b35e"},"previous_names":[],"tags_count":37,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLNL%2Fmerlin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLNL%2Fmerlin/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLNL%2Fmerlin/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLNL%2Fmerlin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LLNL","download_url":"https://codeload.github.com/LLNL/merlin/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223887528,"owners_count":17219940,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data","celery-workers","hpc","machine-learning","radiuss","redis-server","simulation","workflow","workflows"],"created_at":"2024-08-02T01:03:28.105Z","updated_at":"2025-04-22T02:31:22.826Z","avatar_url":"https://github.com/LLNL.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"![Python versions](https://img.shields.io/pypi/pyversions/merlin)\n[![License](https://img.shields.io/pypi/l/merlin)](https://pypi.org/project/merlin/)\n![Activity](https://img.shields.io/github/commit-activity/m/LLNL/merlin)\n[![Issues](https://img.shields.io/github/issues/LLNL/merlin)](https://github.com/LLNL/merlin/issues)\n[![Pull requests](https://img.shields.io/github/issues-pr/LLNL/merlin)](https://github.com/LLNL/merlin/pulls)\n[![Language grade: Python](https://img.shields.io/lgtm/grade/python/g/LLNL/merlin.svg?logo=lgtm\u0026logoWidth=18)](https://lgtm.com/projects/g/LLNL/merlin/context:python)\n\n![Merlin](https://raw.githubusercontent.com/LLNL/merlin/main/docs/assets/images/merlin_banner_white.png)\n\n## A brief introduction to Merlin\nMerlin is a tool for running machine learning based workflows. The goal of\nMerlin is to make it easy to build, run, and process the kinds of large\nscale HPC workflows needed for cognitive simulation.\n\nAt its heart, Merlin is a distributed task queuing system, designed to allow complex\nHPC workflows to scale to large numbers of simulations \n(we've done 100 Million on the Sierra Supercomputer).\n\nWhy would you want to run that many simulations?\nTo become your own Big Data generator.\n\nData sets of this size can be large enough to train deep neural networks\nthat can mimic your HPC application, to be used for such\nthings as design optimization, uncertainty quantification and statistical\nexperimental inference. Merlin's been used to study inertial confinement\nfusion, extreme ultraviolet light generation, structural mechanics and\natomic physics, to name a few.\n\nHow does it work?\n\nIn essence, Merlin coordinates complex workflows through a persistent\nexternal queue server that lives outside of your HPC systems, but that\ncan talk to nodes on your cluster(s). As jobs spin up across your ecosystem,\nworkers on those allocations pull work from a central server, which\ncoordinates the task dependencies for your workflow. Since this coordination\nis done via direct connections to the workers (i.e. not through a file\nsystem), your workflow can scale to very large numbers of workers,\nwhich means a very large number of simulations with very little overhead.\n\nFurthermore, since the workers pull their instructions from the central\nserver, you can do a lot of other neat things, like having multiple\nbatch allocations contribute to the same work (think surge computing), or\nspecialize workers to different machines (think CPU workers for your\napplication and GPU workers that train your neural network). Another\nneat feature is that these workers can add more work back to central\nserver, which enables a variety of dynamic workflows, such as may be\nnecessary for the intelligent sampling of design spaces or reinforcement\nlearning tasks.\n\nMerlin does all of this by leveraging some key HPC and cloud computing\ntechnologies, building off open source components. It uses\n[maestro]( https://github.com/LLNL/maestrowf) to\nprovide an interface for describing workflows, as well as for defining\nworkflow task dependencies. It translates those dependencies into concrete\ntasks via [celery](https://docs.celeryproject.org/), \nwhich can be configured for a variety of backend\ntechnologies ([rabbitmq](https://www.rabbitmq.com) and\n[redis](https://redis.io) are currently supported). Although not\na hard dependency, we encourage the use of\n[flux](http://flux-framework.org) for interfacing with\nHPC batch systems, since it can scale to a very large number of jobs.\n\nThe integrated system looks a little something like this:\n\n![A Typical Merlin Workflow](docs/assets/images/merlin_arch.png)\n\nIn this example, here's how it all works:\n\n1. The scientist describes her HPC workflow as a maestro DAG (directed acyclic graph)\n\"spec\" file `workflow.yaml`\n2. She then sends it to the persistent server with  `merlin run workflow.yaml` .\nMerlin translates the file into tasks.\n3. The scientist submits a job request to her HPC center. These jobs ask for workers via\nthe command `merlin run-workers workflow.yaml`.\n4. Coffee break.\n5. As jobs stand up, they pull work from the queue, making calls to flux to get the \nnecessary HPC resources.\n5. Later, workers on a different allocation, with GPU resources connect to the \nserver and contribute to processing the workload.\n\nThe central queue server deals with task dependencies and keeps the workers fed.\n\nFor more details, check out the rest of the [documentation](https://merlin.readthedocs.io/).\n\nNeed help? \u003cmerlin@llnl.gov\u003e\n\n## Quick Start\n\nNote: Merlin supports Python 3.8+.\n\nTo install Merlin and its dependencies, run:\n\n    $ pip3 install merlin\n    \nCreate your application config file:\n\n    $ merlin config\n\nThat's it.\n\nTo run something a little more like what you're interested in,\nnamely a demo workflow that has simulation and machine learning,\nfirst generate an example workflow:\n\n    $ merlin example feature_demo\n\nThen install the workflow's dependencies:\n\n    $ pip install -r feature_demo/requirements.txt\n\nThen process the workflow and create tasks on the server:\n\n    $ merlin run feature_demo/feature_demo.yaml\n\nAnd finally, launch workers that can process those tasks:\n\n    $ merlin run-workers feature_demo/feature_demo.yaml\n\n\n## Documentation\n[**Full documentation**](http://merlin.readthedocs.io/) is available, or\nrun:\n\n    $ merlin --help\n\n(or add `--help` to the end of any sub-command you\nwant to learn more about.)\n\n\n## Code of Conduct\nPlease note that Merlin has a\n[**Code of Conduct**](.github/CODE_OF_CONDUCT.md). By participating in\nthe Merlin community, you agree to abide by its rules.\n\n\n## License\nMerlin is distributed under the terms of the [MIT LICENSE](https://github.com/LLNL/merlin/blob/main/LICENSE).\n\nLLNL-CODE-797170\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLLNL%2Fmerlin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FLLNL%2Fmerlin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLLNL%2Fmerlin/lists"}