{"id":19706041,"url":"https://github.com/llnl/magpie","last_synced_at":"2025-04-12T18:46:46.793Z","repository":{"id":37336398,"uuid":"13310592","full_name":"LLNL/magpie","owner":"LLNL","description":"Magpie contains a number of scripts for running Big Data software in HPC environments, including Hadoop and Spark. There is support for Lustre, Slurm, Moab, Torque. LSF, Flux, and more.","archived":false,"fork":false,"pushed_at":"2025-02-13T17:27:48.000Z","size":10844,"stargazers_count":198,"open_issues_count":38,"forks_count":52,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-04-12T05:03:28.228Z","etag":null,"topics":["hpc","shell","workflows"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LLNL.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS","contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2013-10-03T21:41:36.000Z","updated_at":"2025-03-04T04:36:59.000Z","dependencies_parsed_at":"2022-07-21T00:14:37.449Z","dependency_job_id":"a2c08ecc-d0fa-456d-a5d0-cae2a7cbfa57","html_url":"https://github.com/LLNL/magpie","commit_stats":null,"previous_names":[],"tags_count":97,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLNL%2Fmagpie","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLNL%2Fmagpie/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLNL%2Fmagpie/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLNL%2Fmagpie/manifests","owner_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LLNL","download_url":"https://codeload.github.com/LLNL/magpie/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248617650,"owners_count":21134196,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hpc","shell","workflows"],"created_at":"2024-11-11T21:33:19.855Z","updated_at":"2025-04-12T18:46:46.767Z","avatar_url":"https://github.com/LLNL.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"Magpie\n------\n\nMagpie contains a number of scripts for running Big Data software in\nHPC environments.  Thus far, Hadoop, Spark, Hbase, Storm, Pig,\nPhoenix, Kafka, Zeppelin, Zookeeper, and Alluxio are supported. It\ncurrently supports running over the parallel file system Lustre and\nrunning over any generic network filesystem.  There is\nscheduler/resource manager support for Slurm, Moab, Torque, LSF,\nand Flux.\n\nSome of the features presently supported:\n\n- Run jobs interactively or via scripts.\n- Run against a number of filesystem options, such as HDFS, HDFS over\n  Lustre, HDFS over a generic network filesystem, Lustre directly, or\n  a generic network filesystem.\n- Take advantage of SSDs/NVRAM for local caching if available\n- Make decent optimizations for your hardware\n\nExperimental support for several distributed machine learning\nframeworks has also been added.  
Presently tensorflow, tensorflow\nw/ horovod, and ray are supported.\n\nBasic Idea\n----------\n\nThe basic idea behind these scripts is to:\n\n1) Submit a Magpie batch script to allocate nodes on a cluster using\n   your HPC scheduler/resource manager.  Slurm, Slurm+mpirun,\n   Moab+Slurm, Moab+Torque, LSF+mpirun, and Flux are currently supported.\n\n2) The batch script will create configuration files for all\n   appropriate projects (Hadoop, Spark, etc.)  The configuration files\n   will be set up so the rank 0 node is the \"master\".  All compute\n   nodes will have configuration files created that point to the node\n   designated as the master server.\n\n   The configuration files will be populated with values for your\n   filesystem choice and the hardware that exists in your cluster.\n   Reasonable attempts are made to determine optimal values for your\n   system and hardware (they are almost certainly better than the\n   default values).  A number of options exist in the batch scripts to\n   adjust these values for individual jobs.\n\n3) Launch daemons on all nodes.  The rank 0 node will run master\n   daemons, such as the Hadoop Namenode.  All remaining nodes will run\n   appropriate worker daemons, such as the Hadoop Datanodes.\n\n4) Now you have a mini big data cluster to do whatever you want.  You\n   can log into the master node and interact with your mini big data\n   cluster however you want.  Or you could have Magpie run a script to\n   execute your big data calculation instead.\n\n5) When your job completes or your allocation time has run out, Magpie\n   will clean up your job by tearing down daemons.  When appropriate,\n   Magpie may also do some additional cleanup work to hopefully make\n   re-execution on later runs cleaner and faster.\n\nSupported Packages \u0026 Versions\n-----------------------------\n\nFor a complete list of supported package versions and dependencies,\nplease see ```doc/README```.  
The following can be considered a\nsummary of support.\n\nHadoop - 2.2.0, 2.3.0, 2.4.X, 2.5.X, 2.6.X, 2.7.X, 2.8.X, 2.9.X,\n         3.0.X, 3.1.X, 3.2.X, 3.3.X, 3.4.X\n\nSpark - 1.1.X, 1.2.X, 1.3.X, 1.4.X, 1.5.X, 1.6.X, 2.0.X, 2.1.X, 2.2.X,\n        2.3.X, 2.4.X, 3.0.X, 3.1.X, 3.2.X, 3.3.X, 3.4.X, 3.5.X\n\nHbase - 1.0.X, 1.1.X, 1.2.X, 1.3.X, 1.4.X, 1.5.X, 1.6.X\n\nHive - 2.3.0\n\nPig - 0.13.0, 0.14.0, 0.15.0, 0.16.0, 0.17.0\n\nZookeeper - 3.4.X\n\nStorm - 0.9.X, 0.10.X, 1.0.X, 1.1.X, 1.2.X\n\nPhoenix - 4.5.X, 4.6.0, 4.7.0, 4.8.X, 4.9.0, 4.10.1, 4.11.0, 4.12.0,\n          4.13.X, 4.14.0\n\nKafka - 2.11-0.9.0.0\n\nZeppelin - 0.6.X, 0.7.X, 0.8.X\n\nAlluxio - 2.3.0\n\nTensorFlow - 1.9, 1.12\n\nRay - 0.7.0\n\nOlder Supported Packages \u0026 Features\n-----------------------------------\n\nSome packages and features were dropped due to lack of interest, the\nsoftware becoming old/deprecated, and/or their initial experimental\naddition into Magpie.  If you are interested in them, please look at\nolder versions for supported versions and documentation.  If you are\nvery interested in support in current versions of Magpie beyond an\nexperimental nature, please submit a support request and we can\nreconsider adding it back in.\n\nRemoved in Magpie 2.0\n\n   - Hadoop 1.X support\n   - Tachyon\n   - UDA/uda-plugin for Hadoop\n   - HDFS Federation in Hadoop\n   - IntelLustre option for a Hadoop Filesystem\n   - MagpieNetworkFS option for a Hadoop Filesystem\n\nRemoved in Magpie 3.0\n\n   - Spark 0.9.X support\n   - Hbase 0.98.X and 0.99.X support\n   - Mahout\n\nDocumentation\n-------------\n\nAll documentation is in the 'doc' subdirectory.  Please see the\ndoc/README file as a starting point.  It provides general instructions\nas well as pointers to documentation for each project, setup\nrequirements, ability to do local configurations, tips \u0026 tricks, and\nmore information.\n\nRelease\n-------\n\nMagpie is released under a GPL license. 
For more information, see the [COPYING](/COPYING) file.\n\n`LLNL-CODE-644248`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fllnl%2Fmagpie","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fllnl%2Fmagpie","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fllnl%2Fmagpie/lists"}