{"id":30151582,"url":"https://github.com/nrel/sparkctl","last_synced_at":"2026-01-20T16:59:50.153Z","repository":{"id":304843969,"uuid":"1019714674","full_name":"NREL/sparkctl","owner":"NREL","description":"Orchestrates Spark clusters on HPCs","archived":false,"fork":false,"pushed_at":"2025-07-15T17:12:16.000Z","size":277,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-11T11:02:28.124Z","etag":null,"topics":["cluster","hpc","slurm","spark"],"latest_commit_sha":null,"homepage":"https://nrel.github.io/sparkctl/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NREL.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-14T18:53:09.000Z","updated_at":"2025-07-26T02:00:55.000Z","dependencies_parsed_at":"2025-07-16T12:46:04.848Z","dependency_job_id":"75fc8d71-50af-4f5a-818a-837efbef6b92","html_url":"https://github.com/NREL/sparkctl","commit_stats":null,"previous_names":["nrel/sparkctl"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/NREL/sparkctl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NREL%2Fsparkctl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NREL%2Fsparkctl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NREL%2Fsparkctl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NREL%2Fsparkctl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/owners/NREL","download_url":"https://codeload.github.com/NREL/sparkctl/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NREL%2Fsparkctl/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":269873158,"owners_count":24488993,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-11T02:00:10.019Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cluster","hpc","slurm","spark"],"created_at":"2025-08-11T11:02:18.775Z","updated_at":"2026-01-20T16:59:50.147Z","avatar_url":"https://github.com/NREL.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# sparkctl\nThis package implements configuration and orchestration of Spark clusters with standalone cluster\nmanagers. This is useful in environments like HPCs where the infrastructure implemented by cloud\nproviders, such as AWS, is not available. It is particularly helpful when users want to deploy Spark\nbut do not have administrative control of the servers.\n\n## Example usage\nThere are two main ways to use this package:\n\nFirst, allocate compute nodes. For example, with Slurm (1 compute node for the Spark master and\n4 compute nodes for Spark workers):\n   \n```console\n$ salloc -t 01:00:00 -n4 --partition=shared --mem=30G : -N4 --account=\u003cyour-account\u003e --mem=240G\n```\n  \n1. 
Configure a Spark cluster and run Spark jobs with `spark-submit` or `pyspark`.\n```console\n$ sparkctl configure\n$ sparkctl start\n$ spark-submit --master spark://$(hostname):7077 my-job.py\n$ sparkctl stop\n```\n\n2. Run Spark jobs in a Python script using the `sparkctl` library to manage the cluster.\n```python\nfrom sparkctl import ClusterManager, make_default_spark_config\n\nconfig = make_default_spark_config()\nmgr = ClusterManager(config)\nwith mgr.managed_cluster() as spark:\n    df = spark.createDataFrame([(x, x + 1) for x in range(1000)], [\"a\", \"b\"])\n    df.show()\n```\n\nRefer to the [user documentation](https://nrel.github.io/sparkctl/) for a description of features\nand detailed usage instructions.\n\n## Project Status\nThe package is actively maintained and used at the National Laboratory of the Rockies (NLR).\nThe software is primarily geared toward HPCs that use Slurm. It also supports a generic list of\nservers as long as the servers have access to a shared filesystem and are accessible via\npasswordless SSH.\n\nIt would be straightforward to extend the functionality to support other HPC resource managers.\nPlease open an issue or start a discussion if you are interested in this package but need that\nsupport.\n\nContributions are welcome.\n\n## License\nsparkctl is released under a BSD 3-Clause [license](https://github.com/NREL/sparkctl/blob/main/LICENSE).\n\n## Software Record\nThis package is developed under NLR Software Record SWR-25-109.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnrel%2Fsparkctl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnrel%2Fsparkctl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnrel%2Fsparkctl/lists"}