{"id":15191557,"url":"https://github.com/spotify/hype","last_synced_at":"2025-10-02T06:32:23.884Z","repository":{"id":57727192,"uuid":"84466727","full_name":"spotify/hype","owner":"spotify","description":"Runs JVM closures in Docker containers on Kubernetes","archived":true,"fork":false,"pushed_at":"2018-03-23T17:46:09.000Z","size":537,"stargazers_count":36,"open_issues_count":0,"forks_count":3,"subscribers_count":89,"default_branch":"master","last_synced_at":"2024-09-28T21:01:26.244Z","etag":null,"topics":["cpu","disk","docker","kubernetes","lambda","memory","workflow"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/spotify.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-03-09T16:52:29.000Z","updated_at":"2024-04-02T10:47:35.000Z","dependencies_parsed_at":"2022-09-26T21:51:15.092Z","dependency_job_id":null,"html_url":"https://github.com/spotify/hype","commit_stats":null,"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spotify%2Fhype","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spotify%2Fhype/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spotify%2Fhype/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spotify%2Fhype/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/spotify","download_url":"https://codeload.github.com/spotify/hype/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repo
sitories_count":234951866,"owners_count":18912481,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpu","disk","docker","kubernetes","lambda","memory","workflow"],"created_at":"2024-09-27T21:01:17.242Z","updated_at":"2025-10-02T06:32:18.564Z","avatar_url":"https://github.com/spotify.png","language":"Java","funding_links":[],"categories":["Java","Serverless"],"sub_categories":["微服务框架"],"readme":"[DEPRECATED] hype\n====\nPLEASE NOTE: THIS REPO HAS BEEN DEPRECATED BECAUSE IT IS NO LONGER USED BY ANY PROJECTS AT SPOTIFY AND THERE ARE NO PLANS TO CONTINUE DEVELOPMENT.\n\nIT WILL NOW BE ARCHIVED.\n\n                       .ed\"\"\"\" \"\"\"$$$$be.\n                     -\"           ^\"\"**$$$e.\n                   .\"                   '$$$c\n                  /                      \"4$$b\n                 d  3                     $$$$\n                 $  *                   .$$$$$$\n                .$  ^c           $$$$$e$$$$$$$$.\n                d$L  4.         4$$$$$$$$$$$$$$b\n                $$$$b ^ceeeee.  4$$ECL.F*$$$$$$$\n    e$\"\"=.      $$$$P d$$$$F $ $$$$$$$$$- $$$$$$\n    z$$b. ^c     3$$$F \"$$$$b   $\"$$$$$$$  $$$$*\"      .=\"\"$c\n    4$$$$L   \\     $$P\"  \"$$b   .$ $$$$$...e$$        .=  e$$$.\n    ^*$$$$$c  %..   *c    ..    $$ 3$$$$$$$$$$eF     zP  d$$$$$\n    \"**$$$ec   \"\\   %ce\"\"    $$$  $$$$$$$$$$*    .r\" =$$$$P\"\"\n          \"*$b.  \"c  *$e.    
*** d$$$$$\"L$$    .d\"  e$$***\"\n            ^*$$c ^$c $$$      4J$$$$$% $$$ .e*\".eeP\"\n               \"$$$$$$\"'$=e....$*$$**$cz$$\" \"..d$*\"\n                 \"*$$$  *=%4.$ L L$ P3$$$F $$$P\"\n                    \"$   \"%*ebJLzb$e$$$$$b $P\"\n                      %..      4$$$$$$$$$$ \"\n                       $$$e   z$$$$$$$$$$%\n                        \"*$c  \"$$$$$$$P\"\n                         .\"\"\"*$$$$$$$$bc\n                      .-\"    .$***$$$\"\"\"*e.\n                   .-\"    .e$\"     \"*$c  ^*b.\n            .=*\"\"\"\"    .e$*\"          \"*bc  \"*$e..\n          .$\"        .z*\"               ^*$e.   \"*****e.\n          $$ee$c   .d\"                     \"*$.        3.\n          ^*$E\")$..$\"                         *   .ee==d%\n             $.d$$$*                           *  J$$$e*\n              \"\"\"\"\"                             \"$$$\"\n\n\n\nPREVIOUS DOCUMENTATION FOLLOWS\n\n[![Build Status](https://img.shields.io/circleci/project/github/spotify/hype/master.svg)](https://circleci.com/gh/spotify/hype)\n[![codecov.io](https://codecov.io/github/spotify/hype/coverage.svg?branch=master)](https://codecov.io/github/spotify/hype?branch=master)\n[![Maven Central](https://img.shields.io/maven-central/v/com.spotify/hype-root.svg)](https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22com.spotify%22%20hype*)\n[![GitHub license](https://img.shields.io/github/license/spotify/hype.svg)](./LICENSE)\n\nA library for seamlessly executing arbitrary JVM closures in [Docker] containers on [Kubernetes].\n\n---\n\n- [User guide](#user-guide)\n  * [Dependency](#dependecy)\n  * [Run functions](#run-functions)\n  * [Full example](#full-example)\n  * [Leveraging implicits](#leveraging-implicits)\n  * [Custom environment images](#custom-environment-images)\n- [Process overview](#process-overview)\n- [Persistent disk](#persistent-disk)\n  * [GCE Persistent Disk](#gce-persistent-disk)\n    + [Volume re-use](#volume-re-use)\n- [Environment Pod from 
YAML](#environment-pod-from-yaml)\n\n---\n\n# User guide\n\nHype lets you execute arbitrary JVM code in a distributed environment where different parts\nmight run concurrently in separate Docker containers, each using different amounts of memory,\nCPU and disk. With the help of Kubernetes and a cloud provider such as Google Cloud Platform,\nyou'll have dynamically scheduled resources available for your code to utilize.\n\nAll this might sound a bit abstract, so let's run through a concrete example. We'll be using Scala\nfor the examples, but all the core functionality is available from Java as well.\n\n## Dependency\n\nSBT\n\n```sbt\n\"com.spotify\" %% \"hype\" % \u003cversion\u003e\n```\n\n## Run functions\n\nIn order to run functions on the cluster, you'll have to set up a `Submitter` value.\nThe submitter encapsulates \"where\" to submit your functions.\n```scala\nval submitter = GkeSubmitter(\"gcp-project-id\", \"gce-zone-id\", \"gke-cluster-id\", \"gs://my-staging-bucket\")\n```\n\nFor testing, where you might want to run on a local Docker daemon, use `LocalSubmitter(...)`.\n\nWriting functions that can be executed with Hype is simple, just wrap them up as an `HFn[T]`. An\n`HFn[T]` is a closure that allows Hype to move the actual evaluation into a Docker container.\n\n```scala\ndef example(arg: String) = HFn[String] {\n  arg + \" world!\"\n}\n```\n\nIn the previous example, the default Hype Docker image (`spotify/hype`) is used. 
If you wish to use\nyour own image, you can easily do so:\n\n```scala\ndef example(arg: String) = HFn.withImage(\"us.gcr.io/my-image:42\") {\n  arg + \" world!\"\n}\n```\n\nNow we'll have to define the environment we want this function to run in.\n\n```scala\nval env = RunEnvironment()\n```\n\nFinally, use the `Submitter` and `RunEnvironment` to execute an `HFn[T]`.\nWhen execution is complete, it'll return the function value back to your local context.\n\n```scala\nval result = submitter.submit(example(\"hello\"), env.withRequest(\"cpu\", \"750m\"))\n```\n\n## Full example\n\nThis is a full example that runs a simple function that executes an arbitrary command and lists all\nenvironment variables. It uses the Scala [sys.process] package to execute commands in the function.\nAlso see the [docs on how to create k8s secrets](https://kubernetes.io/docs/concepts/configuration/secret/#creating-your-own-secrets)\n\n```scala\nimport sys.process._\nimport com.spotify.hype._\n\n// A simple model for describing the runtime environment\ncase class EnvVar(name: String, value: String)\ncase class Res(cmdOutput: String, mounts: String, vars: List[EnvVar])\n\ndef extractEnv(cmd: String) = HFn[Res] {\n  val cmdOutput = cmd !!\n  val mounts = \"df -h\" !!\n  val vars = for ((key, value) \u003c- sys.env.toList)\n    yield EnvVar(key, value)\n\n  Res(cmdOutput, mounts, vars)\n}\n\nval submitter = GkeSubmitter(\"gcp-project-id\", \"gce-zone-id\", \"gke-cluster-id\", \"gs://my-staging-bucket\")\nval env = RunEnvironment()\n    .withSecret(\"gcp-key\", \"/etc/gcloud\") // a pre-created k8s secret volume named \"gcp-key\"\n\nval res = submitter.submit(extractEnv(\"uname -a\"), env)\n\nprintln(res.cmdOutput)\nprintln(res.mounts)\nres.vars.foreach(println)\n```\n\nThe `res.vars` list returned should contain the environment variables that were present in the\ndocker container while running on the cluster. 
Here's the output:\n\n```\n[info] Running HypeExample\n[info] 22:15:14.211 | INFO | StagingUtil |\u003e Uploading 69 files to staging location gs://my-staging-bucket to prepare for execution.\n[info] 22:15:51.057 | INFO | StagingUtil |\u003e Uploading complete: 4 files newly uploaded, 65 files cached\n[info] 22:15:51.673 | INFO | Submitter  |\u003e Submitting gs://my-staging-bucket/manifest-9vhb5u18.txt to RunEnvironment{base=RunEnvironment.SimpleBase{image=gcr.io/gcp-project-id/env-image}, secretMounts=[Secret{name=gcp-key, mountPath=/etc/gcloud}], volumeMounts=[], resourceRequests={}}\n[info] 22:15:52.221 | INFO | DockerRunner |\u003e Created pod hype-run-mymlbuw8\n[info] 22:15:52.351 | INFO | DockerRunner |\u003e Pod hype-run-mymlbuw8 assigned to node gke-hype-test-default-pool-e1122946-fg9k\n[info] 22:16:02.454 | INFO | DockerRunner |\u003e Kubernetes pod hype-run-mymlbuw8 exited with status Succeeded\n[info] 22:16:02.455 | INFO | DockerRunner |\u003e Got termination message: gs://my-staging-bucket/continuation-993467547293976140-eUWBfwL9J2tHvWuJw0lU3g-hype-run-mymlbuw8-return.bin\n[info] Linux hype-run-mymlbuw8 4.4.21+ #1 SMP Fri Feb 17 15:34:45 PST 2017 x86_64 GNU/Linux\n[info]\n[info] Filesystem      Size  Used Avail Use% Mounted on\n[info] overlay          95G  4.1G   91G   5% /\n[info] tmpfs           7.4G     0  7.4G   0% /dev\n[info] tmpfs           7.4G     0  7.4G   0% /sys/fs/cgroup\n[info] tmpfs           7.4G  4.0K  7.4G   1% /etc/gcloud\n[info] /dev/sda1        95G  4.1G   91G   5% /etc/hosts\n[info] tmpfs           7.4G   12K  7.4G   1% /run/secrets/kubernetes.io/serviceaccount\n[info] shm              64M     0   64M   0% /dev/shm\n[info]\n[info] EnvVar(HYPE_EXECUTION_ID,hype-run-mymlbuw8)\n[info] EnvVar(GOOGLE_APPLICATION_CREDENTIALS,/etc/gcloud/key.json)\n[info] EnvVar(HOSTNAME,hype-run-cv7cln6y)\n[info] EnvVar(PATH,/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)\n[info] EnvVar(JAVA_VERSION,8u121)\n[info] 
EnvVar(KUBERNETES_SERVICE_HOST,xx.xx.xx.xx)\n...\n```\n\n## Leveraging implicits\n\nIn order to save some keystrokes, you can use our `implicit` operators:\n```scala\nimport com.spotify.hype.magic._\n```\n\nNow you can set up an `implicit` `Submitter` value.\n```scala\nimplicit val submitter = GkeSubmitter(\"gcp-project-id\", \"gce-zone-id\", \"gke-cluster-id\", \"gs://my-staging-bucket\")\n```\n\nThe environment value can also be declared `implicit`,\nbut this is not required as it can explicitly be referenced when submitting functions.\n\n```scala\nimplicit val env = RunEnvironment().withSecret(\"gcp-key\", \"/etc/gcloud\")\n```\n\nFinally, use the `#!` (hashbang) operator to execute an `HFn[T]` in a given environment. It will\nuse the `Submitter` and `RunEnvironment` which should be in scope.\n\n```scala\nval result = example(\"hello\") #!\n```\n\nUsing an `implicit` value as we did above works in most cases, but the hashbang (`#!`)\noperator also allows you to specify an explicit environment.\n\n```scala\nval result = example(\"hello\") #! 
env.withRequest(\"cpu\", \"750m\")\n```\n## Custom environment images\n\nIn order for Hype to be able to execute functions in your custom Docker images, you'll have to\ninstall the `hype-run` command by adding the following to your `Dockerfile`:\n\n```dockerfile\n# Install hype-run command\nRUN /bin/sh -c \"$(curl -fsSL https://goo.gl/kSogpF)\"\nENTRYPOINT [\"hype-run\"]\n```\n\nIt is important to have exactly this `ENTRYPOINT` as the Kubernetes Pods will expect to run the\n`hype-run` command.\n\nSee example [`Dockerfile`](hype-docker/Dockerfile)\n\n# Process overview\n\nThis describes what Hype does from a high level point of view.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/spotify/hype/blob/master/doc/hype.png?raw=true\"\n       width=\"723\"\n       height=\"336\"/\u003e\n\u003c/p\u003e\n\n# Persistent disk\n\nHype makes it easy to schedule persistent disk volumes across different closures in a workflow.\nA typical pattern seen in many use cases is to first use a disk in read-write mode to download and\nprepare some data, and then fork out to several parallel tasks that use the disk in read-only mode.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/spotify/hype/blob/master/doc/hype-volumes.png?raw=true\"\n       width=\"406\"\n       height=\"213\"/\u003e\n\u003c/p\u003e\n\n## GCE Persistent Disk\n\nIn this example, we're using a StorageClass for [GCE Persistent Disk] that we've already set up on\nour cluster.\n\n```yaml\nkind: StorageClass\napiVersion: storage.k8s.io/v1beta1\nmetadata:\n  name: gce-ssd-pd\nprovisioner: kubernetes.io/gce-pd\nparameters:\n  type: pd-ssd\n```\n\nWe can then request volumes from this StorageClass using the Hype API:\n\n```scala\nimport sys.process._\nimport com.spotify.hype.magic._\n\nimplicit val submitter = GkeSubmitter(\"gcp-project-id\",\n                                      \"gce-zone-id\",\n                                      \"gke-cluster-id\",\n                         
             \"gs://my-staging-bucket\")\n\n// Create a 10Gi volume from the 'gce-ssd-pd' storage class\nval ssd10Gi = TransientVolume(\"gce-ssd-pd\", \"10Gi\")\nval mount = \"/usr/share/volume\" \n\nval env = RunEnvironment()\nval readWriteEnv = env.withMount(ssd10Gi.mountReadWrite(mount))\nval readOnlyEnv = env.withMount(ssd10Gi.mountReadOnly(mount))\n\ndef write = HFn[Int] {\n  // get a random word and store it in the volume\n  s\"curl -so $mount/word http://www.setgetgo.com/randomword/get.php\" !\n}\n\ndef read = HFn[String] {\n  // read the word file\n  s\"cat $mount/word\" !!\n}\n\n// Write to the volume\nwrite #! readWriteEnv\n\n// Run 10 parallel functions that have read only access to the volume\nval results = for (_ \u003c- Range(0, 10).par)\n    yield read #! readOnlyEnv\n```\n\nThe submissions from the parallel range will each run concurrently in separate pods and have\nread-only access to the `/usr/share/volume` mount. The volume should contain the random word that\nwas written to it from the `write` function.\n\nCoordinating metadata and parameters across multiple submissions should be just as trivial as\npassing values from function calls as arguments to other functions.\n\n### Volume re-use\n\nBy default, the backing claim for a `TransientVolume` on Kubernetes is deleted when the JVM\nterminates.\n\nIf you wish to persist the Volume between invocations, you can use:\n\n```scala\nval disk = PersistentVolume(\"my-persistent-volume\", \"gce-ssd-pd\", \"10Gi\")\n```\n\nIf the volume does not exist, it will be created. Subsequent invocations will return use already\ncreated volume.\n\nThis is useful in use cases with larger volumes that take a significant amount of time to load,\nor when there's some sort of workflow orchestration around the Hype code that might run\ndifferent parts in separate JVM invocations.\n\n# Environment Pod from YAML\n\nSometimes more control over the Kubernetes Pod is desired. 
For these cases a regular Pod YAML file\ncan be used as a base for the `RunEnvironment`. Hype will still manage any used Volume Claims and\nmounts, but will leave all other details as you've specified them.\n\nHype will expect at least this field to be specified:\n\n- `spec.containers[name:hype-run]` - There must at least be a container named `hype-run`\n\nPlease note that the image field should *not* be set (Hype requires each module to define its image).\n\n_Hype will override the `spec.containers[name:hype-run].args` field, so don't set it._\n\nHere's a minimal Pod YAML file with some custom settings, `./src/main/resources/pod.yaml`:\n\n```yaml\napiVersion: v1\nkind: Pod\n\nspec:\n  restartPolicy: Never # do not retry on failure\n\n  containers:\n  - name: hype-run\n    imagePullPolicy: Always # pull the image on each run\n\n    env: # additional environment variables\n    - name: EXAMPLE\n      value: my-env-value\n```\n\nAny resource requests added through the `RunEnvironment` API will merge with, and override, the ones\nset in the YAML file.\n\nThen simply load your `RunEnvironment` through:\n\n```scala\nval env = RunEnvironmentFromYaml(\"/pod.yaml\")\n```\n\n---\n\n_This project is in early development stages, expect anything you see to change._\n\n[Docker]: https://www.docker.com\n[Kubernetes]: https://kubernetes.io/\n[GCE Persistent Disk]: http://blog.kubernetes.io/2016/10/dynamic-provisioning-and-storage-in-kubernetes.html\n[sys.process]: http://www.scala-lang.org/api/rc2/scala/sys/process/package.html\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspotify%2Fhype","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fspotify%2Fhype","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspotify%2Fhype/lists"}