Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/pratikkalein/llm-cloud-run
Deploy Gemma on Google Cloud Run with Ollama
- Host: GitHub
- URL: https://github.com/pratikkalein/llm-cloud-run
- Owner: pratikkalein
- License: apache-2.0
- Created: 2024-06-29T02:57:58.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-06-29T03:41:19.000Z (7 months ago)
- Last Synced: 2024-11-11T22:09:50.069Z (2 months ago)
- Language: Shell
- Homepage:
- Size: 8.79 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
README
# Run Gemma with Ollama on Cloud Run
This sample shows how to deploy the Ollama API with `gemma:2b` on Cloud Run, running inference on CPU only.
**Gemma** is Google's open model, built from the same research and technology used to create the Gemini models. The 2B variant is the smallest.
**Ollama** is a framework that makes it easy for developers to prototype apps with open models, including Gemma. It exposes a REST API, which this sample deploys as a Cloud Run service.
## Usage
To build the container with `gemma:2b` included and deploy the Ollama API to a publicly accessible URL on Cloud Run, run the following command from the `./run/ollama-gemma` directory:
```
bash deploy.sh
```

Respond to any prompts the command gives you. You might need to enable a few APIs and choose a region to deploy to. Building the container takes roughly 3 minutes.
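The repo's `deploy.sh` is not reproduced here, but a minimal equivalent might look like the sketch below. The service name `ollama-gemma` matches the `gcloud` filter used later; the region, CPU, and memory values are assumptions sized for CPU-only `gemma:2b` inference, not taken from the repo.

```
#!/usr/bin/env bash
# Hypothetical sketch of a deploy script; the actual deploy.sh may differ.
REGION="${REGION:-us-central1}"   # assumed default region
SERVICE="ollama-gemma"            # matches the service name filtered on later

# Build the container from the local Dockerfile and deploy it to Cloud Run.
# --allow-unauthenticated makes the URL public; CPU and memory values are
# assumptions for running gemma:2b inference on CPU.
if command -v gcloud >/dev/null 2>&1; then
  gcloud run deploy "$SERVICE" \
    --source . \
    --region "$REGION" \
    --allow-unauthenticated \
    --cpu 4 \
    --memory 16Gi
else
  echo "gcloud CLI not found; install the Google Cloud SDK first" >&2
fi
```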
Once deployment completes, the command prints the public URL of the service. For convenience, store it in an environment variable:
```
URL=$(gcloud run services list --format "value(URL)" --filter metadata.name=ollama-gemma)
```

## Explore the API
To display the list of available models, send a request to `api/tags`. This should list `gemma:2b`.
```
curl $URL/api/tags
```

Ask Gemma a question:
```
curl $URL/api/generate -d \
'{
"model": "gemma:2b",
"prompt": "Why is the sky blue?"
}'
```

The first request to a new instance takes some extra time because Gemma is loaded into memory. Ollama keeps the model in memory for 5 minutes; I searched for a configuration parameter to change this setting but haven't found one yet.
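By default, `api/generate` streams the answer as a series of JSON objects. The Ollama API also accepts a `stream` field to return one complete JSON response instead; a variant of the request above (same model and prompt, `stream` added):

```
# stream:false requests a single JSON response instead of the default
# token-by-token streaming; $URL is the service URL captured earlier
BODY='{"model": "gemma:2b", "prompt": "Why is the sky blue?", "stream": false}'
if [ -n "${URL:-}" ]; then
  curl "$URL/api/generate" -d "$BODY"
fi
```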
For the full Ollama API, refer to [the API docs](https://github.com/ollama/ollama/blob/main/docs/api.md).
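Each line of the streamed output is a JSON object whose `response` field holds the next chunk of generated text. If you have `jq` installed, you can join those chunks into plain text; this one-liner is illustrative, not from the repo:

```
# jq -j prints each "response" field raw, with no separating newlines,
# so the streamed chunks concatenate back into the full answer
curl -s $URL/api/generate -d \
'{"model": "gemma:2b", "prompt": "Why is the sky blue?"}' | jq -j '.response'
```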
## Clean up
To clean up after following this short tutorial, you can do the following:
- In Artifact Registry, find the `cloud-run-source-deploy` repository and remove
the container image used by the Cloud Run service you created.
- In Cloud Run, delete the service you created.

## Links

- https://github.com/google/gemma.cpp
- https://github.com/ollama/ollama