# Run Gemma with Ollama on Cloud Run

This sample shows how to deploy the Ollama API with `gemma:2b` on Cloud Run, running inference on CPU only.

**Gemma** is Google's open model built from the same research and technology used to create the Gemini models. The 2B variant is the smallest.

**Ollama** is a framework that makes it easy for developers to prototype apps with open models, including Gemma. It comes with a REST API, and this sample deploys that API as a Cloud Run service.

## Usage

To build the container with `gemma:2b` included and deploy the Ollama API to a publicly accessible URL on Cloud Run, use the following command from the `./run/ollama-gemma` directory:

```
bash deploy.sh
```

Respond to any prompts the command gives you. You might need to enable a few APIs
and choose a region to deploy to.

Building the container takes roughly 3 minutes.
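
For reference, a minimal sketch of what a script like `deploy.sh` does: build the image from source and deploy it with `gcloud`. The service name, region, and resource sizes below are assumptions, not necessarily the script's actual values:

```
# Build from source and deploy; adjust region and sizing for your project.
gcloud run deploy ollama-gemma \
  --source . \
  --region us-central1 \
  --allow-unauthenticated \
  --memory 8Gi \
  --cpu 4
```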

Once the command completes, it prints the public URL of the service. For convenience, store the URL in an environment variable:

```
URL=$(gcloud run services list --format "value(URL)" --filter metadata.name=ollama-gemma)
```
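
To check that the variable points at a live service, send a request to the root path; Ollama should respond with a short "Ollama is running" banner:

```
curl $URL
```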

## Explore the API

To display the list of available models, send a request to `/api/tags`. This should list `gemma:2b`.

```
curl $URL/api/tags
```
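
The response is JSON; heavily abridged, it has roughly this shape (exact fields vary across Ollama versions):

```
{"models": [{"name": "gemma:2b", "modified_at": "…", "size": …}]}
```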

Ask Gemma a question:

```
curl $URL/api/generate -d '{
  "model": "gemma:2b",
  "prompt": "Why is the sky blue?"
}'
```
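
By default, `/api/generate` streams the answer back as a series of JSON objects. To get a single consolidated response instead, set the standard `stream` parameter to `false`:

```
curl $URL/api/generate -d '{
  "model": "gemma:2b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```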

The first request to a new instance takes some extra time because Gemma has to be loaded into memory. Ollama keeps the model in memory for 5 minutes; I searched for a config parameter to change this setting but haven't found one yet.
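
Depending on the Ollama version baked into the image, the `keep_alive` request parameter (see the API docs linked below) may control that window; for example, to keep the model loaded for an hour:

```
curl $URL/api/generate -d '{
  "model": "gemma:2b",
  "prompt": "Why is the sky blue?",
  "keep_alive": "1h"
}'
```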

For the full Ollama API, refer to [the API docs](https://github.com/ollama/ollama/blob/main/docs/api.md).

## Clean up

To clean up after following this short tutorial, do the following (a sketch of the equivalent `gcloud` commands follows the list):

- In Artifact Registry, find the `cloud-run-source-deploy` repository and remove
the container image used by the Cloud Run service you created.
- In Cloud Run, delete the service you created.
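
Assuming the service is named `ollama-gemma`, was deployed to `us-central1`, and `$PROJECT_ID` holds your project ID:

```
# Delete the Cloud Run service.
gcloud run services delete ollama-gemma --region us-central1

# Delete the image created by the source deploy (exact path is an assumption).
gcloud artifacts docker images delete \
  us-central1-docker.pkg.dev/$PROJECT_ID/cloud-run-source-deploy/ollama-gemma
```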

## Links

- [google/gemma.cpp](https://github.com/google/gemma.cpp)
- [ollama/ollama](https://github.com/ollama/ollama)
