{"id":15169645,"url":"https://github.com/prantlf/ovai","last_synced_at":"2025-06-23T08:38:41.970Z","repository":{"id":239497489,"uuid":"799607969","full_name":"prantlf/ovai","owner":"prantlf","description":"HTTP proxy for accessing Vertex AI with the REST API interface of ollama. Optionally forwarding requests for other models to ollama. Written in Go.","archived":false,"fork":false,"pushed_at":"2024-10-28T12:17:50.000Z","size":2902,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-10-30T01:51:46.402Z","etag":null,"topics":["ai","api-proxy","google","ollama","ollama-api","ollama-interface","vertex-ai","vertexai"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/prantlf.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-12T16:37:53.000Z","updated_at":"2024-10-28T12:17:55.000Z","dependencies_parsed_at":"2024-05-12T22:33:11.335Z","dependency_job_id":"2a5d2fc3-4434-4f85-b34f-aec1c704b75c","html_url":"https://github.com/prantlf/ovai","commit_stats":{"total_commits":48,"total_committers":2,"mean_commits":24.0,"dds":0.3125,"last_synced_commit":"e4bfc87b5884ef9396ec4c15bee3d2dc2e28e9e2"},"previous_names":["prantlf/ovai"],"tags_count":20,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prantlf%2Fovai","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prantlf%2Fovai/tags","releases_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/prantlf%2Fovai/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prantlf%2Fovai/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/prantlf","download_url":"https://codeload.github.com/prantlf/ovai/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":235515427,"owners_count":19002481,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","api-proxy","google","ollama","ollama-api","ollama-interface","vertex-ai","vertexai"],"created_at":"2024-09-27T07:04:14.850Z","updated_at":"2025-01-24T22:55:45.377Z","avatar_url":"https://github.com/prantlf.png","language":"Go","readme":"# ovai - ollama-vertex-ai\n\nHTTP proxy for accessing [Vertex AI] with the REST API interface of [ollama]. Optionally forwards requests for other models to `ollama`. Written in [Go].\n\n## Synopsis\n\nGet embeddings for a text:\n\n```\n❯ curl localhost:22434/api/embed -d '{\n  \"model\": \"text-embedding-005\",\n  \"input\": \"Half-orc is the best race for a barbarian.\"\n}'\n\n{ \"embeddings\": [[0.05424513295292854, -0.023687424138188362, ...]] }\n```\n\n## Setup\n\n1. Download an archive with the executable for your hardware and operating system from [GitHub Releases].\n2. Download a JSON file with your Google account key from the Google Cloud Console and save it to the current directory under the name `google-account.json`.\n3. Optionally create a file `model-defaults.json` in the current directory to change the [default model parameters].\n4. 
Run the server:\n\n```\n❯ ovai\n\nListening on http://localhost:22434 ...\n```\n\n### Configuring\n\nThe following properties from `google-account.json` are used:\n\n```jsonc\n{\n  \"project_id\": \"...\",\n  \"private_key_id\": \"...\",\n  \"private_key\": \"-----BEGIN PRIVATE KEY-----\\n...\\n-----END PRIVATE KEY-----\\n\",\n  \"client_email\": \"...\",\n  \"scope\": \"https://www.googleapis.com/auth/cloud-platform\", // optional, can be missing\n  \"auth_uri\": \"https://www.googleapis.com/oauth2/v4/token\"   // optional, can be missing\n}\n```\n\nSet the environment variable `PORT` to override the default port 22434.\n\nSet the environment variable `DEBUG` to one or more strings separated by commas to customise logging on `stderr`. The default value is `ovai` when run on the command line and `ovai:srv` inside the Docker container.\n\n| `DEBUG` value | What will be logged                                              |\n|:--------------|:-----------------------------------------------------------------|\n| `ovai`        | important information about the bodies of requests and responses |\n| `ovai:srv`    | methods and URLs of requests and status codes of responses       |\n| `ovai:net`    | requests forwarded to Vertex AI and received responses           |\n| `ovai,ovai:*` | all information above                                            |\n\nSet the environment variable `OLLAMA_ORIGIN` to the origin of the `ollama` service to enable forwarding to `ollama`. If the requested model doesn't start with `gemini`, `multimodalembedding`, `textembedding` or `text-embedding`, the request will be forwarded to the `ollama` service. This allows `ovai` to act as a single service with the `ollama` interface, recognising both Vertex AI and `ollama` models.\n\nSet the environment variable `NETWORK` to enforce IPv4 or IPv6. The default behaviour is to rely on the [Happy Eyeballs] implementation in Go and in the underlying OS. 
Valid values:\n\n| `NETWORK` value | What will be used                            |\n|:----------------|:---------------------------------------------|\n| `IPV4`          | enforce the network connection via IPV4 only |\n| `IPV6`          | enforce the network connection via IPV6 only |\n\n### Docker\n\nFor example, run a container for testing purposes with verbose logging, deleted on exit, exposing port 22434:\n\n    docker run --rm -it -p 22434:22434 -e DEBUG=ovai,ovai:* \\\n      -v ${PWD}/google-account.json:/usr/src/app/google-account.json \\\n      ghcr.io/prantlf/ovai\n\nFor example, run a container named `ovai` in the background with custom defaults, forwarding to `ollama`, exposing port 22434:\n\n    docker run --rm -dt -p 22434:22434 --name ovai \\\n      --add-host host.docker.internal:host-gateway \\\n      -e OLLAMA_ORIGIN=http://host.docker.internal:11434 \\\n      -v ${PWD}/google-account.json:/usr/src/app/google-account.json \\\n      -v ${PWD}/model-defaults.json:/usr/src/app/model-defaults.json \\\n      prantlf/ovai\n\nThe same tasks can be performed more easily with Docker Compose (place [docker-compose.yml], or [docker-compose-ollama.yml] if you want to use ollama too, in the current directory):\n\n    docker-compose up -d --wait\n    docker-compose -f docker-compose-ollama.yml up -d --wait\n\nThe image is available as both `ghcr.io/prantlf/ovai` (GitHub) and `prantlf/ovai` (Docker Hub).\n\n### Building\n\nMake sure that you have installed [Go] 1.22.3 or newer.\n\n    git clone https://github.com/prantlf/ovai.git\n    cd ovai\n    make\n\nExecuting `./ovai`, `make docker-start` or `make docker-up` requires the `google-account.json` file in the current directory, unless you only proxy the calls to ollama (which needs the `OLLAMA_ORIGIN` environment variable).\n\n## API\n\nSee the original [REST API documentation] for details about the interface. 
See also the [lifecycle of the Vertex AI models].\n\n### Embeddings\n\nCreates vectors from the specified input. See the available [embedding models].\n\n```\n❯ curl localhost:22434/api/embed -d '{\n  \"model\": \"textembedding-gecko@003\",\n  \"input\": [\"Half-orc is the best race for a barbarian.\"]\n}'\n\n{ \"embeddings\": [[0.05424513295292854, -0.023687424138188362, ...]] }\n```\n\nThe returned vector of floats has 768 dimensions.\n\nThe previous request format remains supported for compatibility:\n\n```\n❯ curl localhost:22434/api/embeddings -d '{\n  \"model\": \"textembedding-gecko@003\",\n  \"prompt\": \"Half-orc is the best race for a barbarian.\"\n}'\n\n{ \"embedding\": [0.05424513295292854, -0.023687424138188362, ...] }\n```\n\n### Text\n\nGenerates text using the specified prompt. See the available [gemini text and chat models].\n\n```\n❯ curl localhost:22434/api/generate -d '{\n  \"model\": \"gemini-1.5-flash-002\",\n  \"prompt\": \"Describe guilds from Dungeons and Dragons.\",\n  \"images\": [],\n  \"stream\": false\n}'\n\n{\n  \"model\": \"gemini-1.5-flash-002\",\n  \"created_at\": \"2024-05-10T14:10:54.885Z\",\n  \"response\": \"Guilds serve as organizations that bring together individuals with ...\",\n  \"done\": true,\n  \"total_duration\": 13884049373,\n  \"load_duration\": 0,\n  \"prompt_eval_count\": 7,\n  \"prompt_eval_duration\": 3471012343,\n  \"eval_count\": 557,\n  \"eval_duration\": 10413037030\n}\n```\n\nThe property `stream` defaults to `true`. The property `options` is optional with the following defaults:\n\n```\n\"options\": {\n  \"num_predict\": 8192,\n  \"temperature\": 1,\n  \"top_p\": 0.95,\n  \"top_k\": 40\n}\n```\n\n### Chat\n\nReplies to a chat with the specified message history. 
See the available [gemini text and chat models].\n\n```\n❯ curl localhost:22434/api/chat -d '{\n  \"model\": \"gemini-1.5-pro\",\n  \"messages\": [\n    {\n      \"role\": \"system\",\n      \"content\": \"You are an expert on Dungeons and Dragons.\"\n    },\n    {\n      \"role\": \"user\",\n      \"content\": \"What race is the best for a barbarian?\",\n      \"images\": []\n    }\n  ],\n  \"stream\": false\n}'\n\n{\n  \"model\": \"gemini-1.5-pro\",\n  \"created_at\": \"2024-05-06T23:32:05.219Z\",\n  \"message\": {\n    \"role\": \"assistant\",\n    \"content\": \"Half-Orcs are a strong and resilient race, making them ideal for barbarians. ...\"\n  },\n  \"done\": true,\n  \"total_duration\": 2325524053,\n  \"load_duration\": 0,\n  \"prompt_eval_count\": 9,\n  \"prompt_eval_duration\": 581381013,\n  \"eval_count\": 292,\n  \"eval_duration\": 1744143040\n}\n```\n\nThe property `stream` defaults to `true`. The property `options` is optional with the following defaults:\n\n```\n\"options\": {\n  \"num_predict\": 8192,\n  \"temperature\": 1,\n  \"top_p\": 0.95,\n  \"top_k\": 40\n}\n```\n\n### Tags\n\nLists available models.\n\n```\n❯ curl localhost:22434/api/tags\n\n{\n  \"models\": [\n    {\n      \"name\": \"moondream:latest\",\n      \"model\": \"moondream:latest\",\n      \"modified_at\": \"2024-06-02T16:39:32.532400236+02:00\",\n      \"size\": 1738451197,\n      \"digest\": \"55fc3abd386771e5b5d1bbcc732f3c3f4df6e9f9f08f1131f9cc27ba2d1eec5b\",\n      \"details\": {\n        \"parent_model\": \"\",\n        \"format\": \"gguf\",\n        \"family\": \"phi2\",\n        \"families\": [\n          \"phi2\",\n          \"clip\"\n        ],\n        \"parameter_size\": \"1B\",\n        \"quantization_level\": \"Q4_0\"\n      },\n      \"expires_at\": \"0001-01-01T00:00:00Z\"\n    }\n  ]\n}\n```\n\n### Show\n\nShows information about a model.\n\n```\n❯ curl localhost:22434/api/show -d '{\"name\":\"moondream\"}'\n\n{\n  \"license\": \"....\",\n  \"modelfile\": \"...\",\n  
\"parameters\": \"temperature 0\\nstop \\\"\\u003c|endoftext|\\u003e\\\"\\nstop \\\"Question:\\\"\",\n  \"template\": \"{{ if .Prompt }} Question: {{ .Prompt }}\\n\\n{{ end }} Answer: {{ .Response }}\\n\\n\",\n  \"details\": {\n    \"parent_model\": \"\",\n    \"format\": \"gguf\",\n    \"family\": \"phi2\",\n    \"families\": [\n      \"phi2\",\n      \"clip\"\n    ],\n    \"parameter_size\": \"1B\",\n    \"quantization_level\": \"Q4_0\"\n  }\n}\n```\n\n### Ping\n\nChecks that the server is running.\n\n```\n❯ curl -f localhost:22434/api/ping -X HEAD\n```\n\n### Shutdown\n\nGracefully shuts down the HTTP server and exits the process.\n\n```\n❯ curl localhost:22434/api/shutdown -X POST\n```\n\n## Models\n\n### Vertex AI\n\nRecognised models for embeddings: textembedding-gecko@001, textembedding-gecko@002, textembedding-gecko@003, textembedding-gecko-multilingual@001, text-multilingual-embedding-002, text-embedding-004, text-embedding-005, multimodalembedding@001.\n\nRecognised models for content generation and chat: gemini-2.0-flash-exp, gemini-1.5-flash-001, gemini-1.5-flash-002, gemini-1.5-flash-8b-001, gemini-1.5-pro-001, gemini-1.5-pro-002, gemini-1.0-pro-vision-001, gemini-1.0-pro-001, gemini-1.0-pro-002.\n\n### Ollama\n\nSmall models usable on machines with less memory and no AI accelerator:\n\n| Name                    | Size   |\n|:------------------------|-------:|\n| gemma2:2b               | 1.6 GB |\n| granite3.1-dense:2b     | 1.5 GB |\n| granite3.1-moe:1b       | 1.4 GB |\n| granite3.1-moe:3b       | 2.0 GB |\n| granite-embedding:30m   |  63 MB |\n| granite-embedding:278m  | 563 MB |\n| internlm2:1.8b          | 1.1 GB |\n| llama3.2:1b             | 1.3 GB |\n| llama3.2:3b             | 2.0 GB |\n| llava-phi3              | 2.9 GB |\n| moondream               | 1.7 GB |\n| nomic-embed-text        | 274 MB |\n| orca-mini               | 2.0 GB |\n| phi                     | 1.6 GB |\n| phi3                    | 2.2 GB |\n| qwen2.5:0.5b            | 397 MB |\n| qwen2.5:1.5b            | 986 MB |\n| smollm                  | 990 MB |\n| smollm:135m             |  91 MB |\n| smollm:360m             | 229 MB |\n| snowflake-arctic-embed2 | 1.2 GB |\n| stablelm-zephyr         | 1.6 GB |\n| stablelm2               | 982 MB |\n| tinyllama               | 637 MB |\n\n#### gemma2\nGoogle Gemma 2 is a high-performing and efficient model available in three sizes: 2B, 9B, and 27B.\n\n#### granite3.1-dense\nThe IBM Granite 2B and 8B models are text-only dense LLMs trained on over 12 trillion tokens of data, demonstrating significant improvements over their predecessors in performance and speed in IBM’s initial testing.\n\n#### granite3.1-moe\nThe IBM Granite 1B and 3B models are long-context mixture of experts (MoE) Granite models from IBM designed for low latency usage.\n\n#### granite-embedding\nThe IBM Granite Embedding 30M and 278M models are text-only dense biencoder embedding models, with 30M available in English only and 278M serving multilingual use cases.\n\n#### internlm2\nInternLM2.5 is a 7B parameter model tailored for practical scenarios with outstanding reasoning capability.\n\n#### llama3.2\nMeta's Llama 3.2 goes small with 1B and 3B models.\n\n#### llava-phi3\nA new small LLaVA model fine-tuned from Phi 3 Mini.\n\n#### moondream\nmoondream2 is a small vision language model designed to run efficiently on edge devices.\n\n#### nemotron-mini\nA commercial-friendly small language model by NVIDIA optimized for roleplay, RAG QA, and function calling.\n\n#### nomic-embed-text\nA high-performing open embedding model with a large token context window.\n\n#### nuextract\nA 3.8B model fine-tuned on a private high-quality synthetic dataset for information extraction, based on Phi-3.\n\n#### orca-mini\nA general-purpose model ranging from 3 billion parameters to 70 billion, suitable for entry-level hardware.\n\n#### phi\nPhi-2: a 2.7B language model by Microsoft Research that demonstrates outstanding reasoning and language understanding capabilities.\n\n#### phi3\nPhi-3 is a family of lightweight 3B (Mini) and 14B (Medium) state-of-the-art open models by Microsoft.\n\n#### qwen\nQwen 1.5 is a series of large language models by Alibaba Cloud spanning from 0.5B to 110B parameters.\n\n#### qwen2\nQwen2 is a new series of large language models from the Alibaba Group.\n\n#### qwen2.5\nQwen2.5 models are pretrained on Alibaba's latest large-scale dataset, encompassing up to 18 trillion tokens. The model supports up to 128K tokens and has multilingual support.\n\n#### smollm\n🪐 A family of small models with 135M, 360M, and 1.7B parameters, trained on a new high-quality dataset.\n\n#### snowflake-arctic-embed2\nSnowflake's frontier embedding model. Arctic Embed 2.0 adds multilingual support without sacrificing English performance or scalability.\n\n#### stablelm-zephyr\nA lightweight chat model allowing accurate and responsive output without requiring high-end hardware.\n\n#### stablelm2\nStable LM 2 is a state-of-the-art 1.6B and 12B parameter language model trained on multilingual data in English, Spanish, German, Italian, French, Portuguese, and Dutch.\n\n#### tinydolphin\nAn experimental 1.1B parameter model trained on the new Dolphin 2.8 dataset by Eric Hartford and based on TinyLlama.\n\n#### tinyllama\nThe TinyLlama project is an open endeavor to train a compact 1.1B Llama model on 3 trillion tokens.\n\n## Contributing\n\nIn lieu of a formal styleguide, take care to maintain the existing coding style. 
Lint and test your code.\n\n## License\n\nCopyright (C) 2024-2025 Ferdinand Prantl\n\nLicensed under the [MIT License].\n\n[MIT License]: http://en.wikipedia.org/wiki/MIT_License\n[Vertex AI]: https://cloud.google.com/vertex-ai\n[ollama]: https://ollama.com\n[GitHub Releases]: https://github.com/prantlf/ovai/releases/\n[Go]: https://go.dev\n[default model parameters]: ./model-defaults.json\n[Happy Eyeballs]: https://en.wikipedia.org/wiki/Happy_Eyeballs\n[docker-compose.yml]: ./docker-compose.yml\n[docker-compose-ollama.yml]: ./docker-compose-ollama.yml\n[REST API documentation]: https://github.com/ollama/ollama/blob/main/docs/api.md\n[lifecycle of the Vertex AI models]: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versioning\n[embedding models]: https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings#model_versions\n[gemini text and chat models]: https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/gemini#model_versions\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprantlf%2Fovai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprantlf%2Fovai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprantlf%2Fovai/lists"}