https://github.com/ollama/ollama-python
Ollama Python library
- Host: GitHub
- URL: https://github.com/ollama/ollama-python
- Owner: ollama
- License: MIT
- Created: 2023-12-09T09:27:18.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2025-05-06T22:34:54.000Z (8 months ago)
- Last Synced: 2025-05-12T02:37:33.232Z (8 months ago)
- Topics: ollama, python
- Language: Python
- Homepage: https://ollama.com
- Size: 424 KB
- Stars: 7,520
- Watchers: 53
- Forks: 680
- Open Issues: 100
- Metadata Files:
- Readme: README.md
- License: LICENSE
- Security: SECURITY.md
Awesome Lists containing this project
- jimsghstars - ollama/ollama-python - Ollama Python library (Python)
- StarryDivineSky - ollama/ollama-python
- AiTreasureBox - ollama/ollama-python - Ollama Python library (Repos)
- stars - ollama-python
README
# Ollama Python Library
The Ollama Python library provides the easiest way to integrate Python 3.8+ projects with [Ollama](https://github.com/ollama/ollama).
## Prerequisites
- [Ollama](https://ollama.com/download) should be installed and running
- Pull a model to use with the library: `ollama pull <model>`, e.g. `ollama pull gemma3`
- See [Ollama.com](https://ollama.com/search) for more information on the models available.
## Install
```sh
pip install ollama
```
## Usage
```python
from ollama import chat
from ollama import ChatResponse
response: ChatResponse = chat(model='gemma3', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])
# or access fields directly from the response object
print(response.message.content)
```
See [_types.py](ollama/_types.py) for more information on the response types.
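As a quick illustration (a sketch assuming `ChatResponse` is the Pydantic model defined in `_types.py`), the response can be inspected field by field or dumped to a plain dict:
```python
# Sketch: ChatResponse is a Pydantic model, so fields are attributes
# and the whole response can be serialized with model_dump().
print(response.model)           # which model produced the answer
print(response.message.role)    # 'assistant'
print(response.model_dump())    # entire response as a plain dict
```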
## Streaming responses
Response streaming can be enabled by setting `stream=True`.
```python
from ollama import chat
stream = chat(
  model='gemma3',
  messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
  stream=True,
)

for chunk in stream:
  print(chunk['message']['content'], end='', flush=True)
```
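Each chunk carries only a fragment of the reply, so a common pattern is to accumulate the pieces while streaming. A minimal sketch building on the example above:
```python
from ollama import chat

full_reply = ''
stream = chat(
  model='gemma3',
  messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
  stream=True,
)
for chunk in stream:
  piece = chunk.message.content or ''  # chunks are ChatResponse objects too
  full_reply += piece                  # collect the full answer as it arrives
  print(piece, end='', flush=True)
```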
## Cloud Models
Run larger models by offloading to Ollama’s cloud while keeping your local workflow.
- Supported models: `deepseek-v3.1:671b-cloud`, `gpt-oss:20b-cloud`, `gpt-oss:120b-cloud`, `kimi-k2:1t-cloud`, `qwen3-coder:480b-cloud`, `kimi-k2-thinking`. See [Ollama Models - Cloud](https://ollama.com/search?c=cloud) for more information.
### Run via local Ollama
1) Sign in (one-time):
```sh
ollama signin
```
2) Pull a cloud model:
```sh
ollama pull gpt-oss:120b-cloud
```
3) Make a request:
```python
from ollama import Client
client = Client()
messages = [
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
]

for part in client.chat('gpt-oss:120b-cloud', messages=messages, stream=True):
  print(part.message.content, end='', flush=True)
```
### Cloud API (ollama.com)
Access cloud models directly by pointing the client at `https://ollama.com`.
1) Create an API key from [ollama.com](https://ollama.com/settings/keys), then set:
```sh
export OLLAMA_API_KEY=your_api_key
```
2) (Optional) List models available via the API:
```sh
curl https://ollama.com/api/tags
```
3) Generate a response via the cloud API:
```python
import os
from ollama import Client
client = Client(
  host='https://ollama.com',
  headers={'Authorization': 'Bearer ' + os.environ.get('OLLAMA_API_KEY')},
)

messages = [
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
]

for part in client.chat('gpt-oss:120b', messages=messages, stream=True):
  print(part.message.content, end='', flush=True)
```
## Custom client
A custom client can be created by instantiating `Client` or `AsyncClient` from `ollama`.
All extra keyword arguments are passed into the [`httpx.Client`](https://www.python-httpx.org/api/#client).
```python
from ollama import Client
client = Client(
  host='http://localhost:11434',
  headers={'x-some-header': 'some-value'}
)

response = client.chat(model='gemma3', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
```
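Because those extra keyword arguments are forwarded to `httpx.Client`, standard httpx options such as `timeout` can be passed the same way. A minimal sketch:
```python
from ollama import Client

# timeout is an httpx.Client option, forwarded unchanged by ollama's Client.
client = Client(host='http://localhost:11434', timeout=30)
response = client.chat(model='gemma3', messages=[
  {'role': 'user', 'content': 'Why is the sky blue?'},
])
print(response.message.content)
```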
## Async client
The `AsyncClient` class is used to make asynchronous requests. It can be configured with the same fields as the `Client` class.
```python
import asyncio
from ollama import AsyncClient
async def chat():
  message = {'role': 'user', 'content': 'Why is the sky blue?'}
  response = await AsyncClient().chat(model='gemma3', messages=[message])

asyncio.run(chat())
```
Setting `stream=True` modifies functions to return a Python asynchronous generator:
```python
import asyncio
from ollama import AsyncClient
async def chat():
  message = {'role': 'user', 'content': 'Why is the sky blue?'}
  async for part in await AsyncClient().chat(model='gemma3', messages=[message], stream=True):
    print(part['message']['content'], end='', flush=True)

asyncio.run(chat())
```
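Because each call is just a coroutine, several requests can also be issued concurrently with `asyncio.gather`; this is a sketch rather than an example from the README:
```python
import asyncio

from ollama import AsyncClient

async def main():
  client = AsyncClient()
  questions = ['Why is the sky blue?', 'Why is grass green?']
  # Run both chat requests concurrently and wait for all of them to finish.
  responses = await asyncio.gather(
    *(client.chat(model='gemma3', messages=[{'role': 'user', 'content': q}]) for q in questions)
  )
  for question, response in zip(questions, responses):
    print(question, '->', response.message.content)

asyncio.run(main())
```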
## API
The Ollama Python library's API is designed around the [Ollama REST API](https://github.com/ollama/ollama/blob/main/docs/api.md).
### Chat
```python
ollama.chat(model='gemma3', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])
```
### Generate
```python
ollama.generate(model='gemma3', prompt='Why is the sky blue?')
```
### List
```python
ollama.list()
```
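The result is a typed response whose `models` field lists the locally available models; a small sketch of iterating over it (field names assumed from `_types.py`):
```python
import ollama

# Each entry describes one locally pulled model.
for m in ollama.list().models:
  print(m.model, m.size)
```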
### Show
```python
ollama.show('gemma3')
```
### Create
```python
ollama.create(model='example', from_='gemma3', system="You are Mario from Super Mario Bros.")
```
### Copy
```python
ollama.copy('gemma3', 'user/gemma3')
```
### Delete
```python
ollama.delete('gemma3')
```
### Pull
```python
ollama.pull('gemma3')
```
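`pull` can also stream download progress when called with `stream=True`; a rough sketch (progress field names assumed):
```python
import ollama

# With stream=True, pull yields progress updates while the model downloads.
for progress in ollama.pull('gemma3', stream=True):
  print(progress.status, progress.completed, progress.total)
```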
### Push
```python
ollama.push('user/gemma3')
```
### Embed
```python
ollama.embed(model='gemma3', input='The sky is blue because of rayleigh scattering')
```
### Embed (batch)
```python
ollama.embed(model='gemma3', input=['The sky is blue because of rayleigh scattering', 'Grass is green because of chlorophyll'])
```
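In both forms the response carries one vector per input on its `embeddings` field; a short sketch of reading them:
```python
import ollama

response = ollama.embed(
  model='gemma3',
  input=['The sky is blue because of rayleigh scattering', 'Grass is green because of chlorophyll'],
)
# One embedding vector per input string.
for vector in response.embeddings:
  print(len(vector), list(vector[:3]))
```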
### Ps
```python
ollama.ps()
```
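`ps` reports the models currently loaded by the server; a minimal sketch of reading the result (field names assumed from `_types.py`):
```python
import ollama

# Each entry is a model currently loaded into memory.
for m in ollama.ps().models:
  print(m.model, m.size_vram)
```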
## Errors
Errors are raised if requests return an error status or if an error is detected while streaming.
```python
import ollama

model = 'does-not-yet-exist'

try:
  ollama.chat(model)
except ollama.ResponseError as e:
  print('Error:', e.error)
  if e.status_code == 404:
    ollama.pull(model)
```