Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/pikocloud/pikobrain
Function-calling API for LLM from multiple providers
- Host: GitHub
- URL: https://github.com/pikocloud/pikobrain
- Owner: pikocloud
- License: mpl-2.0
- Created: 2024-08-04T16:49:00.000Z (3 months ago)
- Default Branch: master
- Last Pushed: 2024-08-10T14:08:54.000Z (3 months ago)
- Last Synced: 2024-09-30T05:20:56.380Z (about 2 months ago)
- Topics: api, aws-bedrock, function-calling, gemini, llm-server, ollama, openai, rag
- Language: Go
- Homepage:
- Size: 408 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
README
# PikoBrain
PikoBrain is a function-calling API for LLMs from multiple providers.
Key project features:
- allows you to define model configuration
- provides a universal API regardless of the underlying LLM
- provides actual function calling (currently via OpenAPI)
- (optionally) supports different models for vision and text
- basic UI

It allows you to set up functions (RAG) without vendor lock-in.
The project is licensed under MPL-2.0 with Exhibit A, which promotes collaboration (requires sharing changes) but does
not restrict commercial or any other usage.

## Roadmap
**Providers**
- [x] [OpenAI](#openai)
- [x] [AWS Bedrock](#aws-bedrock)
- [x] [Ollama](#ollama)
- [x] [Google](#google)

**State**
- [x] [Threads](#threads)
**Integration**
- [ ] Webhooks
- [ ] NATS Notifications

**Functions**
- [x] OpenAPI (including automatic reload)
- [ ] Internal functions (threads)
- [ ] Scripting functions

**Libraries**
- [ ] Python
- [ ] Golang
- [ ] Typescript

## Installation
- Source (requires Go 1.22.5+): `go run github.com/pikocloud/pikobrain@latest`
- Binary in [releases](https://github.com/pikocloud/pikobrain/releases/latest)
- Docker: `ghcr.io/pikocloud/pikobrain`

## Usage
**Binary**

```
pikobrain --config examples/brain.yaml --tools examples/tools.yaml
```

**Docker**

```
docker run --rm -v $(pwd):/data -v $(pwd)/examples:/config:ro -p 8080:8080 ghcr.io/pikocloud/pikobrain
```
- Define model and tools like in [examples/](examples/)
- Run the service
- Call the service

**Basic UI**
http://127.0.0.1:8080
![Screenshot from 2024-08-10 20-13-32](https://github.com/user-attachments/assets/9d5b8ab6-0c14-45d0-ae69-face46517a56)
> [!NOTE]
> The UI is designed primarily for admin tasks. For a user-friendly chat experience, use something
> like [LibreChat](https://github.com/danny-avila/LibreChat).

**Request**
```
POST http://127.0.0.1:8080
```
Input can be:

- `multipart/form-data` payload (preferred), where:
  - each part can be `text/plain` (default if not set), `application/x-www-form-urlencoded`, `application/json`,
    `image/png`, `image/jpeg`, `image/webp`, or `image/gif`
  - each part may contain an `X-User` header, which maps to the user field in providers
  - each part may contain an `X-Role` header, whose value can be `user` (default) or `assistant`
  - the multipart name doesn't matter
- `application/x-www-form-urlencoded`; content will be decoded
- `text/plain`, `application/json`
- `image/png`, `image/jpeg`, `image/webp`, `image/gif`
- without a content type, the payload should be a valid UTF-8 string and will be used as a single payload

> The request may contain the query parameter `user`, which maps to the user field, and/or the query parameter `role` (`user` or `assistant`).
A multipart payload allows the caller to provide full history context messages. For multipart, the `X-User` and
`X-Role` headers may override the query parameters.

Output is the response from the LLM.
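For illustration, a minimal sketch of the simplest request shape: a single text payload with the optional `user` and `role` query parameters (the `demo` user name is arbitrary, and the service is assumed to be running locally on the default port):

```python3
import asyncio

import aiohttp


async def main() -> None:
    async with aiohttp.ClientSession() as session:
        # A plain string body is sent as text/plain; `user` and `role`
        # query parameters map to the same fields as the X-User and
        # X-Role multipart headers.
        async with session.post(
                'http://127.0.0.1:8080',
                params={'user': 'demo', 'role': 'user'},
                data='Why is the sky blue?',
        ) as res:
            print(await res.text())
            # Run statistics come back in response headers
            # (see the full client in the Clients section below).
            print(res.headers.get('X-Run-Total-Tokens'))


asyncio.run(main())
```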
> [!NOTE]
> The user field is not used for inference, only for audit.
## Threads

In addition to normal [usage](#usage), it's possible to use a stateful chat context within a "thread".
For every request, the thread's history is fetched (up to `depth`).
**Add and run**

```
POST http://127.0.0.1:8080/
```

Content can be empty (just run).
**Just add**

```
PUT http://127.0.0.1:8080/
```
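A minimal sketch of thread usage with aiohttp; `demo` is a hypothetical thread name, and placing it in the URL path is an assumption based on the trailing slash in the endpoints above:

```python3
import asyncio

import aiohttp

# 'demo' is a hypothetical thread name; the exact route is an assumption.
THREAD_URL = 'http://127.0.0.1:8080/demo'


async def main() -> None:
    async with aiohttp.ClientSession() as session:
        # PUT only appends a message to the thread history (no model run).
        async with session.put(THREAD_URL, data='My name is RedDec.') as res:
            res.raise_for_status()
        # POST appends a message and runs the model; prior messages in the
        # thread (up to `depth`) are included as context.
        async with session.post(THREAD_URL, data='What is my name?') as res:
            print(await res.text())


asyncio.run(main())
```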
### Clients
**Python with aiohttp**
```python3
import asyncio
import io
from dataclasses import dataclass
from datetime import timedelta
from typing import Iterable, Literal

import aiohttp


@dataclass(frozen=True, slots=True)
class Message:
    content: str | bytes | io.BytesIO
    mime: str | None = None
    role: Literal['assistant', 'user'] | None = None
    user: str | None = None


@dataclass(frozen=True, slots=True)
class Response:
    content: bytes
    mime: str
    duration: timedelta
    input_messages: int
    input_tokens: int
    output_tokens: int
    total_tokens: int


async def request(url: str, messages: Iterable[Message]) -> Response:
    with aiohttp.MultipartWriter('form-data') as mpwriter:
        for message in messages:
            # Optional per-part headers: content type, role, and user.
            headers = {}
            if message.mime:
                headers[aiohttp.hdrs.CONTENT_TYPE] = message.mime
            if message.role:
                headers['X-Role'] = message.role
            if message.user:
                headers['X-User'] = message.user
            mpwriter.append(message.content, headers)
        async with aiohttp.ClientSession() as session, session.post(url, data=mpwriter) as res:
            assert res.ok, await res.text()
            # Run statistics are reported via X-Run-* response headers.
            return Response(
                content=await res.read(),
                mime=res.headers.get(aiohttp.hdrs.CONTENT_TYPE),
                duration=timedelta(seconds=float(res.headers.get('X-Run-Duration'))),
                input_messages=int(res.headers.get('X-Run-Context')),
                input_tokens=int(res.headers.get('X-Run-Input-Tokens')),
                output_tokens=int(res.headers.get('X-Run-Output-Tokens')),
                total_tokens=int(res.headers.get('X-Run-Total-Tokens')),
            )


async def example():
    res = await request('http://127.0.0.1:8080', messages=[
        Message("My name is RedDec. Your name is Bot."),
        Message("What is your and my name?"),
    ])
    print(res)
```

#### cURL
**Simple**

```
curl --data 'Why is the sky blue?' http://127.0.0.1:8080
```

**Text multipart**

```
curl -F '_=my name is RedDec' -F '_=What is my name?' -v http://127.0.0.1:8080
```
**Image and text**

```
curl -F '_=@image.png' -F '_=Describe the picture' -v http://127.0.0.1:8080
```
## CLI
```
Application Options:
      --timeout=                   LLM timeout (default: 30s) [$TIMEOUT]
      --refresh=                   Refresh interval for tools (default: 30s) [$REFRESH]
      --config=                    Config file (default: brain.yaml) [$CONFIG]
      --tools=                     Tool file [$TOOLS]

Debug:
      --debug.enable               Enable debug mode [$DEBUG_ENABLE]

Database configuration:
      --db.url=                    Database URL (default: sqlite://data.sqlite?cache=shared&_fk=1&_pragma=foreign_keys(1)) [$DB_URL]
      --db.max-conn=               Maximum number of opened connections to database (default: 10) [$DB_MAX_CONN]
      --db.idle-conn=              Maximum number of idle connections to database (default: 1) [$DB_IDLE_CONN]
      --db.idle-timeout=           Maximum amount of time a connection may be idle (default: 0) [$DB_IDLE_TIMEOUT]
      --db.conn-life-time=         Maximum amount of time a connection may be reused (default: 0) [$DB_CONN_LIFE_TIME]

HTTP server configuration:
      --http.bind=                 Bind address (default: :8080) [$HTTP_BIND]
      --http.tls                   Enable TLS [$HTTP_TLS]
      --http.ca=                   Path to CA files. Optional unless IGNORE_SYSTEM_CA set (default: ca.pem) [$HTTP_CA]
      --http.cert=                 Server certificate (default: cert.pem) [$HTTP_CERT]
      --http.key=                  Server private key (default: key.pem) [$HTTP_KEY]
      --http.mutual                Enable mutual TLS [$HTTP_MUTUAL]
      --http.ignore-system-ca      Do not load system-wide CA [$HTTP_IGNORE_SYSTEM_CA]
      --http.read-header-timeout=  How long to read header from the request (default: 3s) [$HTTP_READ_HEADER_TIMEOUT]
      --http.graceful=             Graceful shutdown timeout (default: 5s) [$HTTP_GRACEFUL]
      --http.timeout=              Any request timeout (default: 30s) [$HTTP_TIMEOUT]
      --http.max-body-size=        Maximum payload size in bytes (default: 1048576) [$HTTP_MAX_BODY_SIZE]
```
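The bracketed names (e.g. `[$CONFIG]`) indicate that each flag can also be supplied through an environment variable. A minimal sketch of launching the service configured that way, assuming the `pikobrain` binary is on `PATH`:

```python3
import os
import subprocess

# Values mirror the flags: $CONFIG for --config, $TOOLS for --tools,
# $HTTP_BIND for --http.bind (names taken from the help text above).
env = dict(
    os.environ,
    CONFIG='examples/brain.yaml',
    TOOLS='examples/tools.yaml',
    HTTP_BIND=':8080',
)
subprocess.run(['pikobrain'], env=env, check=True)
```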
## Providers

### OpenAI
First-class support, everything works just fine.
### Google

Good support. Known limitations:

- date-time is not supported in tools
- an empty object (aka any JSON) is not supported
- for complex schemas, `gemini-1.5-flash` may hallucinate and call functions with incorrect arguments; use `gemini-1.5-pro`

### Ollama
Requires Ollama 0.3.3+
Recommended models: `llava` for vision and `mistral:instruct` for general messages (including function calling).
```yaml
model: 'mistral:instruct'
vision:
model: 'llava'
```

> [!TIP]
> Check https://ollama.com/library for models with 'tools' and 'vision' features. The bigger the model, the better it
> generally performs.
> For non-vision models, the `instruct` variants usually work better.

### AWS Bedrock
> [!WARNING]
> Due to multiple limitations, only Claude 3+ models work properly. The recommended multi-modal model for AWS
> Bedrock is Anthropic Claude 3.5.

Initial support:
- Some models may not support system prompt.
- Some models may not support tools.
- Authorization is ignored (use AWS environment variables)
- `forceJSON` is not supported (workaround: use tools)

Required minimal set of environment variables:
```
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=
```

Please refer
to [AWS Environment variable cheatsheet](https://docs.aws.amazon.com/sdkref/latest/guide/settings-reference.html#EVarSettings)
for configuration.

Based on the [function calling feature](https://docs.aws.amazon.com/bedrock/latest/userguide/tool-use.html), the recommended
models are:

- Anthropic Claude 3 models
- Mistral AI Mistral Large and Mistral Small
- Cohere Command R and Command R+

See the list of [compatibilities](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html).