# Empower Functions

Empower Functions is a family of LLMs (large language models) that offers GPT-4 level capabilities for real-world "tool using" use cases. The models are fully compatible with the OpenAI API, so they can be served as a drop-in replacement.

[Live Demo](https://app.empower.dev/chat-demo) • [Huggingface Repo](https://huggingface.co/collections/empower-dev/empower-functions-v11-66df72d78c1f7b80bda36f5f) • [Website](https://empower.dev) • [Discord](https://discord.gg/PVaggZ3z6r)

## Update

**New Empower Functions v1.1**
We have just launched [v1.1 of the Empower Functions family](https://huggingface.co/collections/empower-dev/empower-functions-v11-66df72d78c1f7b80bda36f5f). The v1.1 family has been fine-tuned from Llama3.1 on an enhanced curated dataset and has achieved state-of-the-art performance on the Berkeley Function Calling Leaderboard:

![image](assets/bfcl.png)
(captured on Sep 10, 2024)

## What are real-world "tool using" use cases?

"tool using" refers to the ability of LLMs to interact with external APIs by recognizing when a function needs to be called and then generating JSON containing the necessary arguments based on user inputs. This capability is essential for building conversational agents and applications that convert natural language into API calls, facilitating tasks such as weather inquiries, data extraction, and interactions with knowledge bases.

Real-world use cases, particularly those involving conversational agents, often place complex requirements on LLMs. Models must be able to retrieve context from multiple rounds of conversation ([multi-turn](docs/inference/multi-turn.md)), choose between using tools or engaging in standard dialogue (['auto' mode](docs/inference/introduction.md#tools-parameter)), and ask for clarification if any parameters are missing ([clarification](docs/inference/clarification.md)). Furthermore, they should integrate responses with tool outputs in a [streaming](docs/inference/streaming.md) fashion. Finally, when multiple tools are required to complete a task, models should efficiently execute several functions either in parallel ([parallel calling](docs/inference/parallel-calling.md)) or sequentially with dependencies ([sequential calling](docs/inference/sequential-calling.md)), as sketched below.
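
As a sketch of several of these behaviors in one exchange, here is an OpenAI-formatted message list in which the model first asks for a missing parameter and then issues the call; `search_flights` is a hypothetical tool used only for illustration:

```python
conversation = [
    {"role": "user", "content": "Book me a flight."},
    # Required parameters are missing, so in 'auto' mode the model chooses
    # dialogue over a tool call and asks for clarification:
    {"role": "assistant", "content": "Sure - where are you flying from, and where to?"},
    {"role": "user", "content": "From SFO to LAX."},
    # With the gap filled across multiple turns, the model now calls the
    # (hypothetical) tool:
    {"role": "assistant", "tool_calls": [{
        "id": "call_0",
        "type": "function",
        "function": {"name": "search_flights",
                     "arguments": '{"origin": "SFO", "destination": "LAX"}'},
    }]},
]
```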

For example, below is a screenshot demonstrating how the model is used in a medical center coordinator bot. You can explore this further in our [live demo](https://app.empower.dev/chat-demo).
![image](assets/demo_screenshot.png)

## Family of Models

| Model | Specs | Links | Notes |
| ------------------------------ | ------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------- |
| llama3-empower-functions-small | 128k context, based on [Llama3.1 8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) | [model](https://huggingface.co/empower-dev/llama3-empower-functions-small-v1.1), [gguf](https://huggingface.co/empower-dev/llama3-empower-functions-small-gguf-v1.1) | Most cost-effective, locally runnable |
| llama3-empower-functions-large | 128k context, based on [Llama3.1 70B](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B) | [model](https://huggingface.co/empower-dev/llama3-empower-functions-large-v1.1) | Best accuracy |

#### Hardware Requirements

We have tested the family of models in the following setups:

- empower-functions-small: fp16 on 1x A100 40G; GGUF and 4-bit GGUF on a MacBook M2 Pro with 32G of RAM (the 4-bit GGUF version requires a minimum of 7.56G of RAM)
- empower-functions-large: fp16 on 4x A100 80G

## How to Use?

#### Running Locally

> Running locally is only supported by the `llama3-empower-functions-small` model. To use other models, please use our API.

Local execution is supported through the `empower_functions` pip package; install it first by running `pip install empower-functions`.

> If you encounter errors like `RuntimeError: Failed to load shared library, (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))`, re-install the `llama-cpp-python` package by running `CMAKE_ARGS="-DCMAKE_OSX_ARCHITECTURES=arm64 -DCMAKE_APPLE_SILICON_PROCESSOR=arm64 -DLLAMA_METAL=on" pip install --upgrade --verbose --force-reinstall --no-cache-dir llama-cpp-python`

##### Running a Local OpenAI-Compatible Server

We leverage the `llama-cpp-python` project to run the model locally. To start a local OpenAI compatible server, you'll need to follow the steps below:

1. Download the GGUF model from our [huggingface repo](https://huggingface.co/empower-dev/llama3-empower-functions-small-gguf-v1.1)
2. Run the command `python -m empower_functions.server --model <path-to-gguf-file> --chat_format empower-functions`

You should see the following output when the server is ready:

`INFO: Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)`

Then you can use the OpenAI SDK to connect to the server. See below for a basic example:

```python
import openai
import json

# Point the OpenAI client at the local server started above.
client = openai.OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="YOUR_API_KEY"
)

messages = [
    {"role": "user", "content": "What's the weather in San Francisco?"}
]

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g., San Francisco, CA"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

chat_completion = client.chat.completions.create(
    model="does_not_matter",  # the local server ignores the model name
    messages=messages,
    tools=tools,
    temperature=0,
    tool_choice="auto"  # let the model choose between tools and dialogue
)

print(chat_completion)
```
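
When the model decides to call the tool, the call arrives in `chat_completion.choices[0].message.tool_calls`. Here is a minimal sketch of the usual follow-up loop, in which you execute the function yourself and send the result back as a `tool` message; `lookup_weather` is a stand-in for your real implementation:

```python
def lookup_weather(location: str) -> str:
    # Stand-in for a real weather API call.
    return json.dumps({"location": location, "temperature": "64F"})

tool_call = chat_completion.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)

# Echo the assistant's tool call back, attach the tool result, and ask again:
messages.append(chat_completion.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": lookup_weather(**args),
})
final = client.chat.completions.create(
    model="does_not_matter",
    messages=messages,
    tools=tools,
)
print(final.choices[0].message.content)
```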

##### Running in a Python Environment

You can call the model directly in your Python environment through the `llama-cpp-python` package, using the chat handler provided in the `empower_functions` package. See below for a basic example; for a more detailed one, please refer to the [python script](https://github.com/empower-ai/empower-functions/blob/main/examples/llama_cpp_inference.py).

```python
import json
from empower_functions import EmpowerFunctionsCompletionHandler
from llama_cpp.llama_tokenizer import LlamaHFTokenizer
from llama_cpp import Llama

# Download the GGUF weights from Hugging Face and load them with llama-cpp-python.
llm = Llama.from_pretrained(
    repo_id="empower-dev/llama3-empower-functions-small-gguf",
    filename="ggml-model-Q4_K_M.gguf",
    chat_format="llama-3",
    chat_handler=EmpowerFunctionsCompletionHandler(),
    tokenizer=LlamaHFTokenizer.from_pretrained("empower-dev/llama3-empower-functions-small-gguf"),
    n_gpu_layers=0  # CPU-only; raise this to offload layers to a GPU
)

# You can then use the llm object to chat with the model.
messages = [
    {"role": "user", "content": "What's the weather in San Francisco?"}
]

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g., San Francisco, CA"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

result = llm.create_chat_completion(
    messages=messages,
    tools=tools,
    tool_choice="auto",
    max_tokens=128
)
print(json.dumps(result["choices"][0], indent=2))
```
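
If the model chose to call the tool, the completion dict carries the call under the message's `tool_calls` key, in the same shape the OpenAI SDK uses. A short sketch of extracting the arguments, assuming that shape:

```python
message = result["choices"][0]["message"]
for call in message.get("tool_calls") or []:
    name = call["function"]["name"]
    args = json.loads(call["function"]["arguments"])
    print(f"model wants to call {name} with {args}")
```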

#### Using Empower API

The Empower platform offers an API that is fully compatible with the OpenAI API, allowing you to use the OpenAI SDK directly. See below for a basic example; more details can be found [here](/docs/inference/introduction.md).

Currently, streaming and JSON mode are only available through the Empower API (a streaming sketch follows the basic example below).

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://app.empower.dev/api/v1",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="empower-functions",
    messages=[{"role": "user",
               "content": "What's the weather in San Francisco and Los Angeles in Celsius?"}],
    temperature=0,
    tools=[{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }],
)
tool_calls = response.choices[0].message.tool_calls
print(tool_calls)
```
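
Since streaming is available through the Empower API, here is a minimal streaming sketch with the same client; it reuses the `get_current_weather` definition from the example above (assigned to a `tools` variable) and relies on the standard OpenAI SDK streaming interface:

```python
stream = client.chat.completions.create(
    model="empower-functions",
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
    tools=tools,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:        # regular dialogue tokens
        print(delta.content, end="", flush=True)
    if delta.tool_calls:     # incremental tool-call fragments
        print(delta.tool_calls)
```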

#### Prompting the Raw Model

The Empower Functions model family has been tuned to natively produce JSON. We provide utilities in our Python package to build prompts from OpenAI-formatted messages. See below for a basic example; more details can be found [here](/docs/model-prompt.md).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from prompt import prompt_messages

model_path = 'empower-dev/empower-functions-small'
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                },
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    }
]

messages = [
    {'role': 'user', 'content': "What's the weather in San Francisco and Los Angeles in Celsius?"},
]

# Convert the OpenAI-formatted messages and functions into the model's raw prompt.
messages = prompt_messages(messages, functions)
model_inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt").to(model.device)

generated_ids = model.generate(model_inputs, max_new_tokens=128)
decoded = tokenizer.batch_decode(generated_ids)

print(decoded[0])
```

## Training Approach

Empower's function models are fine-tuned based on state-of-the-art OSS models. We divided the training into two phases.

First, we perform SFT (supervised fine-tuning) on over 100k rows of hand-curated, high-quality conversations involving function calling. These conversations cover scenarios such as single-turn, multi-turn, and parallel calling. Specifically, the model is trained to emit dedicated beginning tokens that indicate whether it is calling functions or returning regular conversation. It then returns function calls as JSON, or conversation text as usual, which makes streaming integration very straightforward. The SFT phase gives the model a strong foundation covering the various scenarios of general use cases.
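
To illustrate why this makes streaming integration straightforward, here is a sketch of client-side routing keyed on the leading token; the prefix strings below are placeholders, not the model's actual special tokens:

```python
import json

FUNCTION_PREFIX = "<fn>"      # placeholder; the real special token differs
CONVERSATION_PREFIX = "<cv>"  # placeholder; the real special token differs

def route(raw_output: str):
    """Branch on the first token: JSON tool calls vs. plain conversation."""
    if raw_output.startswith(FUNCTION_PREFIX):
        # Function-calling turn: the remainder is JSON describing the call(s).
        return "tool_calls", json.loads(raw_output[len(FUNCTION_PREFIX):])
    # Conversation turn: the text can be streamed straight to the user.
    return "content", raw_output.removeprefix(CONVERSATION_PREFIX)
```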

Next, we apply DPO (Direct Preference Optimization) for trickier scenarios where SFT is less effective. For instance, when function specifications include examples for arguments, we want to prevent the model from hallucinating argument values taken from those examples. We have found DPO to be very effective at correcting such misbehavior with a relatively small amount of data.

Finally, we are committed to continuously optimizing the model for better quality across a wider range of use cases and scenarios :) We can further fine-tune the model based on your specific needs. Please contact us if you have any use-case-specific requirements!

## Evaluation

We evaluate our models against the Berkeley Function Calling benchmark, and both the 8B and 70B versions have achieved state-of-the-art performance for their size:

![image](assets/bfcl.png)
(captured on Sep 10, 2024)