# 💯AI00 RWKV Server




![license](https://shields.io/badge/license-MIT%2FApache--2.0-blue)
[![Rust Version](https://img.shields.io/badge/Rust-1.78.0+-blue)](https://releases.rs/docs/1.78.0)
![PRs welcome](https://img.shields.io/badge/PRs-Welcome-brightgreen)

[![All Contributors](https://img.shields.io/badge/all_contributors-7-orange.svg?style=flat-square)](#contributors-)

[![en](https://img.shields.io/badge/lang-en-red.svg)](README.md)
[![zh](https://img.shields.io/badge/lang-zh-blue.svg)](README.zh.md)

---



`AI00 RWKV Server` is an inference API server for the [`RWKV` language model](https://github.com/BlinkDL/ChatRWKV) based upon the [`web-rwkv`](https://github.com/cryscan/web-rwkv) inference engine.

It supports parallel and concurrent batched inference via `Vulkan` and can run on any GPU that supports `Vulkan`. No Nvidia card required! AMD cards and even integrated graphics can be accelerated!

No need for bulky `pytorch`, `CUDA`, or other runtime environments; it's compact and ready to use out of the box!

Compatible with OpenAI's ChatGPT API interface.

100% open source and commercially usable, under the MIT license.

If you are looking for a fast, efficient, and easy-to-use LLM API server, then `AI00 RWKV Server` is your best choice. It can be used for various tasks, including chatbots, text generation, translation, and Q&A.

Join the `AI00 RWKV Server` community now and experience the charm of AI!

QQ Group for communication: 30920262

### 💥Features

* Based on the `RWKV` model, it has high performance and accuracy
* Supports `Vulkan` inference acceleration, you can enjoy GPU acceleration without the need for `CUDA`! Supports AMD cards, integrated graphics, and all GPUs that support `Vulkan`
* No need for bulky `pytorch`, `CUDA`, or other runtime environments; it's compact and ready to use out of the box!
* Compatible with OpenAI's ChatGPT API interface

### ⭕Usages

* Chatbot
* Text generation
* Translation
* Q&A
* Any other tasks that LLM can do

### 👻Other

* Based on the [web-rwkv](https://github.com/cryscan/web-rwkv) project
* Model download: [V5](https://huggingface.co/cgisky/AI00_RWKV_V5) or [V6](https://huggingface.co/cgisky/ai00_rwkv_x060)

## Installation, Compilation, and Usage

### 📦Download Pre-built Executables

1. Directly download the latest version from [Release](https://github.com/cgisky1980/ai00_rwkv_server/releases)

2. After [downloading the model](#👻other), place the model in the `assets/models/` path, for example, `assets/models/RWKV-x060-World-3B-v2-20240228-ctx4096.st`

3. Optionally modify [`assets/Config.toml`](./assets/Config.toml) for model configurations like model path, quantization layers, etc.

4. Run in the command line

```bash
$ ./ai00_rwkv_server
```

5. Open the browser and visit the WebUI at http://localhost:65530 (https://localhost:65530 if `tls` is enabled)
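Once the server is running, a quick sanity check (a minimal sketch, assuming the default port of 65530) is to query the model list endpoint and confirm you get a JSON response:

```bash
# Ask the server which models are loaded; any JSON reply means the API is up.
$ curl http://localhost:65530/api/oai/v1/models
```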

### 📜(Optional) Build from Source

1. [Install Rust](https://www.rust-lang.org/)

2. Clone this repository

```bash
$ git clone https://github.com/cgisky1980/ai00_rwkv_server.git
$ cd ai00_rwkv_server
```

3. After [downloading the model](#👻other), place the model in the `assets/models/` path, for example, `assets/models/RWKV-x060-World-3B-v2-20240228-ctx4096.st`

4. Compile

```bash
$ cargo build --release
```

5. After compilation, run

```bash
$ cargo run --release
```

6. Open the browser and visit the WebUI at http://localhost:65530 (https://localhost:65530 if `tls` is enabled)
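Alternatively, instead of `cargo run`, you can launch the compiled binary directly; by cargo convention it lands under `target/release/` (assuming the default binary name):

```bash
# Run the release binary produced by `cargo build --release`.
$ ./target/release/ai00_rwkv_server
```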

### 📒Convert the Model

Only Safetensors models with the `.st` extension are supported at the moment. Models saved with the `.pth` extension by torch must be converted before use.

1. [Download the `.pth` model](https://huggingface.co/BlinkDL)

2. (Recommended) Run the Python script `convert2ai00.py` or `convert_safetensors.py`:

```bash
$ python ./convert2ai00.py --input /path/to/model.pth --output /path/to/model.st
```

Requirements: Python, with `torch` and `safetensors` installed.

3. If you do not want to install Python, you can find an executable called `converter` in the [Release](https://github.com/cgisky1980/ai00_rwkv_server/releases). Run

```bash
$ ./converter --input /path/to/model.pth --output /path/to/model.st
```

4. If you are building from source, run

```bash
$ cargo run --release --package converter -- --input /path/to/model.pth --output /path/to/model.st
```

5. Just like the steps mentioned above, place the converted `.st` model in the `assets/models/` path and modify the model path in [`assets/Config.toml`](./assets/Config.toml)

## 📝Supported Arguments

* `--config`: Configuration file path (default: `assets/Config.toml`)
* `--ip`: The IP address the server is bound to
* `--port`: Running port
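For example, a launch that combines all three arguments might look like this (the address and port values below are placeholders):

```bash
# Load a specific config and bind the server to all interfaces on port 8080.
$ ./ai00_rwkv_server --config assets/Config.toml --ip 0.0.0.0 --port 8080
```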

## 📙Currently Available APIs

The API service starts on port 65530 by default, and the input and output data formats follow the OpenAI API specification.
Note that some APIs, such as `chat` and `completions`, have additional optional fields for advanced functionality. Visit http://localhost:65530/api-docs for the API schema.

* `/api/oai/v1/models`
* `/api/oai/models`
* `/api/oai/v1/chat/completions`
* `/api/oai/chat/completions`
* `/api/oai/v1/completions`
* `/api/oai/completions`
* `/api/oai/v1/embeddings`
* `/api/oai/embeddings`
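Because the endpoints follow the OpenAI specification, any OpenAI-compatible client or a plain HTTP request can be used. Here is a minimal `curl` sketch of a chat completion call (the model name and sampling values are placeholders):

```bash
$ curl http://localhost:65530/api/oai/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "model",
          "messages": [{"role": "user", "content": "Hello!"}],
          "max_tokens": 256
        }'
```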

The following is an out-of-the-box example of invoking the Ai00 API from Python (it uses the legacy `openai` client, version < 1.0):

```python
import openai

class Ai00:
    def __init__(self, model="model", port=65530, api_key="JUSTSECRET_KEY"):
        # Point the legacy openai client (< 1.0) at the local Ai00 server.
        openai.api_base = f"http://127.0.0.1:{port}/api/oai"
        openai.api_key = api_key
        self.ctx = []
        self.params = {
            "system_name": "System",
            "user_name": "User",
            "assistant_name": "Assistant",
            "model": model,
            "max_tokens": 4096,
            "top_p": 0.6,
            "temperature": 1,
            "presence_penalty": 0.3,
            "frequency_penalty": 0.3,
            "half_life": 400,
            "stop": ['\x00', '\n\n']
        }

    def set_params(self, **kwargs):
        self.params.update(kwargs)

    def clear_ctx(self):
        self.ctx = []

    def get_ctx(self):
        return self.ctx

    def continuation(self, message):
        # Plain text completion, independent of the chat context.
        response = openai.Completion.create(
            model=self.params['model'],
            prompt=message,
            max_tokens=self.params['max_tokens'],
            half_life=self.params['half_life'],
            top_p=self.params['top_p'],
            temperature=self.params['temperature'],
            presence_penalty=self.params['presence_penalty'],
            frequency_penalty=self.params['frequency_penalty'],
            stop=self.params['stop']
        )
        return response.choices[0].text

    def append_ctx(self, role, content):
        self.ctx.append({
            "role": role,
            "content": content
        })

    def send_message(self, message, role="user"):
        # Chat completion over the accumulated conversation context.
        self.ctx.append({
            "role": role,
            "content": message
        })
        result = openai.ChatCompletion.create(
            model=self.params['model'],
            messages=self.ctx,
            names={
                "system": self.params['system_name'],
                "user": self.params['user_name'],
                "assistant": self.params['assistant_name']
            },
            max_tokens=self.params['max_tokens'],
            half_life=self.params['half_life'],
            top_p=self.params['top_p'],
            temperature=self.params['temperature'],
            presence_penalty=self.params['presence_penalty'],
            frequency_penalty=self.params['frequency_penalty'],
            stop=self.params['stop']
        )
        result = result.choices[0].message['content']
        self.ctx.append({
            "role": "assistant",
            "content": result
        })
        return result

ai00 = Ai00()
ai00.set_params(
    max_tokens=4096,
    top_p=0.55,
    temperature=2,
    presence_penalty=0.3,
    frequency_penalty=0.8,
    half_life=400,
    stop=['\x00', '\n\n']
)
print(ai00.send_message("how are you?"))
print(ai00.send_message("me too!"))
print(ai00.get_ctx())
ai00.clear_ctx()
print(ai00.continuation("i like"))
```

## BNF Sampling

Since v0.5, Ai00 has a unique feature called BNF sampling. BNF forces the model to output in specified formats (e.g., JSON or markdown with specified fields) by limiting the possible next tokens the model can choose from.

Here is an example BNF for JSON with fields "name", "age" and "job":

```
<start> ::= <json_object>
<json_object> ::= "{" <object_members> "}"
<object_members> ::= <json_member> | <json_member> ", " <object_members>
<json_member> ::= <json_key> ": " <json_value>
<json_key> ::= '"' "name" '"' | '"' "age" '"' | '"' "job" '"'
<json_value> ::= <json_string> | <json_number>
<json_string> ::= '"' <content> '"'
<content> ::= <except!([escaped_literals])> | <except!([escaped_literals])> <content> | '\\"' <content> | '\\"'
<escaped_literals> ::= '\t' | '\n' | '\r' | '"'
<json_number> ::= <positive_digit> <digits> | '0'
<digits> ::= <digit> | <digit> <digits>
<digit> ::= '0' | <positive_digit>
<positive_digit> ::= "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
```
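Over the API, the grammar is supplied as an extra field on a completion request. The field name used below, `bnf_schema`, is an assumption based on recent builds; verify the exact schema at http://localhost:65530/api-docs before relying on it:

```bash
# NOTE: "bnf_schema" is assumed here; check /api-docs for the actual field name.
$ curl http://localhost:65530/api/oai/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
          "prompt": "Generate a JSON profile:",
          "max_tokens": 128,
          "bnf_schema": "<start> ::= <json_object> ..."
        }'
```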

## 📙WebUI Screenshots

### Chat

### Continuation

### Paper (Parallel Inference Demo)

## 📝TODO List

* [x] Support for `text_completions` and `chat_completions`
* [x] Support for SSE push
* [x] Integrate basic front-end
* [x] Parallel inference via `batch serve`
* [x] Support for `int8` quantization
* [x] Support for `NF4` quantization
* [x] Support for `LoRA` models
* [x] Support for tuned initial states
* [ ] Hot loading and switching of `LoRA` models
* [x] Hot loading and switching of tuned initial states
* [x] BNF sampling

## 👥Join Us

We are always looking for people interested in helping us improve the project. If you are interested in any of the following, please join us!

* 💀Writing code
* 💬Providing feedback
* 🔆Proposing ideas or needs
* 🔍Testing new features
* ✏Translating documentation
* 📣Promoting the project
* 🏅Anything else that would be helpful to us

No matter your skill level, we welcome you to join us. You can join us in the following ways:

* Join our Discord channel
* Join our QQ group
* Submit issues or pull requests on GitHub
* Leave feedback on our website

We can't wait to work with you to make this project better! We hope the project is helpful to you!

## Acknowledgement
Thanks to these insightful and outstanding individuals for their support and selfless dedication to the project!



* 顾真牛: 📖 💻 🖋 🎨 🧑‍🏫
* 研究社交: 💻 💡 🤔 🚧 👀 📦
* josc146: 🐛 💻 🤔 🔧
* l15y: 🔧 🔌 💻
* Cahya Wirawan: 🐛
* yuunnn_w: 📖 ⚠️
* longzou: 💻 🛡️

## Stargazers over time

[![Stargazers over time](https://starchart.cc/cgisky1980/ai00_rwkv_server.svg)](https://starchart.cc/cgisky1980/ai00_rwkv_server)