Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mistralai/mistral-common
- Host: GitHub
- URL: https://github.com/mistralai/mistral-common
- Owner: mistralai
- License: apache-2.0
- Created: 2024-04-15T08:43:59.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-04-21T12:16:22.000Z (2 months ago)
- Last Synced: 2024-04-21T14:40:47.144Z (2 months ago)
- Language: Python
- Size: 516 KB
- Stars: 280
- Watchers: 15
- Forks: 14
- Open Issues: 2
Metadata Files:
- Readme: README.md
Lists
- awesome-stars - mistralai/mistral-common - (Python)
README
# Mistral Common
## What is it?
mistral-common is a set of tools to help you work with Mistral models. Our first release contains tokenization. Our tokenizers go beyond the usual text <-> tokens, adding parsing of tools and structured conversations. We also release the validation and normalization code that is used in our API.
We are releasing three versions of our tokenizer powering different sets of models.
- v1: open-mistral-7b, open-mixtral-8x7b, mistral-embed
- v2: mistral-small-latest, mistral-large-latest
- v3: open-mixtral-8x22b

## Installation
### pip
You can install `mistral-common` via pip:
```
pip install mistral-common
```

### From Source
Alternatively, you can install from source directly. This repo uses poetry as a dependency and virtual environment manager. You can install poetry with:
```
pip install poetry
```

Poetry will set up a virtual environment and install dependencies with the following command:
```
poetry install
```

The following example tokenizes a chat completion request that includes a tool definition:

```py
# Import needed packages:
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.protocol.instruct.tool_calls import Function, Tool
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Load Mistral tokenizer
model_name = "open-mixtral-8x22b"
tokenizer = MistralTokenizer.from_model(model_name)
# Tokenize a list of messages
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        tools=[
            Tool(
                function=Function(
                    name="get_current_weather",
                    description="Get the current weather",
                    parameters={
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "format": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "description": "The temperature unit to use. Infer this from the user's location.",
                            },
                        },
                        "required": ["location", "format"],
                    },
                )
            )
        ],
        messages=[
            UserMessage(content="What's the weather like today in Paris"),
        ],
        model=model_name,
    )
)
tokens, text = tokenized.tokens, tokenized.text

# Count the number of tokens
print(len(tokens))
```