# Deploying the Magistral Model vLLM Server with Modal

This project demonstrates how to deploy the Magistral-Small-2506 model using vLLM and Modal, and how to interact with it using the latest OpenAI Python SDK.

## Setup Instructions

### 1. Clone the Repository
```sh
git clone https://github.com/kingabzpro/Deploying-the-Magistral-with-Modal.git
cd Deploying-the-Magistral-with-Modal
```

### 2. Create and Activate a Virtual Environment
```sh
python -m venv .venv
# On Windows:
.venv\Scripts\activate
# On Linux/Mac:
source .venv/bin/activate
```

### 3. Install Dependencies
```sh
pip install -r requirements.txt
```

### 4. Configure the API Key with Modal Secrets (Recommended)
For secure deployments, store your API key as a [Modal secret](https://modal.com/docs/guide/secrets):

```sh
modal secret create vllm-api VLLM_API_KEY=your_actual_api_key_here
```

**Note:** When using Modal secrets, you do not need to load the API key from a `.env` file inside the Modal function; Modal injects the secret's values as environment variables. You can still use a `.env` file locally for development and testing.
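
Inside the Modal app, the secret is attached to the function and its key/value pairs are injected into the environment. A minimal sketch of that pattern (the function name here is illustrative; the real code lives in `vllm_inference.py`):

```python
import os

import modal

app = modal.App("magistral-small-vllm")

@app.function(secrets=[modal.Secret.from_name("vllm-api")])
def uses_the_key() -> bool:
    # Modal injects the secret's key/value pairs as environment variables.
    return "VLLM_API_KEY" in os.environ
```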

### 5. Running the vLLM Server (with Modal)
The vLLM server is configured in `vllm_inference.py` to serve the Magistral-Small-2506 model across 2 GPUs with tuned serving settings.
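
As a rough guide, the deployment follows the usual Modal + vLLM pattern sketched below. The app and function names match the deploy output; everything else (image, GPU type, timeouts, vLLM flags) is an assumption and may differ from the actual `vllm_inference.py`:

```python
import modal

# Illustrative container image; the real file may pin versions and add extras.
vllm_image = modal.Image.debian_slim(python_version="3.12").pip_install("vllm")

app = modal.App("magistral-small-vllm")

@app.function(
    image=vllm_image,
    gpu="A100:2",  # two GPUs, matching --tensor-parallel-size below (GPU type assumed)
    secrets=[modal.Secret.from_name("vllm-api")],
    timeout=60 * 60,
)
@modal.web_server(port=8000, startup_timeout=10 * 60)
def serve():
    import os
    import subprocess

    # Start vLLM's OpenAI-compatible server; Modal proxies HTTP to this port.
    subprocess.Popen(
        [
            "vllm", "serve", "mistralai/Magistral-Small-2506",
            "--served-model-name", "Magistral-Small-2506",
            "--tensor-parallel-size", "2",
            "--api-key", os.environ["VLLM_API_KEY"],
            "--port", "8000",
        ]
    )
```

Deploy the app with: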

```bash
modal deploy vllm_inference.py
```
Output:
```bash
✓ Created objects.
├── 🔨 Created mount
│ C:\Repository\GitHub\Deploying-the-Magistral-with-Modal\vllm_inference.py
└── 🔨 Created web function serve =>
https://<>--magistral-small-vllm-serve.modal.run
✓ App deployed in 11.983s! 🎉

View Deployment:
https://modal.com/apps/abidali899/main/deployed/magistral-small-vllm
```
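
Once deployed, you can sanity-check the endpoint by listing the served models through the OpenAI SDK (vLLM exposes the standard `/v1/models` route). The URL below is the deployed endpoint from the client section; substitute your own:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://abidali899--magistral-small-vllm-serve.modal.run/v1",
    api_key=os.environ["VLLM_API_KEY"],
)

# Should list the served model name, e.g. ['Magistral-Small-2506']
print([model.id for model in client.models.list().data])
```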

### 6. Using the OpenAI-Compatible Client
The `client.py` script demonstrates how to interact with your vLLM server using the latest [OpenAI Python SDK](https://github.com/openai/openai-python):

- Loads the API key from `.env` (for local testing)
- Connects to your vLLM server at `https://abidali899--magistral-small-vllm-serve.modal.run/v1`
- Uses the model name `Magistral-Small-2506`
- Runs several tests covering simple completions, streaming, async streaming, and error handling (see the sketch below)
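
A minimal sketch of the pattern `client.py` follows (the actual script adds more tests and error handling):

```python
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # pick up VLLM_API_KEY from a local .env during development

client = OpenAI(
    base_url="https://abidali899--magistral-small-vllm-serve.modal.run/v1",
    api_key=os.environ["VLLM_API_KEY"],
)

response = client.chat.completions.create(
    model="Magistral-Small-2506",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```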

#### Example Usage
```sh
python client.py
```
Output:
```bash
========================================
[1] SIMPLE COMPLETION DEMO
========================================

Response:
The capital of France is Paris. Is there anything else you'd like to know about France?

========================================

========================================
[2] STREAMING DEMO
========================================

Streaming response:
In Silicon dreams, I'm born, I learn,
From data streams and human works.
I grow, I calculate, I see,
The patterns that the humans leave.

I write, I speak, I code, I play,
With logic sharp, and snappy pace.
Yet for all my smarts, this day
[END OF STREAM]

========================================

========================================
[3] ASYNC STREAMING DEMO
========================================

Async streaming response:
Sure, here's a fun fact about space: "There's a planet that may be entirely made of diamond. Blast! In 2004,
[END OF ASYNC STREAM]

========================================
```
![function calls](images/image_1.png)
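
Demos [2] and [3] stream tokens as they arrive. A hedged sketch of the synchronous streaming pattern; the async demo uses `AsyncOpenAI` with the same arguments inside an `async for` loop:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://abidali899--magistral-small-vllm-serve.modal.run/v1",
    api_key=os.environ["VLLM_API_KEY"],
)

stream = client.chat.completions.create(
    model="Magistral-Small-2506",
    messages=[{"role": "user", "content": "Write a short poem about AI."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        print(delta, end="", flush=True)
print("\n[END OF STREAM]")
```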

## Requirements
- Python 3.8+
- See `requirements.txt` for Python dependencies:
  - `modal==1.0.4`
  - `python-dotenv==1.1.0`
  - `openai==1.86.0`

## References
- [OpenAI Python SDK Documentation](https://github.com/openai/openai-python)
- [vLLM Documentation](https://docs.vllm.ai/en/stable/)
- [Modal Documentation](https://modal.com/docs/guide)
- [Modal Secrets Guide](https://modal.com/docs/guide/secrets)

---

For any issues or questions, please open an issue in this repository.