https://github.com/madroidmaq/mlx-omni-server
MLX Omni Server is a local inference server powered by Apple's MLX framework, specifically designed for Apple Silicon (M-series) chips. It implements OpenAI-compatible API endpoints, enabling seamless integration with existing OpenAI SDK clients while leveraging the power of local ML inference.
- Host: GitHub
- URL: https://github.com/madroidmaq/mlx-omni-server
- Owner: madroidmaq
- License: mit
- Created: 2024-11-05T11:52:00.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-09-02T14:50:19.000Z (about 1 month ago)
- Last Synced: 2025-09-02T16:30:22.732Z (about 1 month ago)
- Topics: function-calling, genai, mlx, openai, openai-api, structured-output, stt, tools, tts
- Language: Python
- Homepage: https://deepwiki.com/madroidmaq/mlx-omni-server/1-overview
- Size: 5.01 MB
- Stars: 540
- Watchers: 9
- Forks: 48
- Open Issues: 12
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-ChatGPT-repositories - mlx-omni-server - MLX Omni Server is a local inference server powered by Apple's MLX framework, specifically designed for Apple Silicon (M-series) chips. It implements OpenAI-compatible API endpoints, enabling seamless integration with existing OpenAI SDK clients while leveraging the power of local ML inference. (CLIs)
README
# MLX Omni Server
*Local AI inference server optimized for Apple Silicon*
[PyPI](https://pypi.python.org/pypi/mlx-omni-server) · [Python](https://python.org) · [MIT License](https://opensource.org/licenses/MIT) · [DeepWiki](https://deepwiki.com/madroidmaq/mlx-omni-server)
**MLX Omni Server** provides dual API compatibility with both **OpenAI** and **Anthropic APIs**, enabling seamless local inference on Apple Silicon using the MLX framework.
[Installation](#-installation) • [Quick Start](#-quick-start) • [Documentation](#-documentation) • [Contributing](#-contributing)
## ✨ Features
- 🚀 **Apple Silicon Optimized** - Built on MLX framework for M1/M2/M3/M4 chips
- 🔌 **Dual API Support** - Compatible with both OpenAI and Anthropic APIs
- 🎯 **Complete AI Suite** - Chat, audio processing, image generation, embeddings
- ⚡ **High Performance** - Local inference with hardware acceleration
- 🔐 **Privacy-First** - All processing happens locally on your machine
- 🛠 **Drop-in Replacement** - Works with existing OpenAI and Anthropic SDKs

## 🚀 Installation
```bash
pip install mlx-omni-server
```

## ⚡ Quick Start
1. **Start the server:**
```bash
mlx-omni-server
```

2. **Choose your preferred API:**
**OpenAI API:**

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:10240/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
model="mlx-community/gemma-3-1b-it-4bit-DWQ",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```
**Anthropic API:**

```python
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:10240/anthropic",
    api_key="not-needed"
)

message = client.messages.create(
model="mlx-community/gemma-3-1b-it-4bit-DWQ",
max_tokens=1000,
messages=[{"role": "user", "content": "Hello!"}]
)
print(message.content[0].text)
```
🎉 **That's it!** You're now running AI locally on your Mac.
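Not sure which model identifier to pass? The same client can query the model-listing endpoint described in the API table below to see what is already available locally; a minimal sketch:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local server; no real key is needed.
client = OpenAI(base_url="http://localhost:10240/v1", api_key="not-needed")

# Print the IDs of models the server can serve (e.g. MLX models already
# present in your local HuggingFace cache).
for model in client.models.list():
    print(model.id)
```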
## 📋 API Support
### OpenAI Compatible Endpoints (`/v1/*`)
| Endpoint | Feature | Status |
|----------|---------|--------|
| `/v1/chat/completions` | Chat with tools, streaming, structured output | ✅ |
| `/v1/audio/speech` | Text-to-Speech | ✅ |
| `/v1/audio/transcriptions` | Speech-to-Text | ✅ |
| `/v1/images/generations` | Image Generation | ✅ |
| `/v1/embeddings` | Text Embeddings | ✅ |
| `/v1/models` | Model Management | ✅ |
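The endpoints above follow the shapes of the corresponding OpenAI SDK methods, so features beyond chat go through the same client. A rough sketch of the embeddings route (the model name below is only an illustrative placeholder; substitute any embedding model you have locally):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:10240/v1", api_key="not-needed")

# Request embeddings from /v1/embeddings.
# NOTE: the model name is a hypothetical example, not a recommendation.
response = client.embeddings.create(
    model="mlx-community/all-MiniLM-L6-v2-4bit",
    input=["MLX Omni Server runs entirely on-device."],
)
print(len(response.data[0].embedding))
```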
### Anthropic Compatible Endpoints (`/anthropic/v1/*`)

| Endpoint | Feature | Status |
|----------|---------|--------|
| `/anthropic/v1/messages` | Messages with tools, streaming, thinking mode | ✅ |
| `/anthropic/v1/models` | Model listing with pagination | ✅ |
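Streaming against the Anthropic-compatible endpoint works with the `anthropic` SDK's streaming helper; a minimal sketch reusing the model from the Quick Start:

```python
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:10240/anthropic",
    api_key="not-needed",
)

# Stream tokens as they are generated locally.
with client.messages.stream(
    model="mlx-community/gemma-3-1b-it-4bit-DWQ",
    max_tokens=256,
    messages=[{"role": "user", "content": "Write a haiku about Apple Silicon."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```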
## ⚙️ Configuration

```bash
# Default (port 10240)
mlx-omni-server

# Custom options
mlx-omni-server --port 8000
MLX_OMNI_LOG_LEVEL=debug mlx-omni-server

# View all options
mlx-omni-server --help
```

## 🛠 Development
**Development Setup:**
```bash
git clone https://github.com/madroidmaq/mlx-omni-server.git
cd mlx-omni-server
uv sync

# Start with hot-reload
uv run uvicorn mlx_omni_server.main:app --reload --host 0.0.0.0 --port 10240
```

**Testing:**
```bash
uv run pytest # All tests
uv run pytest tests/chat/openai/ # OpenAI tests
uv run pytest tests/chat/anthropic/ # Anthropic tests
```

**Code Quality:**
```bash
uv run black . && uv run isort . # Format code
uv run pre-commit run --all-files # Run hooks
```

## 🎯 Key Features
**Model Management**
- Auto-discovery of MLX models in HuggingFace cache
- On-demand loading and intelligent caching
- Automatic model downloading when needed

**Advanced Capabilities**
- Function calling with model-specific parsers (see the sketch after this list)
- Real-time streaming for both APIs
- JSON schema validation and structured output
- Extended reasoning (thinking mode) for supported models
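As a rough illustration of the function-calling path from the client side, the standard OpenAI tools format applies; the `get_weather` tool below is purely hypothetical:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:10240/v1", api_key="not-needed")

# A hypothetical tool definition in the standard OpenAI "tools" format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="mlx-community/gemma-3-1b-it-4bit-DWQ",
    messages=[{"role": "user", "content": "What's the weather in Cupertino?"}],
    tools=tools,
)

# If the model decided to call the tool, the call shows up here.
print(response.choices[0].message.tool_calls)
```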
## 📚 Documentation

| Resource | Description |
|----------|-------------|
| [OpenAI API Guide](docs/openai-api.md) | Complete OpenAI API reference |
| [Anthropic API Guide](docs/anthropic-api.md) | Complete Anthropic API reference |
| [Examples](examples/) | Practical usage examples |
## 🔍 Troubleshooting

**Common Issues**
**Requirements:**
- Python 3.11+
- Apple Silicon Mac (M1/M2/M3/M4)
- MLX framework installed

**Quick fixes:**
```bash
# Check requirements
python --version # Should be 3.11+
python -c "import mlx; print(mlx.__version__)"# Pre-download models (if needed)
huggingface-cli download mlx-community/gemma-3-1b-it-4bit-DWQ

# Enable debug logging
MLX_OMNI_LOG_LEVEL=debug mlx-omni-server
```
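If a client cannot connect, it also helps to confirm the server itself is answering; assuming the default port, a plain request to the models endpoint should return JSON. A minimal stdlib sketch:

```python
import json
import urllib.request

# The models endpoint should answer when the server is up on the default port.
with urllib.request.urlopen("http://localhost:10240/v1/models") as resp:
    print(json.dumps(json.load(resp), indent=2))
```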
## 🤝 Contributing

**Quick contributor setup:**
```bash
git clone https://github.com/madroidmaq/mlx-omni-server.git
cd mlx-omni-server
uv sync && uv run pytest
```

---
## 🙏 Acknowledgments
Built with [MLX](https://github.com/ml-explore/mlx) by Apple • [FastAPI](https://fastapi.tiangolo.com/) • [MLX-LM](https://github.com/ml-explore/mlx-lm)
## 📄 License
[MIT License](LICENSE) • Not affiliated with OpenAI, Anthropic, or Apple
## 🌟 Star History
[Star History Chart](https://star-history.com/#madroidmaq/mlx-omni-server&Date)