{"id":30745864,"url":"https://github.com/easychen/miniaiapi","last_synced_at":"2026-01-18T03:32:21.667Z","repository":{"id":311014683,"uuid":"1042108986","full_name":"easychen/miniaiapi","owner":"easychen","description":"OpenAI-compatible API optimized for M-series Macs | 为 M 系 Mac 打造的 OpenAI 兼容 API","archived":false,"fork":false,"pushed_at":"2025-08-24T05:01:37.000Z","size":2110,"stargazers_count":51,"open_issues_count":0,"forks_count":5,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-04T04:03:17.615Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/easychen.png","metadata":{"files":{"readme":"README.en.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-21T13:51:21.000Z","updated_at":"2025-09-03T15:28:47.000Z","dependencies_parsed_at":"2025-08-21T17:15:33.487Z","dependency_job_id":"afb97210-2d3d-4013-9760-85312bbe9a99","html_url":"https://github.com/easychen/miniaiapi","commit_stats":null,"previous_names":["easychen/miniaiapi"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/easychen/miniaiapi","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/easychen%2Fminiaiapi","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/easychen%2Fminiaiapi/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/easychen%2Fminiaiapi/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/easychen%2Fminiaiapi/manifests","owner_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/owners/easychen","download_url":"https://codeload.github.com/easychen/miniaiapi/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/easychen%2Fminiaiapi/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28528213,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-18T00:39:45.795Z","status":"online","status_checked_at":"2026-01-18T02:00:07.578Z","response_time":98,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-04T04:01:14.063Z","updated_at":"2026-01-18T03:32:19.014Z","avatar_url":"https://github.com/easychen.png","language":"JavaScript","funding_links":[],"categories":["JavaScript"],"sub_categories":[],"readme":"# MiniAiApi\n\n[中文](README.md) | **English**\n\n\u003e OpenAI-compatible API optimized for M-series Macs\n\n![](res/logo.png)\n\n## Overview\n\nMiniAiApi is an AI capability service optimized for M-series chip Macs, especially Mac Mini. 
It aims to leverage software and frameworks native to the Mac ecosystem, such as MLX, to deliver high-performance OpenAI-compatible API endpoints.\n\n## Features\n\n- 🎤 **Text-to-Speech (TTS)**: Speech synthesis using macOS native `say` command\n- 🎵 **Voice Cloning**: High-quality audio cloning technology based on MLX-Audio and SparkTTS\n- 🎧 **Speech-to-Text (STT)**: Speech recognition based on MLX Whisper\n- 🤖 **Chat Completion**: Proxy forwarding to LMstudio chat interface\n- 🔗 **Embeddings**: Proxy forwarding to LMstudio embedding model interface\n- 🎨 **Image Generation**: Integrated Draw Things Mac App for AI drawing\n- 🔌 **OpenAI Compatible**: Fully compatible with OpenAI API format\n- ⚡ **High Performance**: Optimized for M-series Macs\n- 🛡️ **Secure**: Supports API key authentication\n\n## API Support Status\n\n| API Endpoint | Status | Description | Dependencies |\n|-------------|--------|-------------|-------------|\n| `/v1/audio/speech` | ✅ Available | TTS speech synthesis (traditional + cloning) | macOS `say` / MLX-Audio |\n| `/v1/audio/transcriptions` | ✅ Available | Speech to text | MLX Whisper |\n| `/v1/audio/translations` | ✅ Available | Speech translation to English | MLX Whisper |\n| `/v1/chat/completions` | ✅ Available | Chat completion | LMstudio |\n| `/v1/embeddings` | ✅ Available | Text embeddings | LMstudio |\n| `/v1/images/generations` | ✅ Available | Image generation | Draw Things |\n| `/v1/models` | ✅ Available | Get model list | - |\n| `/health` | ✅ Available | Health check | - |\n\n\u003e **Note**: \n\u003e - ✅ indicates implemented and available APIs\n\u003e - Some APIs require additional dependency services to work properly\n\u003e - All APIs are compatible with OpenAI request and response formats\n\n## System Requirements\n\n- macOS (macOS 14+ recommended)\n- Node.js 18+\n- MLX Whisper (for speech recognition)\n- MLX-Audio (for audio cloning, optional)\n- FFmpeg (for audio format conversion)\n- LMstudio (for chat and embedding 
functions)\n- Draw Things Mac App (for image generation, optional)\n\n## Installation\n\n### 1. Clone the Project\n\n```bash\ngit clone \u003crepository-url\u003e\ncd miniAiApi\n```\n\n### 2. Install Dependencies\n\n```bash\nnpm install\n```\n\n### 3. Install System Dependencies\n\n```bash\n# Install MLX Whisper\npip install mlx-whisper\n\n# Install MLX-Audio (optional, for audio cloning)\npip install mlx-audio\n\n# Install FFmpeg\nbrew install ffmpeg\n\n# Install and configure LMstudio\n# 1. Download and install LMstudio from the official website\n# 2. Start LMstudio, enable API Server in settings\n# 3. Set listening address to 127.0.0.1:1234 (default)\n\n# Install and configure Draw Things (optional)\n# 1. Install Draw Things from App Store\n# 2. Enable HTTP API Server in Settings → Advanced Settings\n# 3. Set listening address to 127.0.0.1:7860 (default)\n```\n\n### 4. Pre-download Models (Optional)\n\nIt's recommended to pre-download models to avoid waiting on first use:\n\n```bash\n# Install Hugging Face CLI (if not already installed)\npip install huggingface_hub\n\n# Download the Whisper model (the large model has better Chinese recognition)\nhf download mlx-community/whisper-large-v3-mlx\n\n# Download recommended Chinese TTS models\nhf download mlx-community/Spark-TTS-0.5B-fp16\n# Backup: worse cloning effect but faster\nhf download mlx-community/Spark-TTS-0.5B-4-6bit\n\n# Download other available models (optional)\n# Kokoro: poor Chinese support, good for other languages\nhf download mlx-community/Kokoro-82M-bf16\n```\n\n\u003e **Note**: Model downloads may take several minutes to tens of minutes, depending on network speed. The `hf download` command displays download progress.\n\n### 5. 
Configure Environment\n\n```bash\ncp env.example .env\n```\n\nEdit the `.env` file to configure your settings:\n\n```env\n# Server configuration\nPORT=3000\nHOST=0.0.0.0\n\n# TTS configuration\nTTS_VOICE=Yue\nTTS_OUTPUT_FORMAT=mp3\n\n# TTS audio cloning configuration (optional)\nTTS_CLONE_ENABLED=false\nTTS_CLONE_MODEL=mlx-community/Spark-TTS-0.5B-fp16\nTTS_CLONE_REF_AUDIO=/path/to/reference/audio.mp3\nTTS_CLONE_REF_TEXT=Reference text content corresponding to the audio\nTTS_CLONE_LANG_CODE=z\nTTS_CLONE_SPEED=1.0\n\n# STT configuration\nSTT_MODEL=mlx-community/whisper-large-v3-mlx\nSTT_LANGUAGE=zh\n\n# API security\nAPI_KEY_REQUIRED=false\nAPI_KEY=your-api-key-here\n\n# LMstudio configuration\nLMSTUDIO_BASE_URL=http://127.0.0.1:1234\nLMSTUDIO_API_KEY=\nLMSTUDIO_TIMEOUT=60000\n\n# Draw Things configuration\nDRAW_THINGS_BASE_URL=http://127.0.0.1:7860\nDRAW_THINGS_ENABLED=false\nDRAW_THINGS_TIMEOUT=120000\n```\n\n## Audio Cloning Configuration Example\n\n### Complete Configuration Steps\n\n1. **Enable audio cloning feature**\n   ```env\n   TTS_CLONE_ENABLED=true\n   ```\n\n2. **Select and download model**\n   ```bash\n   # Recommended: High-quality Chinese model\n   hf download mlx-community/Spark-TTS-0.5B-fp16\n   \n   # Or: Quantized version (uses less memory)\n   hf download mlx-community/Spark-TTS-0.5B-4-6bit\n   ```\n\n3. **Configure model and reference audio**\n   ```env\n   TTS_CLONE_MODEL=mlx-community/Spark-TTS-0.5B-fp16\n   TTS_CLONE_REF_AUDIO=/Users/yourname/audio/reference.mp3\n   TTS_CLONE_REF_TEXT=This is the complete content spoken by the speaker in the reference audio, which must match the audio content exactly.\n   TTS_CLONE_LANG_CODE=z\n   TTS_CLONE_SPEED=1.0\n   ```\n\n4. 
**Prepare reference audio**\n   - High audio quality with minimal background noise\n   - Recommended duration: 10-30 seconds\n   - Supported formats: MP3, WAV, M4A, etc.\n   - Reference text must match audio content exactly\n\n### Test Configuration\n\nAfter configuration, you can test with the following commands:\n\n```bash\n# Test traditional TTS\ncurl -X POST http://localhost:3000/v1/audio/speech \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\": \"tts-1\", \"input\": \"Test traditional speech synthesis\"}' \\\n  --output test_normal.mp3\n\n# Test audio cloning\ncurl -X POST http://localhost:3000/v1/audio/speech \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\": \"tts-1:clone\", \"input\": \"Test audio cloning functionality\"}' \\\n  --output test_clone.mp3\n```\n\n## Usage\n\n### Start Service\n\n```bash\n# Production environment\nnpm start\n\n# Development environment (auto-restart)\nnpm run dev\n```\n\nAfter the service starts, visit http://localhost:3000 to view API information.\n\n### API Interfaces\n\n#### 1. Text-to-Speech (TTS)\n\n##### Traditional TTS (using macOS system voices)\n\n```bash\ncurl -X POST http://localhost:3000/v1/audio/speech \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"tts-1\",\n    \"input\": \"Hello, this is a test.\",\n    \"voice\": \"alloy\",\n    \"response_format\": \"mp3\"\n  }' \\\n  --output speech.mp3\n```\n\n**Supported Voices**:\n- `alloy` → Yue (Chinese)\n- `echo` → Ting-Ting (Chinese)\n- `fable` → Sin-ji (Chinese)\n- `onyx` → Li-mu (Chinese)\n- `nova` → Mei-Jia (Chinese)\n- `shimmer` → Yu-shu (Chinese)\n\n##### Audio Cloning TTS (using MLX-Audio)\n\nTo use audio cloning functionality, simply add the `:clone` suffix to the model name:\n\n```bash\ncurl -X POST http://localhost:3000/v1/audio/speech \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"tts-1:clone\",\n    \"input\": \"Actually, model hallucination is not a big problem. 
Rather, believing in the knowledge from probability model pre-training is like climbing a tree to catch fish. The core of the model should be strong reasoning ability, then import trusted context, and derive answers through reasoning ability.\",\n    \"voice\": \"alloy\",\n    \"response_format\": \"mp3\",\n    \"speed\": 1.5\n  }' \\\n  --output cloned_speech.mp3\n```\n\n\u003e **Note**: \n\u003e - Before using clone mode, you need to configure `TTS_CLONE_ENABLED=true` and related parameters in `.env`\n\u003e - You need to provide a reference audio file and corresponding reference text\n\u003e - Clone mode ignores the `voice` parameter and uses the configured reference audio for voice cloning\n\n#### 2. Speech-to-Text (STT)\n\n```bash\ncurl -X POST http://localhost:3000/v1/audio/transcriptions \\\n  -H \"Content-Type: multipart/form-data\" \\\n  -F file=\"@audio.mp3\" \\\n  -F model=\"whisper-1\" \\\n  -F language=\"zh\"\n```\n\n#### 3. Speech Translation\n\n```bash\ncurl -X POST http://localhost:3000/v1/audio/translations \\\n  -H \"Content-Type: multipart/form-data\" \\\n  -F file=\"@audio.mp3\" \\\n  -F model=\"whisper-1\"\n```\n\n#### 4. Get Model Information\n\n```bash\ncurl http://localhost:3000/v1/models\n```\n\n#### 5. Chat Completion\n\n```bash\ncurl -X POST http://localhost:3000/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -H \"Authorization: Bearer your-api-key\" \\\n  -d '{\n    \"model\": \"gpt-3.5-turbo\",\n    \"messages\": [\n      {\"role\": \"user\", \"content\": \"Hello\"}\n    ]\n  }'\n```\n\n#### 6. Text Embeddings\n\n```bash\ncurl -X POST http://localhost:3000/v1/embeddings \\\n  -H \"Content-Type: application/json\" \\\n  -H \"Authorization: Bearer your-api-key\" \\\n  -d '{\n    \"model\": \"text-embedding-ada-002\",\n    \"input\": \"This is a text that needs to be vectorized\"\n  }'\n```\n\n#### 7. 
Image Generation\n\n```bash\ncurl -X POST http://localhost:3000/v1/images/generations \\\n  -H \"Content-Type: application/json\" \\\n  -H \"Authorization: Bearer your-api-key\" \\\n  -d '{\n    \"prompt\": \"A cute kitten playing in the garden\",\n    \"n\": 1,\n    \"size\": \"1024x1024\",\n    \"quality\": \"standard\",\n    \"style\": \"vivid\"\n  }'\n```\n\n#### 8. Health Check\n\n```bash\ncurl http://localhost:3000/health\n```\n\n## Configuration Options\n\n### TTS Configuration\n\n- `TTS_VOICE`: Default voice (default: Yue)\n- `TTS_OUTPUT_FORMAT`: Output format (mp3/wav)\n- `TTS_TEMP_DIR`: Temporary file directory\n\n### TTS Audio Cloning Configuration\n\n- `TTS_CLONE_ENABLED`: Whether to enable audio cloning functionality (default: false)\n- `TTS_CLONE_MODEL`: MLX-Audio model name (default: mlx-community/Spark-TTS-0.5B-fp16)\n- `TTS_CLONE_REF_AUDIO`: Reference audio file path (required, used for voice cloning)\n- `TTS_CLONE_REF_TEXT`: Text content corresponding to reference audio (required, used for model alignment)\n- `TTS_CLONE_LANG_CODE`: Language code (default: z, for Chinese)\n- `TTS_CLONE_SPEED`: Speech speed (default: 1.0, range 0.5-2.0)\n\n**Recommended MLX-Audio Models**:\n- `mlx-community/Spark-TTS-0.5B-fp16` - High-quality Chinese TTS model (recommended)\n- `mlx-community/Spark-TTS-0.5B-4-6bit` - Quantized version, uses less memory\n- `mlx-community/Kokoro-82M-bf16` - Lightweight model with multilingual support\n\n**Language Code Description**:\n- `z` - Chinese (recommended for Chinese text)\n- `a` - American English\n- `b` - British English\n- `j` - Japanese\n\n### STT Configuration\n\n- `STT_MODEL`: Whisper model (default: mlx-community/whisper-large-v3-mlx)\n- `STT_LANGUAGE`: Recognition language (zh/en/auto etc.)\n- `STT_OUTPUT_DIR`: Output directory\n\n### LMstudio Configuration\n\n- `LMSTUDIO_BASE_URL`: LMstudio service address (default: http://127.0.0.1:1234)\n- `LMSTUDIO_API_KEY`: LMstudio API key (optional)\n- `LMSTUDIO_TIMEOUT`: Request 
timeout (default: 60000ms)\n\n### Draw Things Configuration\n\n- `DRAW_THINGS_BASE_URL`: Draw Things HTTP API address (default: http://127.0.0.1:7860)\n- `DRAW_THINGS_ENABLED`: Whether to enable image generation functionality (default: false)\n- `DRAW_THINGS_TIMEOUT`: Request timeout (default: 120000ms)\n\n### Available Whisper Models\n\n- `mlx-community/whisper-tiny`\n- `mlx-community/whisper-base`\n- `mlx-community/whisper-small`\n- `mlx-community/whisper-medium`\n- `mlx-community/whisper-large-v2`\n- `mlx-community/whisper-large-v3`\n- `mlx-community/whisper-large-v3-mlx`\n- `mlx-community/whisper-large-v3-turbo`\n\n## Development\n\n### Project Structure\n\n```\nminiAiApi/\n├── src/\n│   ├── index.js              # Main server file\n│   ├── services/\n│   │   ├── ttsService.js     # TTS service\n│   │   └── sttService.js     # STT service\n│   ├── routes/\n│   │   ├── audioRoutes.js    # Audio API routes\n│   │   └── imageRoutes.js    # Image API routes\n│   └── middleware/\n│       └── auth.js           # Authentication middleware\n├── config/\n│   └── default.js            # Configuration management\n├── public/\n├── env.example\n├── package.json\n└── README.md\n```\n\n### Adding New Features\n\n1. Add new service classes in `src/services/`\n2. Add corresponding routes in `src/routes/`\n3. Register routes in `src/index.js`\n4. 
Update configuration files and documentation\n\n## Error Handling\n\nThe API uses standard HTTP status codes and an OpenAI-compatible error format:\n\n```json\n{\n  \"error\": {\n    \"message\": \"Error description\",\n    \"type\": \"error_type\",\n    \"code\": \"error_code\"\n  }\n}\n```\n\n## Performance Optimization\n\n- Automatic temporary file cleanup (hourly)\n- Support for concurrent request processing\n- MLX optimization for Mac Mini M4\n- Automatic audio format conversion\n\n## Security Considerations\n\n- Enable API key authentication in production environments\n- Limit file upload size (default 50MB)\n- Regular cleanup of temporary files\n- Use HTTPS (requires SSL certificate configuration)\n\n## Troubleshooting\n\n### Common Issues\n\n1. **TTS not working**\n   - Check if macOS voices are available: `say -v '?'` (quote the `?` so the shell does not treat it as a glob)\n   - Ensure FFmpeg is installed\n\n2. **Audio cloning not working**\n   - Check if MLX-Audio is installed: `python -m mlx_audio.tts.generate --help`\n   - Ensure `TTS_CLONE_ENABLED=true` is configured\n   - Check if reference audio file exists and is readable\n   - Ensure reference text matches reference audio content\n   - Verify model is downloaded: `ls ~/.cache/huggingface/hub/`\n   - Check if language code is correct (use `z` for Chinese)\n\n3. **STT not working**\n   - Check if MLX Whisper is installed: `which mlx_whisper`\n   - Ensure model is downloaded\n\n4. **File upload failed**\n   - Check if file size exceeds limit\n   - Ensure audio format is supported\n\n5. **LMstudio proxy failed**\n   - Check if LMstudio is started and API Server is enabled\n   - Confirm address and port configuration is correct\n   - Check if API Key matches\n\n6. 
**Image generation failed**\n   - Confirm Draw Things is installed and HTTP API Server is enabled\n   - Check `DRAW_THINGS_ENABLED=true` configuration\n   - Confirm Draw Things is listening on the correct port (7860)\n\n### Log Viewing\n\nThe service outputs detailed log information, including:\n- Request processing time\n- Error stack traces\n- Service status checks\n\n## License\n\nMIT License\n\n## Contributing\n\nIssues and Pull Requests are welcome!\n\n## Changelog\n\n### v1.1.0\n- ✨ Added audio cloning functionality based on MLX-Audio\n- 🎵 Support for voice cloning using reference audio\n- 🔧 Added flexible cloning configuration options\n- 📚 Improved documentation and troubleshooting guide\n\n### v1.0.0\n- Initial release\n- Support for TTS, STT, and translation features\n- OpenAI API compatibility\n- Mac Mini M4 optimization\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feasychen%2Fminiaiapi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feasychen%2Fminiaiapi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feasychen%2Fminiaiapi/lists"}