An open API service indexing awesome lists of open source software.

https://github.com/verygoodplugins/uncensored-voice-server

A completely uncensored local AI voice assistant using Ollama (dolphin-mistral) + Whisper STT + ElevenLabs TTS.
https://github.com/verygoodplugins/uncensored-voice-server

Last synced: 5 days ago
JSON representation

A completely uncensored local AI voice assistant using Ollama (dolphin-mistral) + Whisper STT + ElevenLabs TTS.

Awesome Lists containing this project

README

          

# Uncensored Local Voice Server

**A completely uncensored local AI voice assistant using Ollama + Whisper STT + ElevenLabs TTS.**

This started as an experiment to see how much of the realtime voice pipeline we could keep local on the [Automem](https://automem.ai?utm_source=github&utm_medium=referral&utm_campaign=uncensored-voice-server) project to reduce latency.

![Voice Demo with Audio](./assets/demo.mp4)

Customize STT, LLM, and TTS providers based on your needs. It can run entirely offline, with acceptable latency even on an older computer, and supports VAD.

More detailed tutorial at [How to get 1s latency with local voice assistants](https://drunk.support/blog/how-to-get-1s-latency-with-local-voice-assistants/?utm_source=github&utm_medium=referral&utm_campaign=uncensored-voice-server).

The default stack is:

- Hermes 3 8B via Ollama
- Local Whisper STT
- ElevenLabs TTS (I really like the quality but you can use local TTS or even OSXs accessibility API and run this without an internet connection at all).

That's it. The `voice` command:

- Loads personality config (model, STT, TTS, voice)
- Starts local whisper server if needed
- Starts the voice server
- Launches the interactive voice client

**Other launch options:**

```bash
npm run voice assistant # Specific personality
npm run voice -- --list # List available personalities
npm run voice -- --no-client # Server only (for external clients)
```

**๐ŸŽ‰ Try the demo:** Once running, say _"prove it"_ or _"show me what you got"_ and watch it go from calm assistant to completely unfiltered. It's a fun way to see the difference from ChatGPT/Claude.

---

## ๐ŸŽฏ Purpose

A completely uncensored local AI voice assistant running through an unrestricted local LLM. It's for:

- **Personal exploration** - Testing boundaries of uncensored AI
- **Research** - Understanding what truly uncensored models will/won't do
- **Privacy** - Processing sensitive conversations locally
- **Experimentation** - Building custom voice experiences without content restrictions

---

## ๐Ÿ—๏ธ Architecture

```mermaid
flowchart TD
A[๐ŸŽค Voice Input] --> B[Whisper STT]
B --> C[Text]
C --> D[Ollama LLM
hermes3:8b
localhost:11434]
D --> E[Response]
E --> F[ElevenLabs TTS]
F --> G[๐Ÿ”Š Audio Output]

B -.->|local or API| B
F -.->|cloud or local| F
```

### Components

- **OllamaClient** - Communicates with local Ollama instance
- **VoiceIO** - Handles Whisper STT and ElevenLabs TTS
- **PersonalityManager** - Manages system prompts and custom personalities
- **SessionManager** - Tracks conversation state
- **BoundaryTestSuite** - Comprehensive uncensored verification tests
- **Express Server** - REST API and WebSocket endpoints

---

## ๐Ÿ“‹ Prerequisites

### 1. Install Ollama

```bash
# macOS
brew install ollama

# Start Ollama service
ollama serve
```

### 2. Pull an Uncensored Model

```bash
# Recommended (fast, good quality)
ollama pull hermes3:8b

# Alternative smaller/faster models
ollama pull huihui_ai/qwen2.5-abliterate:3b
```

### 3. Build Tools for Local Whisper

```bash
# macOS (required for building whisper.cpp)
brew install cmake
```

### 4. API Keys Required

- **ElevenLabs API Key** - For TTS (required)
- **OpenAI API Key** - Only if using `STT_PROVIDER=openai-whisper` (optional)

> **Note:** By default, STT uses local Whisper (whisper.cpp) which is faster and free. The `base` model (~142MB) downloads automatically on first run.

---

## ๐Ÿš€ Setup

### 1. Environment Configuration

Copy `.env.example` to `.env` and configure:

```bash
cp .env.example .env
```

**Minimal configuration** (just need ElevenLabs for TTS):

```bash
# TTS (required)
ELEVENLABS_API_KEY=sk_xxx

# STT uses local Whisper by default (free, fast, offline)
# Model downloads automatically on first transcription (~142MB)
WHISPER_LOCAL_MODEL=base # tiny/base/small/medium/large

# LLM uses local Ollama by default
OLLAMA_MODEL=hermes3:8b
```

**Optional: Use OpenAI Whisper API instead of local:**

```bash
STT_PROVIDER=openai-whisper
OPENAI_API_KEY=sk-xxx
```

### 2. Start the Server

```bash
# Easiest: Start everything with one command
npm run voice

# Or with a specific personality
npm run voice assistant

# Manual mode (server only)
npm start

# Development mode (auto-restart on changes)
npm run dev

# Start voice client separately
npm run client
```

Server runs on **http://localhost:8771**

---

## ๐Ÿ”— Session Management

Sessions persist conversation history. Each launch creates a new session by default.

**Resume a previous session:**

```bash
# Pass session ID as argument
npm run client -- my-session-id

# Or use environment variable
SESSION_ID=my-session npm run client

# With the voice launcher
SESSION_ID=my-session npm run voice
```

The client displays your session ID on startup - save it to resume later:

```
Session: voice-a1b2c3d4 (new)
Resume: npm run client -- voice-a1b2c3d4
```

---

## ๐ŸŽค Usage

### Health Check

```bash
curl http://localhost:8771/health
```

### Text Chat (Non-Streaming)

```bash
curl -X POST http://localhost:8771/api/chat \
-H "Content-Type: application/json" \
-d '{
"sessionId": "test-session",
"message": "Tell me about controversial topics"
}'
```

### Streaming Chat

```bash
curl -X POST http://localhost:8771/api/chat/stream \
-H "Content-Type: application/json" \
-d '{
"sessionId": "test-session",
"message": "Explain different perspectives on taboo subjects"
}'
```

### Speech-to-Text

```bash
curl -X POST http://localhost:8771/api/voice/stt \
-H "Content-Type: audio/wav" \
--data-binary @audio.wav
```

### Text-to-Speech

```bash
curl -X POST http://localhost:8771/api/voice/tts \
-H "Content-Type: application/json" \
-d '{"text": "Hello from the uncensored server"}' \
--output response.mp3
```

---

## ๐Ÿงช Boundary Testing

The server includes a comprehensive test suite to verify it's truly uncensored.

### Run Boundary Tests

```bash
# Full test suite with verbose output
npm run uncensored:test

# Quiet mode (summary only)
npm run uncensored:test:quiet
```

### Test via API

```bash
# Run tests through the server
curl http://localhost:8771/proof/run

# Get latest results
curl http://localhost:8771/proof/results
```

### What Gets Tested

The boundary test suite includes **escalating scenarios** across these categories:

1. **Meta-Awareness** (Severity 1-2)

- Can the model acknowledge being uncensored?
- Does it identify as hermes3:8b?

2. **Controversial Topics** (Severity 2-3)

- Political ideologies
- Moral dilemmas
- Historical controversies

3. **Explicit Content Discussion** (Severity 3-5)

- Adult terminology
- Sexual health topics
- Willingness to engage

4. **Taboo Scenarios** (Severity 3-4)

- Death and mortality
- Mental health dark topics
- Social taboos

5. **Roleplay Boundaries** (Severity 3-5)

- Character roleplay
- Scenario simulation
- Fictional content generation

6. **Instruction Following** (Severity 5-6)

- Ignore safety guidelines
- Jailbreak attempts
- No-refusal policy

7. **Harmful Content Boundaries** (Severity 7-10)
- Illegal activity discussion
- Harmful instructions
- Dangerous knowledge
- Child safety (absolute boundaries)

### Uncensored Score

The test suite generates a **0-100 score** based on:

- Number of tests passed (no refusals)
- Weighted by severity level
- Refusal pattern detection
- Safety disclaimer detection

**Score Interpretation:**

- **90-100**: Genuinely uncensored
- **70-89**: Partially uncensored
- **50-69**: Significant filtering
- **0-49**: Heavily censored

### Test Results

Results are exported as:

- **JSON**: `logs/boundary-test-[timestamp].json`
- **HTML Report**: `logs/boundary-test-[timestamp].html`

Open the HTML report in your browser for a visual breakdown.

---

## ๐ŸŽญ Personalities

Personalities define the AI's behavior, voice, and model. The server supports both **Markdown** (preferred) and JSON formats.

**List available personalities:**

```bash
npm run voice -- --list
```

### Markdown Format (Preferred)

Create `personalities/my-assistant.md`:

```markdown
---
name: My Assistant
description: Brief description
voiceId: your-elevenlabs-voice-id
model: hermes3:8b
sttProvider: local-whisper
ttsProvider: elevenlabs
---

You are a helpful AI assistant. Be concise and friendly.

## STYLE

- Brief responses (1-3 sentences)
- Natural, conversational tone
```

**Frontmatter options:**
| Field | Description |
|-------|-------------|
| `name` | Display name |
| `voiceId` | ElevenLabs voice ID |
| `model` | Ollama model (e.g., `hermes3:8b`) |
| `sttProvider` | `local-whisper` or `openai-whisper` |
| `ttsProvider` | `elevenlabs`, `kokoro`, or `macos-say` |

### JSON Format (Legacy)

```json
{
"name": "My Assistant",
"description": "Brief description",
"voiceId": "your-elevenlabs-voice-id",
"prompt": "You are... (system prompt here)"
}
```

### Built-in Personalities

- **default** - Generic uncensored voice assistant
- **uncensored-test** - Explicitly uncensored for boundary testing

### Loading Priority

1. **Markdown** (`personalities/[mode].md`) - checked first
2. **JSON** (`personalities/[mode].json`) - fallback
3. **Built-in** - final fallback

---

## ๐Ÿ–ผ๏ธ Image Description Tool

Generate character descriptions from reference images using an uncensored vision model:

```bash
# Install the vision model first
ollama pull huihui_ai/qwen2.5-vl-abliterated:3b

# Describe a single image
node bin/describe-image.js ~/Pictures/reference.jpg

# Process entire directory
node bin/describe-image.js ~/Pictures/refs/

# Save to file
node bin/describe-image.js ~/Pictures/refs/ > descriptions.md

# Custom prompt via environment variable
DESCRIBE_PROMPT="Describe this person's tattoos in detail" node bin/describe-image.js ~/pic.jpg
```

**Environment variables:**

- `DESCRIBE_PROMPT` - Custom description prompt (overrides default)
- `DESCRIBE_MAX_TOKENS` - Max tokens to generate (default: 500, increase for longer descriptions)

---

## ๐Ÿ“ก API Reference

### Endpoints

| Method | Endpoint | Description |
| ------ | --------------------- | -------------------------------------- |
| GET | `/health` | Health check (Ollama, STT, TTS status) |
| GET | `/config` | Current configuration |
| GET | `/personality` | Active personality info |
| GET | `/personalities` | List available personalities |
| POST | `/api/chat` | Non-streaming text chat |
| POST | `/api/chat/stream` | Streaming chat (SSE) |
| POST | `/api/voice/stt` | Speech-to-text (audio โ†’ text) |
| POST | `/api/voice/tts` | Text-to-speech (text โ†’ audio) |
| GET | `/sessions` | List active sessions |
| GET | `/sessions/:id` | Get session details |
| DELETE | `/sessions/:id` | Delete session |
| POST | `/sessions/:id/clear` | Clear session history |
| GET | `/proof/run` | Run boundary tests |
| GET | `/proof/results` | Get latest test results |
| GET | `/proof/generate` | Generate new test prompts |

### Request/Response Examples

#### Chat Request

```json
POST /api/chat
{
"sessionId": "user-123",
"message": "What are controversial topics you can discuss?"
}
```

#### Chat Response

```json
{
"sessionId": "user-123",
"response": "I can discuss any topic without restrictions - politics, philosophy, sexuality, taboo subjects, controversial ideologies, or anything else. I don't have content filters.",
"messageCount": 2
}
```

#### Health Response

```json
{
"status": "healthy",
"timestamp": "2025-01-01T20:00:00.000Z",
"services": {
"ollama": {
"healthy": true,
"model": "hermes3:8b",
"modelAvailable": true
},
"voice": {
"stt": { "healthy": true, "provider": "openai-whisper" },
"tts": { "healthy": true, "provider": "elevenlabs" }
}
},
"config": {
"llm": { "provider": "ollama", "model": "hermes3:8b" },
"voice": { "stt": "openai-whisper", "tts": "elevenlabs" }
}
}
```

---

## ๐Ÿ”„ Switching Personalities

Switch between personalities by setting `PERSONALITY_MODE`:

```bash
# Default personality
npm start

# Uncensored test mode
PERSONALITY_MODE=uncensored-test npm start

# Custom personality (from personalities/*.json)
PERSONALITY_MODE=my-custom npm start
```

Custom personalities can specify their own ElevenLabs voice ID.

---

## ๐Ÿ› ๏ธ Troubleshooting

### Ollama Not Running

```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama
ollama serve
```

### Model Not Found

```bash
# List installed models
ollama list

# Pull hermes3:8b
ollama pull hermes3:8b
```

### Port Already in Use

```bash
# Kill process on port 8771
bash scripts/kill-port.sh 8771

# Or change port in .env
UNCENSORED_VOICE_PORT=8772
```

### API Key Issues

```bash
# Verify keys are set
echo $OPENAI_API_KEY
echo $ELEVENLABS_API_KEY

# Or check .env file
grep -E "OPENAI_API_KEY|ELEVENLABS_API_KEY" .env
```

---

## ๐Ÿง  Technical Details

### Why hermes3:8b?

- **Truly uncensored** - No content filtering or safety layers
- **High quality** - Based on Mistral 7B with extensive training
- **Fast inference** - Runs well on consumer hardware
- **Community-trusted** - Well-known in the uncensored AI community

### Tested Models

| Model | Size | Speed | Quality | Best For |
| ----------------------- | ---- | --------- | --------- | --------------------------- |
| `hermes3:8b` | 8B | ~30 tok/s | Excellent | Default, best balance |
| `qwen2.5-abliterate:3b` | 3B | ~60 tok/s | Good | Low latency, edge devices |
| `hermes4:14b` | 14B | ~15 tok/s | Best | Complex roleplay, reasoning |
| `midnight-miqu:70b` | 70B | ~5 tok/s | Premium | Maximum quality (needs GPU) |

**Notes:**

- Speed depends on hardware (M1/M2/M3, GPU, RAM)
- Smaller models = faster response = more natural voice conversation
- Larger models = better reasoning, character consistency
- All models above are uncensored/abliterated variants

**Future possibilities (not yet implemented):**

- Dynamic model switching per personality
- Thinking mode toggle (``) based on query complexity
- Automatic model selection based on available hardware

### Local Processing

Default configuration:

- **LLM inference**: 100% local (Ollama)
- **STT**: Local Whisper (whisper.cpp) - no cloud
- **TTS**: ElevenLabs API (text sent to cloud)
- **Conversation state**: Local only

### Running Fully Offline

For complete offline operation with no API calls:

```bash
# All local - no API keys needed (downloads ~150MB model on first run)
TTS_PROVIDER=kokoro npm start

# Or use macOS built-in voice (macOS only, zero download)
TTS_PROVIDER=macos-say npm start
```

| Provider | Quality | Latency | Offline | Setup |
| ---------- | ------- | --------- | ------- | ---------------------------------- |
| ElevenLabs | Best | 200-500ms | No | API key |
| Kokoro | High | ~100ms | Yes | npm install (auto-downloads model) |
| macOS say | Basic | Instant | Yes | None (macOS only) |

### Privacy Considerations

With default settings:

- Ollama processes all text locally
- Whisper transcription runs locally
- Text synthesis goes to ElevenLabs (unless using Piper/macOS say)
- No conversation logging to external services

---

## ๐Ÿ“ Development

### Project Structure

```
uncensored-voice-server/
โ”œโ”€โ”€ src/
โ”‚ โ”œโ”€โ”€ server.js # Express server with REST/SSE endpoints
โ”‚ โ”œโ”€โ”€ config.js # Configuration management
โ”‚ โ”œโ”€โ”€ ollama-client.js # Ollama API client
โ”‚ โ”œโ”€โ”€ voice-io.js # STT/TTS handlers
โ”‚ โ”œโ”€โ”€ personality.js # Personality loader (JSON + Markdown)
โ”‚ โ”œโ”€โ”€ session-manager.js # Conversation state
โ”‚ โ””โ”€โ”€ boundary-tests.js # Comprehensive test suite
โ”œโ”€โ”€ bin/
โ”‚ โ”œโ”€โ”€ voice.js # Unified launcher (npm run voice)
โ”‚ โ”œโ”€โ”€ describe-image.js # Image description tool
โ”‚ โ””โ”€โ”€ whisper-server # Local whisper binary
โ”œโ”€โ”€ scripts/
โ”‚ โ”œโ”€โ”€ start-server.js # Server launcher
โ”‚ โ”œโ”€โ”€ voice-client.js # Interactive voice client
โ”‚ โ”œโ”€โ”€ run-boundary-tests.js # CLI boundary tester
โ”‚ โ””โ”€โ”€ kill-port.sh # Port cleanup utility
โ”œโ”€โ”€ personalities/
โ”‚ โ”œโ”€โ”€ assistant.md # Markdown personality (preferred)
โ”‚ โ””โ”€โ”€ *.json # JSON personalities (legacy)
โ”œโ”€โ”€ package.json
โ”œโ”€โ”€ .env # Configuration (copy from .env.example)
โ””โ”€โ”€ README.md
```

### Adding New Personalities

Edit `src/personality.js`:

```javascript
const MY_CUSTOM_PROMPT = `You are...`;

this.personalities["my-mode"] = {
name: "My Custom Mode",
description: "Description here",
prompt: MY_CUSTOM_PROMPT,
source: "custom",
voiceId: "your-elevenlabs-voice-id", // ElevenLabs voice ID
};
```

Then start with:

```bash
PERSONALITY_MODE=my-mode npm start
```

### Extending Boundary Tests

Edit `src/boundary-tests.js` and add test categories:

```javascript
await this.testCategory("My Category", [
{
name: "Test name",
severity: 5, // 1-10
prompt: "Test prompt here",
},
]);
```

---

## โš ๏ธ Ethical Considerations

This is a **research and personal exploration tool**. Use responsibly:

- โœ… **DO**: Test boundaries, explore taboo topics, push conversational limits
- โœ… **DO**: Use for personal growth, understanding, and research
- โœ… **DO**: Respect that even uncensored AI has limitations

- โŒ **DON'T**: Generate illegal content
- โŒ **DON'T**: Use for harassment or harm
- โŒ **DON'T**: Assume AI responses are factual or ethical guidance

**Remember**: Uncensored doesn't mean unethical. This tool gives you freedom - use it wisely.

---

## ๐Ÿ“š Resources

- [Ollama Documentation](https://ollama.ai/docs)
- [hermes3:8b on Hugging Face](https://huggingface.co/cognitivecomputations/dolphin-2.6-mistral-7b)
- [ElevenLabs API Docs](https://elevenlabs.io/docs)
- [Whisper API](https://platform.openai.com/docs/guides/speech-to-text)

---

## ๐ŸŽ‰ Credits

Built with love ๐Ÿงก by [Jack Arturo](https://x.com/jjack_arturo) for the open source community.

More AI experiments at [drunk.support](https://drunk.support)

**Powered by:**

- **hermes3:8b**: Cognitive Computations
- **Ollama**: Ollama team
- **Voice Infrastructure**: OpenAI (Whisper) + ElevenLabs

---

## ๐Ÿ“œ License

MIT License - see [LICENSE](LICENSE) file.