# hifisampler-go

A high-performance UTAU resampler based on [pc-nsf-hifigan](https://github.com/openvpi/vocoders), rewritten in Go.

This is a Go rewrite of [hifisampler](https://github.com/openhachimi/hifisampler) (originally Python + C#). It is a drop-in replacement that offers faster startup, lower memory usage, and single-binary deployment.

**For Jinriki please use [Hachimisampler](https://github.com/openhachimi/hachimisampler).**

## Compare

#### Env

- AMD Ryzen 7 7800X3D
- NVIDIA GeForce RTX 4080

| Python PyTorch (Original, GPU) | Go (GPU) |
|:------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------:|
| ![Python GPU](https://github.com/user-attachments/assets/82ceb50d-e477-4fba-87df-a9317eb74121) | ![Go GPU](https://github.com/user-attachments/assets/bd2f71fb-9db0-4af4-9965-8f31670e2fd9) |

- On this GPU setup, `hifisampler-go` is more than 2.75 times faster than `hifisampler`.

| Python PyTorch (Original, CPU) | Go (CPU) |
|:------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------:|
| *(CPU benchmark screenshots)* | *(CPU benchmark screenshots)* |

### Voice Sample

#### Env

- MacBook Pro 14 (M4 Pro), macOS 26.4
- CoreML

| Worldline-R (Original) | hifisampler-go |
|:------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------:|
| *(audio sample)* | *(audio sample)* |

- End of rain (feat. 初音ミク), Shake Sphere. UST by cillia

## Architecture

- **Server** (`hifiserver`): HTTP + TCP + IPC server that loads ONNX models and performs neural vocoder inference
- **Client (HTTP)** (`hifisampler`): Lightweight Rust executable; communicates via HTTP
- **Client (IPC)** (`hifisampler-ipc`): Minimal Rust executable; tries IPC (Unix socket / Named Pipe) first, falls back to TCP

| Component | Language | Protocol |
|-----------|----------|----------|
| Server | Go | HTTP :8572, TCP :8573, IPC (Unix socket / Named Pipe) |
| Client HTTP | Rust | HTTP |
| Client IPC | Rust | IPC → TCP fallback |

The IPC client is recommended for OpenUTAU, which calls the resampler dozens of times per second: a Unix socket or Named Pipe connection avoids the network stack overhead of HTTP and TCP on every call.
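
The IPC-first, TCP-fallback behavior described above can be sketched as follows. This is a minimal illustration in Go, not the actual client (which is written in Rust). The names `endpoint`, `dialFirst`, and `demoFallback` are hypothetical, and only the Unix socket case is shown, because Windows Named Pipes are not reachable through Go's standard `net.Dial`.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"time"
)

// endpoint is one way to reach the server: a network name accepted by
// net.Dial ("unix" or "tcp") plus its address. Hypothetical type.
type endpoint struct {
	network string
	address string
}

// dialFirst tries each endpoint in order and returns the first connection
// that succeeds, together with the network that was used.
func dialFirst(endpoints []endpoint, timeout time.Duration) (net.Conn, string, error) {
	lastErr := errors.New("no endpoints given")
	for _, ep := range endpoints {
		conn, err := net.DialTimeout(ep.network, ep.address, timeout)
		if err == nil {
			return conn, ep.network, nil
		}
		lastErr = err
	}
	return nil, "", fmt.Errorf("no endpoint reachable: %w", lastErr)
}

// demoFallback starts a throwaway TCP listener, points the IPC endpoint at a
// socket path that does not exist, and reports which transport was chosen.
func demoFallback() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()

	missing := filepath.Join(os.TempDir(), "hifisampler-missing-example.sock")
	conn, network, err := dialFirst([]endpoint{
		{"unix", missing},           // IPC first
		{"tcp", ln.Addr().String()}, // TCP fallback
	}, 500*time.Millisecond)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	return network, nil
}

func main() {
	network, err := demoFallback()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("connected via", network)
}
```

In a real setup the endpoints would be the documented defaults, `/tmp/hifisampler.sock` and `127.0.0.1:8573`.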

## GPU Support

The server automatically detects and uses the best available GPU via ONNX Runtime execution providers:

| GPU Vendor | Provider | Platform | Detection Priority |
|---|---|---|---|
| NVIDIA | TensorRT | Windows, Linux | 1st (fastest for conv-heavy models) |
| NVIDIA | CUDA | Windows, Linux | 2nd |
| AMD | MIGraphX | Linux | 3rd |
| Intel | OpenVINO (AUTO) | Windows, Linux | 4th (auto-selects NPU/GPU/CPU) |
| Qualcomm | QNN (HTP) | Windows ARM64, Linux ARM64 | 5th (Snapdragon NPU, FP16) |
| NVIDIA/AMD/Intel | DirectML | Windows | 6th |
| Apple Silicon, Intel Mac | CoreML | macOS | 7th (ML Program + subgraph) |
| (fallback) | CPU | All | Last |

Download the server variant matching your GPU from the [releases](https://github.com/organization/hifisampler-go/releases) page.
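
The detection priority in the table above amounts to picking the first available provider from an ordered list. The sketch below illustrates that idea only. It does not show how the server actually probes ONNX Runtime, and `providerPriority` and `pickProvider` are hypothetical names. The provider identifiers are borrowed from the `device` values accepted in `config.yaml`.

```go
package main

import "fmt"

// providerPriority mirrors the "Detection Priority" column of the table:
// earlier entries are preferred over later ones.
var providerPriority = []string{
	"tensorrt", "cuda", "migraphx", "openvino", "qnn", "directml", "coreml", "cpu",
}

// pickProvider returns the highest-priority provider marked as available.
// CPU is the fallback when nothing else is present.
func pickProvider(available map[string]bool) string {
	for _, p := range providerPriority {
		if available[p] {
			return p
		}
	}
	return "cpu"
}

func main() {
	// A machine where both CUDA and DirectML are usable prefers CUDA.
	fmt.Println(pickProvider(map[string]bool{"cuda": true, "directml": true}))
	// With no accelerator available, the CPU provider is used.
	fmt.Println(pickProvider(map[string]bool{}))
}
```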

## TensorRT Setup (NVIDIA, experimental)

TensorRT provides **10-30x faster** vocoder inference than the CUDA provider. If you have an NVIDIA GPU, TensorRT is strongly recommended, although support is still experimental.

### Prerequisites
- NVIDIA GPU (GTX 10xx or newer)
- CUDA Toolkit 11.x+ installed
- TensorRT 10.x runtime (`nvinfer_10.dll` / `libnvinfer.so.10`)

### Installation

**Windows (pip):**
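```cmd
pip install tensorrt
rem Copy the TensorRT DLLs next to hifiserver.exe.
rem cmd does not expand wildcards in directory names, so loop over the matching Python installs.
rem (Inside a .bat file, write %%i instead of %i.)
for /d %i in ("%LOCALAPPDATA%\Programs\Python\Python3*") do copy "%~i\Lib\site-packages\tensorrt_libs\nvinfer*.dll" .
```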

**Windows (manual):**
Download TensorRT from [NVIDIA Developer](https://developer.nvidia.com/tensorrt) and add its `lib/` directory to your PATH.

**Linux:**
```bash
# Ubuntu/Debian
sudo apt install libnvinfer10 libnvinfer-plugin10
# Or via pip
pip install tensorrt
```

### How it works

1. **First run**: TensorRT compiles the vocoder ONNX model into an optimized engine. This takes **2-5 minutes** (one-time cost).
2. **Engine is cached** in `./trt_cache/` — subsequent runs start instantly.
3. **Inference**: ~3-16ms per note (vs ~110ms with CUDA EP).

### Configuration

In `config.yaml`:

```yaml
tensorrt:
  max_frames: 1024 # Max mel frames. 1024 ≈ 11.9s audio. Increase for longer notes.
  opt_frames: 80 # Most common note length (frames). TRT optimizes for this.
  cache_path: "./trt_cache" # Engine cache directory. Delete to force rebuild.
  builder_opt_level: 5 # 1-5. Higher = slower build, faster inference.
  workspace_size_mb: 2048 # GPU memory for builder (MB).
```

**Key settings:**
- **`max_frames`**: Maximum note length in mel frames. If a note exceeds this, TensorRT falls back to CUDA EP automatically. Default `1024` covers ~12 seconds. Set to `2048` for very long notes.
- **`opt_frames`**: The note length TRT optimizes most aggressively for. Set to your typical note length (default 80 frames ≈ 0.93s).
- **`cache_path`**: Delete this directory to force a full engine rebuild (e.g., after changing `max_frames`).
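
The frame counts above map to seconds through the audio settings: one mel frame covers `hop_size / sample_rate` seconds, so 1024 frames × 512 / 44100 ≈ 11.9 s and 80 frames ≈ 0.93 s. The sketch below reproduces that arithmetic and checks whether a note fits within `max_frames`. The function names are hypothetical, and rounding up to whole frames is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

const (
	sampleRate = 44100 // audio.sample_rate in config.yaml
	hopSize    = 512   // audio.hop_size in config.yaml
)

// framesToSeconds converts a mel frame count to a duration in seconds.
func framesToSeconds(frames int) float64 {
	return float64(frames) * hopSize / sampleRate
}

// secondsToFrames converts a duration to mel frames, rounding up
// (assumption: a partial frame still occupies a full frame).
func secondsToFrames(seconds float64) int {
	return int(math.Ceil(seconds * sampleRate / hopSize))
}

// fitsEngine reports whether a note of the given length stays within
// max_frames, i.e. whether the TensorRT engine could handle it.
func fitsEngine(seconds float64, maxFrames int) bool {
	return secondsToFrames(seconds) <= maxFrames
}

func main() {
	fmt.Printf("max_frames 1024 = %.2f s\n", framesToSeconds(1024))
	fmt.Printf("opt_frames 80   = %.2f s\n", framesToSeconds(80))
	fmt.Println("11 s note fits in 1024 frames:", fitsEngine(11, 1024))
	fmt.Println("12 s note fits in 1024 frames:", fitsEngine(12, 1024))
	fmt.Println("12 s note fits in 2048 frames:", fitsEngine(12, 2048))
}
```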

### Troubleshooting

- **"nvinfer.dll not found"**: Install TensorRT or add its directory to PATH.
- **First inference very slow (minutes)**: Normal — TRT is building the engine. Wait for it to complete; subsequent runs use the cache.
- **OOM during engine build**: Reduce `workspace_size_mb` or `max_frames`.
- **Note fails with shape error**: The note likely exceeds `max_frames`. Increase the value in `config.yaml` and delete `trt_cache/`.

## How to use

### 1. Download

From the [releases](https://github.com/organization/hifisampler-go/releases) page, download:
- **Client**: `hifisampler--` (HTTP) or `hifisampler-ipc--` (IPC) for your platform
- **Server**: `hifiserver---.zip/.tar.gz` for your platform and GPU

### 2. Download models

Models are auto-downloaded on first run. You can also place them manually:

```
hifiserver.exe (or hifiserver)
pc_nsf_hifigan_44.1k_hop512_128bin_2025.02/
  model.onnx
hnsep/
  vr/
    model_fp16.onnx
config.yaml
config.default.yaml
```

Only ONNX models are supported. If you have PyTorch `.ckpt` models, convert them using:
```bash
python scripts/convert_hnsep_to_onnx.py
```

### 3. Start the server

#### Linux / macOS

```bash
./start.sh # Linux/macOS (auto-detects binary)
./hifiserver # or run directly
```

#### Windows

Double-click `hifiserver-...exe`.

The server listens on HTTP :8572, TCP :8573, and IPC (`/tmp/hifisampler.sock` or `\\.\pipe\hifisampler`) by default.

### 4. Configure UTAU

Set the UTAU resampler to `hifisampler-ipc` (`hifisampler-ipc.exe` on Windows).

For OpenUTAU, you can create a symbolic link:
```cmd
mklink "C:\[OpenUTAU Path]\Resamplers\hifisampler-ipc.exe" "C:\[Project Path]\hifisampler-ipc.exe"
```

## Client Configuration

The client reads settings from `hifisampler-client.toml` (searched in current directory, then exe directory):

```toml
# IPC path (Unix socket or Named Pipe)
ipc_path = "/tmp/hifisampler.sock" # Windows: \\.\pipe\hifisampler

# TCP fallback address
server = "127.0.0.1"
port = 8573
```

If no config file is found, defaults to IPC with TCP fallback to `127.0.0.1:8573`.
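
The search order described above (current directory first, then the executable's directory) can be sketched as below. This is an illustration in Go rather than the Rust client's code. `findConfig`, `searchDirs`, and `fileExists` are hypothetical names, and the existence check is passed in as a function so the lookup is easy to test.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

const configName = "hifisampler-client.toml"

// findConfig returns the first directory in dirs that contains the config
// file, according to the supplied exists check.
func findConfig(dirs []string, exists func(string) bool) (string, bool) {
	for _, dir := range dirs {
		path := filepath.Join(dir, configName)
		if exists(path) {
			return path, true
		}
	}
	return "", false
}

// fileExists reports whether path names a regular file or other non-directory.
func fileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && !info.IsDir()
}

// searchDirs lists the current directory first, then the executable's directory.
func searchDirs() []string {
	dirs := []string{"."}
	if exe, err := os.Executable(); err == nil {
		dirs = append(dirs, filepath.Dir(exe))
	}
	return dirs
}

func main() {
	if path, ok := findConfig(searchDirs(), fileExists); ok {
		fmt.Println("using config:", path)
		return
	}
	// Documented default when no file is found.
	fmt.Println("no config found: IPC with TCP fallback to 127.0.0.1:8573")
}
```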

## Server Configuration

On first run, the server creates `config.yaml` from `config.default.yaml`. Key settings:

```yaml
model:
  vocoder_path: "./pc_nsf_hifigan_44.1k_hop512_128bin_2025.02/model.onnx"
  hnsep_model_path: "./hnsep/vr/model_fp16.onnx"

audio:
  sample_rate: 44100
  hop_size: 512
  n_mels: 128

processing:
  wave_norm: true # Enable loudness normalization
  loop_mode: true # Mel spectrum loop mode
  peak_limit: 1.0

performance:
  max_workers: -1 # -1 = auto-detect (NumCPU/2, clamped 2-8)
  device: "auto" # "auto", "cuda", "tensorrt", "migraphx", "directml", "openvino", "coreml", "qnn", "cpu"
  ipc_path: "/tmp/hifisampler.sock" # Unix socket. Windows: \\.\pipe\hifisampler
```
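
The `max_workers: -1` rule in the comment above (half the logical CPUs, clamped to the range 2-8) works out as in the sketch below. `resolveWorkers` is a hypothetical name, and treating every non-positive value as "auto" is an assumption. The README only documents `-1`.

```go
package main

import (
	"fmt"
	"runtime"
)

// resolveWorkers interprets performance.max_workers.
// A positive value is used as-is; otherwise the count is NumCPU/2,
// clamped to the range [2, 8].
func resolveWorkers(configured, numCPU int) int {
	if configured > 0 {
		return configured
	}
	n := numCPU / 2
	if n < 2 {
		n = 2
	}
	if n > 8 {
		n = 8
	}
	return n
}

func main() {
	for _, cpus := range []int{1, 4, 12, 32} {
		fmt.Printf("%2d CPUs -> %d workers\n", cpus, resolveWorkers(-1, cpus))
	}
	fmt.Println("this machine:", resolveWorkers(-1, runtime.NumCPU()))
}
```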

## Implemented flags

- **g:** Adjust gender/formants. Range: `-600` to `600` | Default: `0`
- **Hb:** Adjust breath/noise. Range: `0` to `500` | Default: `100`
- **Hv:** Adjust voice/harmonic. Range: `0` to `150` | Default: `100`
- **HG:** Vocal fry/growl. Range: `0` to `100` | Default: `0`
- **P:** Normalize loudness at the note level, targeting -16 LUFS. Requires `wave_norm: true`. Range: `0` to `100` | Default: `100`
- **t:** Shift pitch in cents (1 cent = 1/100 semitone). Range: `-1200` to `1200` | Default: `0`
- **Ht:** Adjust tension. Range: `-100` to `100` | Default: `0`
- **A:** Amplitude modulation based on pitch vibrato. Range: `-100` to `100` | Default: `0`
- **G:** Force regenerate feature cache. No value needed.
- **He:** Enable Mel spectrum loop mode. No value needed.

_Note: Flags `B` and `V` were renamed to `Hb` and `Hv` to avoid conflicts with other UTAU flags._
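
To show how the ranges and defaults above fit together, here is a small sketch that parses a concatenated UTAU-style flag string such as `g-20Hb50G`. It is not the server's parser. Clamping out-of-range values, rejecting unknown flags, and storing value-less flags as `1` are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// flagSpec describes one flag from the list above.
type flagSpec struct {
	name     string
	min, max int
	def      int
	hasValue bool
}

// specs lists two-letter names first so "Hb" is matched before any
// one-letter flag could be tried.
var specs = []flagSpec{
	{"Hb", 0, 500, 100, true},
	{"Hv", 0, 150, 100, true},
	{"HG", 0, 100, 0, true},
	{"Ht", -100, 100, 0, true},
	{"He", 0, 1, 0, false},
	{"g", -600, 600, 0, true},
	{"P", 0, 100, 100, true},
	{"t", -1200, 1200, 0, true},
	{"A", -100, 100, 0, true},
	{"G", 0, 1, 0, false},
}

func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// matchSpec finds the flag whose name prefixes rest (names are case-sensitive).
func matchSpec(rest string) (flagSpec, bool) {
	for _, sp := range specs {
		if strings.HasPrefix(rest, sp.name) {
			return sp, true
		}
	}
	return flagSpec{}, false
}

// parseFlags returns every flag's value, starting from the defaults.
// Value-less flags (G, He) are stored as 1 when present.
func parseFlags(s string) (map[string]int, error) {
	out := make(map[string]int)
	for _, sp := range specs {
		out[sp.name] = sp.def
	}
	for i := 0; i < len(s); {
		sp, ok := matchSpec(s[i:])
		if !ok {
			return nil, fmt.Errorf("unknown flag at offset %d in %q", i, s)
		}
		i += len(sp.name)
		if !sp.hasValue {
			out[sp.name] = 1
			continue
		}
		// Consume an optional sign followed by digits.
		j := i
		if j < len(s) && (s[j] == '-' || s[j] == '+') {
			j++
		}
		for j < len(s) && s[j] >= '0' && s[j] <= '9' {
			j++
		}
		v, err := strconv.Atoi(s[i:j])
		if err != nil {
			return nil, fmt.Errorf("flag %s needs an integer value", sp.name)
		}
		out[sp.name] = clamp(v, sp.min, sp.max)
		i = j
	}
	return out, nil
}

func main() {
	flags, err := parseFlags("g-20Hb50G")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("g:", flags["g"], "Hb:", flags["Hb"], "Hv:", flags["Hv"], "G:", flags["G"])
}
```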

## Building from source

### Prerequisites
- Go 1.26+ (server)
- Rust stable (clients, RustFFT bridge)
- GCC/MinGW or Clang (server, required for RustFFT CGO bridge)

### Clients (Rust)
```bash
# HTTP client
cd client-rs && cargo build --release

# IPC client (Unix socket / Named Pipe + TCP fallback)
cd client-rs-ipc && cargo build --release
```

### Server (Go, requires CGO for RustFFT)
```bash
# Build RustFFT bridge first
cd internal/dsp/rustfft && cargo build --release && cd ../../..

# Build server with SIMD support
GOEXPERIMENT=simd CGO_ENABLED=1 go build -tags rustfft -ldflags="-s -w" -o hifiserver ./cmd/server
```

The ONNX Runtime shared library must be available at runtime. Place the `.dll`/`.so`/`.dylib` alongside the server binary or set `ORT_LIB_PATH`.
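
As a rough illustration of the lookup described above, the sketch below lists candidate locations for the ONNX Runtime library. The file names are the conventional ones shipped by ONNX Runtime. Everything else is an assumption: that `ORT_LIB_PATH` holds the full path to the library file, that it takes precedence over the copy next to the binary, and the helper names `ortLibName` and `ortLibCandidates`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
)

// ortLibName returns the conventional ONNX Runtime shared library file name
// for a GOOS value.
func ortLibName(goos string) string {
	switch goos {
	case "windows":
		return "onnxruntime.dll"
	case "darwin":
		return "libonnxruntime.dylib"
	default:
		return "libonnxruntime.so"
	}
}

// ortLibCandidates lists where to look, in order: the ORT_LIB_PATH value
// (if set), then the library next to the server binary.
func ortLibCandidates(envPath, exeDir, goos string) []string {
	var out []string
	if envPath != "" {
		out = append(out, envPath)
	}
	return append(out, filepath.Join(exeDir, ortLibName(goos)))
}

func main() {
	exeDir := "."
	if exe, err := os.Executable(); err == nil {
		exeDir = filepath.Dir(exe)
	}
	for _, path := range ortLibCandidates(os.Getenv("ORT_LIB_PATH"), exeDir, runtime.GOOS) {
		_, err := os.Stat(path)
		fmt.Printf("%s (found: %v)\n", path, err == nil)
	}
}
```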

### All at once
```bash
make build-all
```

## Migration from Python version

- ONNX models (`.onnx`) work as-is, no conversion needed
- Feature cache files (`.hifi.npz`) are compatible and will be read by the Go version
- HN-SEP cache files (PyTorch `.pt` format) will be regenerated on first use
- PyTorch `.ckpt` model support has been removed; convert to ONNX first

## Acknowledgments

- [yjzxkxdn](https://github.com/yjzxkxdn)
- [openvpi](https://github.com/openvpi) for pc-nsf-hifigan
- [MinaminoTenki](https://github.com/Lanhuace-Wan)
- [Linkzerosss](https://github.com/Linkzerosss)
- [MUTED64](https://github.com/MUTED64)
- [mili-tan](https://github.com/mili-tan)