https://github.com/organization/hifisampler-go

A high-performance UTAU resampler based on pc-nsf-hifigan, rewritten in Go.
Host: GitHub
URL: https://github.com/organization/hifisampler-go
Owner: organization
License: apache-2.0
Created: 2026-03-29T17:26:02.000Z (4 months ago)
Default Branch: main
Last Pushed: 2026-04-03T08:44:46.000Z (3 months ago)
Last Synced: 2026-04-03T20:02:39.876Z (3 months ago)
Topics: go, hifigan, hifisampler, onnx, openutau, resampler, rust, singing-voice-synthesis, utau
Language: Go
Homepage:
Size: 22 MB
Stars: 17
Watchers: 1
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

# hifisampler-go

A high-performance UTAU resampler based on [pc-nsf-hifigan](https://github.com/openvpi/vocoders), rewritten in Go.

This is a Go rewrite of [hifisampler](https://github.com/openhachimi/hifisampler) (originally Python + C#). It is a drop-in replacement that offers faster startup, lower memory usage, and single-binary deployment.

**For Jinriki please use [Hachimisampler](https://github.com/openhachimi/hachimisampler).**

## Compare

#### Env

- AMD Ryzen 7800X3D

- NVIDIA RTX4080

| Python PyTorch (Original, GPU) | Go (GPU) |
|:---:|:---:|
| ![Python GPU](https://github.com/user-attachments/assets/82ceb50d-e477-4fba-87df-a9317eb74121) | ![Go GPU](https://github.com/user-attachments/assets/bd2f71fb-9db0-4af4-9965-8f31670e2fd9) |

- In the GPU comparison above, `hifisampler-go` is more than 2.75 times faster than `hifisampler`.

| Python PyTorch (Original, CPU) | Go (CPU) |
|:---:|:---:|
|  |  |

### Voice Sample

#### Env

- MacBook Pro 14 (M4 Pro), macOS 26.4

- CoreML

| Worldline-R (Original) | hifisampler-go |
|:---:|:---:|
|  |  |

- End of rain (feat. 初音ミク), Shake Sphere. UST By cillia

## Architecture

- **Server** (`hifiserver`): HTTP + TCP + IPC server that loads ONNX models and performs neural vocoder inference

- **Client (HTTP)** (`hifisampler`): Lightweight Rust executable; communicates via HTTP

- **Client (IPC)** (`hifisampler-ipc`): Minimal Rust executable; tries IPC (Unix socket / Named Pipe) first, falls back to TCP

| Component   | Language | Protocol |
|-------------|----------|----------|
| Server      | Go       | HTTP :8572, TCP :8573, IPC (Unix socket / Named Pipe) |
| Client HTTP | Rust     | HTTP |
| Client IPC  | Rust     | IPC → TCP fallback |

The IPC client is recommended for OpenUTAU, which calls the resampler dozens of times per second: a Unix socket or Named Pipe connection avoids the network stack overhead of the HTTP and TCP paths.
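The IPC-first, TCP-fallback behavior can be sketched as follows. This is an illustrative Go sketch, not the actual Rust client: `endpoint` and `dialFirst` are hypothetical names, the timeout is an assumption, and only the Unix socket case is covered (Go's standard library does not dial Windows named pipes).

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// endpoint is a hypothetical (network, address) pair to try.
type endpoint struct {
	network string // "unix" or "tcp"
	address string
}

// dialFirst tries each endpoint in order and returns the first connection
// that succeeds, together with the network that was used.
func dialFirst(endpoints []endpoint, timeout time.Duration) (net.Conn, string, error) {
	if len(endpoints) == 0 {
		return nil, "", errors.New("no endpoints configured")
	}
	var errs []error
	for _, ep := range endpoints {
		conn, err := net.DialTimeout(ep.network, ep.address, timeout)
		if err == nil {
			return conn, ep.network, nil
		}
		errs = append(errs, fmt.Errorf("%s %s: %w", ep.network, ep.address, err))
	}
	return nil, "", errors.Join(errs...)
}

func main() {
	// Defaults from this README: Unix socket first, then TCP :8573.
	conn, network, err := dialFirst([]endpoint{
		{"unix", "/tmp/hifisampler.sock"},
		{"tcp", "127.0.0.1:8573"},
	}, 500*time.Millisecond) // assumed timeout
	if err != nil {
		fmt.Println("server not reachable:", err)
		return
	}
	defer conn.Close()
	fmt.Println("connected via", network)
}
```

Trying the endpoints in a fixed order keeps the fast path (the local socket) first while still working when only the TCP listener is reachable.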

## GPU Support

The server probes ONNX Runtime execution providers in the priority order below and uses the first one that is available, falling back to CPU:

| GPU Vendor | Provider | Platform | Detection Priority |
|---|---|---|---|
| NVIDIA | TensorRT | Windows, Linux | 1st (fastest for conv-heavy models) |
| NVIDIA | CUDA | Windows, Linux | 2nd |
| AMD | MIGraphX | Linux | 3rd |
| Intel | OpenVINO (AUTO) | Windows, Linux | 4th (auto-selects NPU/GPU/CPU) |
| Qualcomm | QNN (HTP) | Windows ARM64, Linux ARM64 | 5th (Snapdragon NPU, FP16) |
| NVIDIA/AMD/Intel | DirectML | Windows | 6th |
| Apple Silicon, Intel Mac | CoreML | macOS | 7th (ML Program + subgraph) |
| (fallback) | CPU | All | Last |

Download the server variant matching your GPU from the [releases](https://github.com/organization/hifisampler-go/releases) page.
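The detection priority in the table above amounts to a first-match search over an ordered list. The following is a minimal sketch of that idea, not the server's actual code: `providerPriority` and `pickProvider` are hypothetical names, and the provider strings are borrowed from the `device` values accepted in `config.yaml`.

```go
package main

import "fmt"

// providerPriority mirrors the detection order documented in the table.
var providerPriority = []string{
	"tensorrt", "cuda", "migraphx", "openvino", "qnn", "directml", "coreml", "cpu",
}

// pickProvider returns the highest-priority provider marked as available.
// CPU is the final fallback even if it is not listed in the map.
func pickProvider(available map[string]bool) string {
	for _, p := range providerPriority {
		if available[p] {
			return p
		}
	}
	return "cpu"
}

func main() {
	// Hypothetical machine: NVIDIA GPU with CUDA installed but no TensorRT runtime.
	available := map[string]bool{"cuda": true, "directml": true, "cpu": true}
	fmt.Println(pickProvider(available)) // prints "cuda"
}
```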

## TensorRT Setup (NVIDIA, experimental)

TensorRT provides **10-30x faster** vocoder inference compared to the default CUDA provider. If you have an NVIDIA GPU, TensorRT is strongly recommended.

### Prerequisites

- NVIDIA GPU (GTX 10xx or newer)

- CUDA Toolkit 11.x+ installed

- TensorRT 10.x runtime (`nvinfer_10.dll` / `libnvinfer.so.10`)

### Installation

**Windows (pip):**

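```cmd
pip install tensorrt

rem Copy the TensorRT DLLs next to hifiserver.exe (run this from the server directory).
rem cmd does not expand wildcards in directory names, so loop over matching Python installs.
rem Use %%d instead of %d inside a .bat file.
for /d %d in ("%LOCALAPPDATA%\Programs\Python\Python3*") do copy "%d\Lib\site-packages\tensorrt_libs\nvinfer*.dll" .
```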

**Windows (manual):**

Download TensorRT from [NVIDIA Developer](https://developer.nvidia.com/tensorrt) and add its `lib/` directory to your PATH.

**Linux:**

```bash
# Ubuntu/Debian
sudo apt install libnvinfer10 libnvinfer-plugin10

# Or via pip
pip install tensorrt
```

### How it works

1. **First run**: TensorRT compiles the vocoder ONNX model into an optimized engine. This takes **2-5 minutes** (one-time cost).

2. **Engine is cached** in `./trt_cache/` — subsequent runs start instantly.

3. **Inference**: ~3-16ms per note (vs ~110ms with CUDA EP).

### Configuration

In `config.yaml`:

```yaml
tensorrt:
  max_frames: 1024           # Max mel frames. 1024 ≈ 11.9s audio. Increase for longer notes.
  opt_frames: 80             # Most common note length (frames). TRT optimizes for this.
  cache_path: "./trt_cache"  # Engine cache directory. Delete to force rebuild.
  builder_opt_level: 5       # 1-5. Higher = slower build, faster inference.
  workspace_size_mb: 2048    # GPU memory for builder (MB).
```

**Key settings:**

- **`max_frames`**: Maximum note length in mel frames. If a note exceeds this, TensorRT falls back to CUDA EP automatically. Default `1024` covers ~12 seconds. Set to `2048` for very long notes.

- **`opt_frames`**: The note length TRT optimizes most aggressively for. Set to your typical note length (default 80 frames ≈ 0.93s).

- **`cache_path`**: Delete this directory to force a full engine rebuild (e.g., after changing `max_frames`).
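The frame counts above map to durations through `frames × hop_size / sample_rate`, using the `audio` values from the server configuration (hop size 512, sample rate 44100). The sketch below reproduces that arithmetic and the documented fallback for over-long notes; the function names are hypothetical and the fallback check is a simplified illustration rather than the server's code.

```go
package main

import "fmt"

const (
	sampleRate = 44100 // audio.sample_rate
	hopSize    = 512   // audio.hop_size
)

// framesToSeconds converts a mel frame count to an audio duration in seconds.
func framesToSeconds(frames int) float64 {
	return float64(frames) * hopSize / sampleRate
}

// providerForNote illustrates the documented behavior: notes longer than
// maxFrames are handled by the CUDA provider instead of TensorRT.
func providerForNote(frames, maxFrames int) string {
	if frames > maxFrames {
		return "cuda"
	}
	return "tensorrt"
}

func main() {
	fmt.Printf("max_frames 1024 = %.2f s\n", framesToSeconds(1024)) // 11.89 s
	fmt.Printf("opt_frames 80   = %.2f s\n", framesToSeconds(80))   // 0.93 s
	fmt.Println(providerForNote(1500, 1024))                        // cuda
}
```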

### Troubleshooting

- **"nvinfer.dll not found"**: Install TensorRT or add its directory to PATH.

- **First inference very slow (minutes)**: Normal — TRT is building the engine. Wait for it to complete; subsequent runs use the cache.

- **OOM during engine build**: Reduce `workspace_size_mb` or `max_frames`.

- **Note fails with shape error**: The note exceeds `max_frames`. Increase the value in config.yaml and delete `trt_cache/`.

## How to use

### 1. Download

From the [releases](https://github.com/organization/hifisampler-go/releases) page, download:

- **Client**: `hifisampler--` (HTTP) or `hifisampler-ipc--` (IPC) for your platform

- **Server**: `hifiserver---.zip/.tar.gz` for your platform and GPU

### 2. Download models

Models are auto-downloaded on first run. You can also place them manually:

```
hifiserver.exe (or hifiserver)
pc_nsf_hifigan_44.1k_hop512_128bin_2025.02/
  model.onnx
hnsep/
  vr/
    model_fp16.onnx
    config.yaml
config.default.yaml
```

Only ONNX models are supported. If you have PyTorch `.ckpt` models, convert them using:

```bash
python scripts/convert_hnsep_to_onnx.py
```

### 3. Start the server

#### Linux / macOS

```bash
./start.sh            # Linux/macOS (auto-detects binary)
./hifiserver          # or run directly
```

#### Windows

Double-click `hifiserver-...exe`.

The server listens on HTTP :8572, TCP :8573, and IPC (`/tmp/hifisampler.sock` or `\\.\pipe\hifisampler`) by default.

### 4. Configure UTAU

Set the UTAU resampler to `hifisampler-ipc` (or `hifisampler-ipc.exe` on Windows).

For OpenUTAU, you can create a symbolic link:

```cmd
mklink "C:\[OpenUTAU Path]\Resamplers\hifisampler-ipc.exe" "C:\[Project Path]\hifisampler-ipc.exe"
```

## Client Configuration

The client reads settings from `hifisampler-client.toml` (searched in current directory, then exe directory):

```toml
# IPC path (Unix socket or Named Pipe)
ipc_path = "/tmp/hifisampler.sock"  # Windows: \\.\pipe\hifisampler

# TCP fallback address
server = "127.0.0.1"
port = 8573
```

If no config file is found, defaults to IPC with TCP fallback to `127.0.0.1:8573`.
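The search order described above (current directory first, then the executable's directory) can be sketched like this. The real client is written in Rust; this Go version only illustrates the lookup, and `findConfig` and `searchDirs` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// findConfig returns the first existing regular file called name in dirs,
// or "" if none of the directories contains it.
func findConfig(name string, dirs []string) string {
	for _, dir := range dirs {
		candidate := filepath.Join(dir, name)
		if info, err := os.Stat(candidate); err == nil && !info.IsDir() {
			return candidate
		}
	}
	return ""
}

// searchDirs lists the current directory first, then the executable's directory.
func searchDirs() []string {
	dirs := []string{"."}
	if exe, err := os.Executable(); err == nil {
		dirs = append(dirs, filepath.Dir(exe))
	}
	return dirs
}

func main() {
	if path := findConfig("hifisampler-client.toml", searchDirs()); path != "" {
		fmt.Println("using config:", path)
	} else {
		fmt.Println("no config found; using IPC with TCP fallback to 127.0.0.1:8573")
	}
}
```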

## Server Configuration

On first run, the server creates `config.yaml` from `config.default.yaml`. Key settings:

```yaml
model:
  vocoder_path: "./pc_nsf_hifigan_44.1k_hop512_128bin_2025.02/model.onnx"
  hnsep_model_path: "./hnsep/vr/model_fp16.onnx"

audio:
  sample_rate: 44100
  hop_size: 512
  n_mels: 128

processing:
  wave_norm: true       # Enable loudness normalization
  loop_mode: true       # Mel spectrum loop mode
  peak_limit: 1.0

performance:
  max_workers: -1       # -1 = auto-detect (NumCPU/2, clamped 2-8)
  device: "auto"        # "auto", "cuda", "tensorrt", "migraphx", "directml", "openvino", "coreml", "qnn", "cpu"
  ipc_path: "/tmp/hifisampler.sock"  # Unix socket. Windows: \\.\pipe\hifisampler
```
```
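The `max_workers: -1` rule (half the CPU count, clamped to the range 2-8) can be written out as below. This is a sketch of the documented rule only: `resolveWorkers` is a hypothetical name, and passing any value other than `-1` through unchanged is an assumption.

```go
package main

import (
	"fmt"
	"runtime"
)

// resolveWorkers applies the documented auto-detect rule when configured is -1.
// Any other value is returned unchanged (an assumption for this sketch).
func resolveWorkers(configured, numCPU int) int {
	if configured != -1 {
		return configured
	}
	n := numCPU / 2
	if n < 2 {
		n = 2
	}
	if n > 8 {
		n = 8
	}
	return n
}

func main() {
	fmt.Println("workers on this machine:", resolveWorkers(-1, runtime.NumCPU()))
	fmt.Println(resolveWorkers(-1, 32)) // 8: 16 is clamped down to the upper bound
}
```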

## Implemented flags

- **g:** Adjust gender/formants. Range: `-600` to `600` | Default: `0`

- **Hb:** Adjust breath/noise. Range: `0` to `500` | Default: `100`

- **Hv:** Adjust voice/harmonic. Range: `0` to `150` | Default: `100`

- **HG:** Vocal fry/growl. Range: `0` to `100` | Default: `0`

- **P:** Normalize loudness at the note level, targeting -16 LUFS. Requires `wave_norm: true`. Range: `0` to `100` | Default: `100`

- **t:** Shift pitch in cents (1 cent = 1/100 semitone). Range: `-1200` to `1200` | Default: `0`

- **Ht:** Adjust tension. Range: `-100` to `100` | Default: `0`

- **A:** Amplitude modulation based on pitch vibrato. Range: `-100` to `100` | Default: `0`

- **G:** Force regenerate feature cache. No value needed.

- **He:** Enable Mel spectrum loop mode. No value needed.

_Note: Flags `B` and `V` were renamed to `Hb` and `Hv` to avoid conflicts with other UTAU flags._
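To make the flag list concrete, here is a rough sketch of how a UTAU flag string such as `g-20Hb50HG30G` could be split into the values listed above. It is not the project's parser: `flagSpec` and `parseFlags` are hypothetical, clamping out-of-range values is an assumption, and skipping unrecognized characters is a simplification.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// flagSpec describes one flag from the list above.
type flagSpec struct {
	name          string
	min, max, def int
	hasValue      bool
}

// Two-letter names come first so that "Hb" is matched before any shorter name.
var specs = []flagSpec{
	{"Hb", 0, 500, 100, true}, {"Hv", 0, 150, 100, true},
	{"HG", 0, 100, 0, true}, {"Ht", -100, 100, 0, true},
	{"He", 0, 0, 0, false}, {"g", -600, 600, 0, true},
	{"P", 0, 100, 100, true}, {"t", -1200, 1200, 0, true},
	{"A", -100, 100, 0, true}, {"G", 0, 0, 0, false},
}

func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// parseFlags returns every flag's value, starting from the documented defaults.
// Valueless flags (G, He) are reported as 1 when present and 0 otherwise.
func parseFlags(s string) map[string]int {
	out := make(map[string]int)
	for _, sp := range specs {
		out[sp.name] = sp.def
	}
	for i := 0; i < len(s); {
		var sp *flagSpec
		for k := range specs {
			if strings.HasPrefix(s[i:], specs[k].name) {
				sp = &specs[k]
				break
			}
		}
		if sp == nil {
			i++ // skip characters that do not start a known flag
			continue
		}
		i += len(sp.name)
		if !sp.hasValue {
			out[sp.name] = 1
			continue
		}
		j := i
		if j < len(s) && (s[j] == '-' || s[j] == '+') {
			j++
		}
		for j < len(s) && s[j] >= '0' && s[j] <= '9' {
			j++
		}
		if v, err := strconv.Atoi(s[i:j]); err == nil {
			out[sp.name] = clamp(v, sp.min, sp.max)
		}
		i = j
	}
	return out
}

func main() {
	f := parseFlags("g-20Hb50HG30G")
	fmt.Println(f["g"], f["Hb"], f["HG"], f["G"], f["Hv"]) // -20 50 30 1 100
}
```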

## Building from source

### Prerequisites

- Go 1.26+ (server)

- Rust stable (clients, RustFFT bridge)

- GCC/MinGW or Clang (server, required for RustFFT CGO bridge)

### Clients (Rust)

```bash
# HTTP client
cd client-rs && cargo build --release

# IPC client (Unix socket / Named Pipe + TCP fallback)
cd client-rs-ipc && cargo build --release
```

### Server (Go, requires CGO for RustFFT)

```bash
# Build RustFFT bridge first
cd internal/dsp/rustfft && cargo build --release && cd ../../..

# Build server with SIMD support
GOEXPERIMENT=simd CGO_ENABLED=1 go build -tags rustfft -ldflags="-s -w" -o hifiserver ./cmd/server
```

The ONNX Runtime shared library must be available at runtime. Place the `.dll`/`.so`/`.dylib` alongside the server binary or set `ORT_LIB_PATH`.

### All at once

```bash
make build-all
```

## Migration from Python version

- ONNX models (`.onnx`) work as-is, no conversion needed

- Feature cache files (`.hifi.npz`) are compatible and will be read by the Go version

- HN-SEP cache files (PyTorch `.pt` format) will be regenerated on first use

- PyTorch `.ckpt` model support has been removed; convert to ONNX first

## Acknowledgments

- [yjzxkxdn](https://github.com/yjzxkxdn)

- [openvpi](https://github.com/openvpi) for pc-nsf-hifigan

- [MinaminoTenki](https://github.com/Lanhuace-Wan)

- [Linkzerosss](https://github.com/Linkzerosss)

- [MUTED64](https://github.com/MUTED64)

- [mili-tan](https://github.com/mili-tan)