{"id":50256094,"url":"https://github.com/talker93/visqol-python","last_synced_at":"2026-05-27T06:01:39.223Z","repository":{"id":346283698,"uuid":"1189206999","full_name":"talker93/visqol-python","owner":"talker93","description":"A pure Python implementation of Google's ViSQOL (Virtual Speech Quality Objective Listener) for objective audio/speech quality assessment.","archived":false,"fork":false,"pushed_at":"2026-05-26T04:36:13.000Z","size":1018,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-26T06:28:52.644Z","etag":null,"topics":["audio-analysis","audio-codec","audio-processing","audio-quality","mos","numba","objective-metric","perceptual-audio","pesq","polqa","speech-quality","tflite","visqol"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/visqol-python/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/talker93.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-23T04:47:41.000Z","updated_at":"2026-05-26T04:49:11.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/talker93/visqol-python","commit_stats":null,"previous_names":["talker93/visqol-python"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/talker93/visqol-python","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/talker93%2Fvisqol-python","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/talker93%2Fvisqol-python/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/talker93%2Fvisqol-python/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/talker93%2Fvisqol-python/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/talker93","download_url":"https://codeload.github.com/talker93/visqol-python/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/talker93%2Fvisqol-python/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33553127,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-27T02:00:06.184Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio-analysis","audio-codec","audio-processing","audio-quality","mos","numba","objective-metric","perceptual-audio","pesq","polqa","speech-quality","tflite","visqol"],"created_at":"2026-05-27T06:01:37.618Z","updated_at":"2026-05-27T06:01:39.187Z","avatar_url":"https://github.com/talker93.png","language":"Python","funding_links":[],"categories":["Audio Related Packages"],"sub_categories":[],"readme":"# ViSQOL (Python)\n\n[![PyPI version](https://img.shields.io/pypi/v/visqol-python)](https://pypi.org/project/visqol-python/)\n[![CI](https://github.com/talker93/visqol-python/actions/workflows/ci.yml/badge.svg)](https://github.com/talker93/visqol-python/actions/workflows/ci.yml)\n[![Python](https://img.shields.io/pypi/pyversions/visqol-python)](https://pypi.org/project/visqol-python/)\n[![License](https://img.shields.io/github/license/talker93/visqol-python)](LICENSE)\n\nA pure Python implementation of [Google's ViSQOL](https://github.com/google/visqol) (Virtual Speech Quality Objective Listener) for objective audio/speech quality assessment.\n\nViSQOL compares a reference audio signal with a degraded version and outputs a **MOS-LQO** (Mean Opinion Score - Listening Quality Objective) score on a scale of **1.0 – 5.0**.\n\n## Features\n\n- **Two modes**: Audio mode (music/general audio at 48 kHz) and Speech mode (speech at 16 kHz)\n- **High accuracy**: 12/12 conformance tests pass against the official C++ implementation\n  - Audio mode: 9/10 tests produce **identical** MOS scores (diff = 0.000000), 1 test diff = 0.000117\n  - Speech mode (polynomial): diff = 0.001057\n  - Speech mode (lattice TFLite): diff = 0.002341\n- **Two speech quality mappers** matching C++ ViSQOL:\n  - **Lattice (default)** — deep-lattice TFLite network (`--use_lattice_model=true` in C++); requires the optional `[lattice]` extra\n  - **Polynomial (fallback)** — legacy exponential fit (`--use_lattice_model=false` in C++)\n- **Pure Python**: no C/C++ compilation required (the optional `[lattice]` extra adds the Google `ai-edge-litert` TFLite runtime as a binary wheel)\n- **Minimal dependencies**: 4 core pip packages (`numpy`, `scipy`, `soundfile`, `libsvm-official`)\n- **Optional Numba acceleration**: `pip install visqol-python[accel]` for JIT-compiled Gammatone filterbank (parallel) and a fused NSIM + DP patch matching kernel\n- **Optional pyFFTW backend**: `pip install visqol-python[fftw]` routes alignment / xcorr FFTs through FFTW3 — **~16× overall speedup**, RTF 0.036 (vs C++ estimate 0.093)\n- **Batch \u0026 parallel evaluation**: `measure_batch(parallel=True)` for multi-process execution across CPU cores\n- **Fully typed**: PEP 561 `py.typed`, strict mypy, ruff-enforced code style\n\n## Installation\n\n```bash\npip install visqol-python\n```\n\nFor **C++-default-equivalent speech mode** (deep-lattice TFLite mapper):\n\n```bash\npip install visqol-python[lattice]   # requires Python ≥ 3.10\n```\n\nFor **Numba-accelerated** Gammatone filtering and the fused NSIM + DP kernel:\n\n```bash\npip install visqol-python[accel]\n```\n\nFor **FFTW3-backed alignment FFTs** via pyFFTW:\n\n```bash\npip install visqol-python[fftw]\n```\n\nInstall everything (lattice + numba + fftw):\n\n```bash\npip install visqol-python[all]\n```\n\nOr install from source:\n\n```bash\ngit clone https://github.com/talker93/visqol-python.git\ncd visqol-python\npip install -e \".[dev]\"\n```\n\n\u003e **Note on speech mode parity**: Without the `[lattice]` extra, speech mode falls back to the polynomial mapping (equivalent to running C++ ViSQOL with `--use_lattice_model=false`). The polynomial can over-predict MOS by 1–2 points on degraded speech vs the C++ default. Install `[lattice]` whenever you need numbers that line up with the C++ default behaviour (see [issue #1](https://github.com/talker93/visqol-python/issues/1)).\n\n## Quick Start\n\n### Python API\n\n```python\nfrom visqol import VisqolApi\n\n# Audio mode (default) - for music and general audio\napi = VisqolApi()\napi.create(mode=\"audio\")\nresult = api.measure(\"reference.wav\", \"degraded.wav\")\nprint(f\"MOS-LQO: {result.moslqo:.4f}\")\n\n# Speech mode - for speech signals\napi = VisqolApi()\napi.create(mode=\"speech\")\nresult = api.measure(\"ref_speech.wav\", \"deg_speech.wav\")\nprint(f\"MOS-LQO: {result.moslqo:.4f}\")\n```\n\n### Using NumPy Arrays\n\n```python\nimport numpy as np\nimport soundfile as sf\nfrom visqol import VisqolApi\n\nref, sr = sf.read(\"reference.wav\")\ndeg, _  = sf.read(\"degraded.wav\")\n\napi = VisqolApi()\napi.create(mode=\"audio\")\nresult = api.measure_from_arrays(ref, deg, sample_rate=sr)\nprint(f\"MOS-LQO: {result.moslqo:.4f}\")\n```\n\n### Batch Evaluation\n\n```python\nfrom visqol import VisqolApi\n\napi = VisqolApi()\napi.create(mode=\"audio\")\n\nfile_pairs = [\n    (\"ref1.wav\", \"deg1.wav\"),\n    (\"ref2.wav\", \"deg2.wav\"),\n    (\"ref3.wav\", \"deg3.wav\"),\n]\n\n# Sequential with progress callback\nresults = api.measure_batch(\n    file_pairs,\n    progress_callback=lambda done, total: print(f\"{done}/{total}\"),\n)\n\n# Multi-process parallel (uses all CPU cores)\nresults = api.measure_batch(file_pairs, parallel=True, max_workers=4)\n\nfor pair, result in zip(file_pairs, results):\n    if isinstance(result, Exception):\n        print(f\"{pair}: FAILED — {result}\")\n    else:\n        print(f\"{pair}: MOS-LQO = {result.moslqo:.4f}\")\n```\n\n### Command Line\n\n```bash\n# Audio mode (default)\npython -m visqol -r reference.wav -d degraded.wav\n\n# Speech mode\npython -m visqol -r reference.wav -d degraded.wav --speech_mode\n\n# Verbose output (per-patch details)\npython -m visqol -r reference.wav -d degraded.wav -v\n```\n\n**CLI options:**\n\n| Flag | Description |\n|------|-------------|\n| `-r`, `--reference` | Path to reference WAV file (required) |\n| `-d`, `--degraded` | Path to degraded WAV file (required) |\n| `--speech_mode` | Use speech mode (16 kHz) |\n| `--no_lattice_model` | Speech mode: disable lattice TFLite mapper, use polynomial fallback |\n| `--lattice_model` | Custom path to lattice `.tflite` model (speech mode) |\n| `--unscaled_speech` | Don't scale polynomial speech MOS to 5.0 (polynomial only) |\n| `--model` | Custom SVR model file path (audio mode only) |\n| `--search_window` | Search window radius (default: 60) |\n| `--verbose`, `-v` | Show detailed per-patch results |\n\n## Output\n\nThe `measure()` method returns a `SimilarityResult` object with:\n\n| Field | Description |\n|-------|-------------|\n| `moslqo` | MOS-LQO score (1.0 – 5.0) |\n| `vnsim` | Mean NSIM across all patches |\n| `fvnsim` | Per-frequency-band mean NSIM |\n| `fstdnsim` | Per-frequency-band std of NSIM |\n| `fvdegenergy` | Per-frequency-band degraded energy |\n| `patch_sims` | List of per-patch similarity details |\n\n## Modes\n\n### Audio Mode (default)\n- Target sample rate: **48 kHz**\n- 32 Gammatone frequency bands (50 Hz – 15 000 Hz)\n- Quality mapping: SVR (Support Vector Regression) model\n- Best for: music, environmental audio, codecs\n\n### Speech Mode\n- Target sample rate: **16 kHz**\n- 21 Gammatone frequency bands (50 Hz – 8 000 Hz)\n- VAD (Voice Activity Detection) based patch selection\n- Quality mapping (choose one):\n  - **Deep-lattice TFLite (default)** — same mapper as C++ ViSQOL's default `--use_lattice_model=true`; requires `pip install visqol-python[lattice]`\n  - **Exponential polynomial (fallback)** — same as C++ `--use_lattice_model=false`; used automatically when the lattice runtime is not installed\n- Toggle from Python: `api.create(mode=\"speech\", use_lattice_model=False)`\n- Toggle from CLI: `--no_lattice_model`\n- Best for: speech, VoIP, telephony\n\n## Performance\n\nMeasured on Apple M-series, Python 3.13, audio mode on the `guitar48_stereo` 12.5 s conformance case (3-run average):\n\n| Configuration | RTF | Typical Time | Speedup vs pure Python |\n|---|---|---|---|\n| Pure Python + NumPy/SciPy | 0.58 | ~7 s | 1.0× |\n| + `[accel]` (Numba JIT) | 0.067 | ~0.84 s | 8.7× |\n| + `[accel] [fftw]` (Numba + FFTW3) | **0.036** | **~0.45 s** | **16×** |\n\n\u003e RTF (Real-Time Factor) \u003c 1.0 means faster than real-time.\n\u003e With Numba + pyFFTW the Python implementation runs at **2.6× the C++ estimated speed** (C++ RTF ≈ 0.093).\n\nStage-level breakdown of the v3.6.0 fully-accelerated path:\n\n| Stage | Time | % |\n|---|---|---|\n| Gammatone filterbank | 0.179 s | 40% |\n| DP Patch matching (fused NSIM kernel) | 0.131 s | 29% |\n| Global alignment (pyFFTW rfft/irfft) | 0.091 s | 20% |\n| Fine alignment + NSIM | 0.043 s | 10% |\n| Other (SPL, postproc, SVR, …) | 0.003 s | \u003c 1% |\n\n## Project Structure\n\n```\nvisqol-python/\n├── visqol/                    # Main package\n│   ├── __init__.py            # Package exports \u0026 version\n│   ├── api.py                 # Public API (VisqolApi)\n│   ├── visqol_manager.py      # Pipeline orchestrator\n│   ├── visqol_core.py         # Core algorithm\n│   ├── audio_utils.py         # Audio I/O \u0026 SPL normalization\n│   ├── signal_utils.py        # Envelope, cross-correlation\n│   ├── analysis_window.py     # Hann window\n│   ├── gammatone.py           # ERB + Gammatone filterbank + spectrogram\n│   ├── patch_creator.py       # Patch creation (Image + VAD modes)\n│   ├── patch_selector.py      # DP-based optimal patch matching\n│   ├── alignment.py           # Global alignment via cross-correlation\n│   ├── nsim.py                # NSIM similarity metric\n│   ├── quality_mapper.py      # SVR \u0026 exponential quality mapping\n│   ├── numba_accel.py         # Optional Numba JIT kernels (DP, NSIM, Gammatone)\n│   ├── __main__.py            # CLI entry point\n│   ├── py.typed               # PEP 561 type marker\n│   └── model/                 # Bundled SVR model\n│       └── libsvm_nu_svr_model.txt\n├── tests/                     # Tests \u0026 benchmarks (pytest)\n│   ├── conftest.py            # Shared fixtures \u0026 CLI options\n│   ├── test_quick.py          # Smoke tests (no external data needed)\n│   ├── test_conformance.py    # Full conformance tests (needs testdata)\n│   ├── test_parallel_correctness.py  # Numba parallel correctness tests\n│   └── bench_*.py             # Performance benchmarks\n├── .github/workflows/\n│   ├── ci.yml                 # CI: lint + type-check + matrix test (Python × NumPy)\n│   └── publish.yml            # Auto-publish to PyPI on tag push\n├── pyproject.toml             # Package metadata \u0026 build config\n├── CHANGELOG.md\n├── CONTRIBUTING.md\n├── LICENSE\n└── README.md\n```\n\n## Conformance Test Results\n\nTested against the [official C++ ViSQOL v3.3.3](https://github.com/google/visqol) expected values:\n\n| Test Case | Mode | Expected MOS | Python MOS | Δ |\n|-----------|------|-------------|------------|---|\n| strauss_lp35 | Audio | 1.3889 | 1.3889 | 0.000000 |\n| steely_lp7 | Audio | 2.2502 | 2.2502 | 0.000000 |\n| sopr_256aac | Audio | 4.6823 | 4.6823 | 0.000000 |\n| ravel_128opus | Audio | 4.4651 | 4.4651 | 0.000000 |\n| moonlight_128aac | Audio | 4.6843 | 4.6843 | 0.000000 |\n| harpsichord_96mp3 | Audio | 4.2237 | 4.2237 | 0.000000 |\n| guitar_64aac | Audio | 4.3497 | 4.3497 | 0.000000 |\n| glock_48aac | Audio | 4.3325 | 4.3325 | 0.000000 |\n| contrabassoon_24aac | Audio | 2.3469 | 2.3468 | 0.000117 |\n| castanets_identity | Audio | 4.7321 | 4.7321 | 0.000000 |\n| speech_CA01 (polynomial) | Speech | 3.3745 | 3.3756 | 0.001057 |\n| speech_CA01 (lattice) | Speech | 3.3130 | 3.3153 | 0.002341 |\n\nBoth speech values come from running the C++ ViSQOL binary directly with the corresponding `--use_lattice_model` flag, so they represent ground-truth parity targets.\n\n## References\n\n- [Google ViSQOL (C++)](https://github.com/google/visqol) — the original implementation this project is ported from\n- Hines, A., Gillen, E., Kelly, D., Skoglund, J., Kokaram, A., \u0026 Harte, N. (2015). *ViSQOLAudio: An Objective Audio Quality Metric for Low Bitrate Codecs.* The Journal of the Acoustical Society of America.\n- Chinen, M., Lim, F. S., Skoglund, J., Gureev, N., O'Gorman, F., \u0026 Hines, A. (2020). *ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric.* 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX).\n\n## License\n\nApache License 2.0. See [LICENSE](LICENSE) for details.\n\nThis project is a Python port of [Google's ViSQOL](https://github.com/google/visqol), which is also licensed under Apache 2.0.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftalker93%2Fvisqol-python","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftalker93%2Fvisqol-python","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftalker93%2Fvisqol-python/lists"}