https://github.com/audiojs/pitch-detection
Pitch detection algorithms
https://github.com/audiojs/pitch-detection
Last synced: about 1 month ago
JSON representation
Pitch detection algorithms
- Host: GitHub
- URL: https://github.com/audiojs/pitch-detection
- Owner: audiojs
- License: other
- Created: 2026-04-16T04:20:40.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-04-17T17:27:58.000Z (about 1 month ago)
- Last Synced: 2026-04-17T19:28:15.223Z (about 1 month ago)
- Language: JavaScript
- Size: 32.2 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: readme.md
- License: LICENSE
Awesome Lists containing this project
README
# pitch-detection [](https://github.com/audiojs/pitch-detection/actions/workflows/test.yml) [](https://www.npmjs.com/package/pitch-detection) [](https://github.com/krishnized/license)
Pitch, chroma, chord and key detection. YIN, McLeod, pYIN, HPS, cepstrum, SWIPE, autocorrelation, AMDF, NNLS chroma, chord templates, Krumhansl-Schmuckler.
### Pitch
**[YIN](#yin)** — cumulative mean normalized difference
**[McLeod](#mcleod)** — normalized square difference (MPM)
**[pYIN](#pyin)** — probabilistic YIN with Beta prior
**[Autocorrelation](#autocorrelation)** — normalized autocorrelation
**[AMDF](#amdf)** — average magnitude difference
### Spectral pitch
**[HPS](#hps)** — harmonic product spectrum
**[Cepstrum](#cepstrum)** — real cepstrum peak picking
**[SWIPE](#swipe)** — sawtooth waveform inspired estimator
### Harmony
**[Chroma](#chroma)** — PCP / NNLS pitch-class profiles
**[Chord](#chord)** — template matching + Viterbi smoothing
**[Key](#key)** — Krumhansl-Schmuckler key finding
## Install
```
npm install pitch-detection
```
## Usage
```js
import { yin, mcleod, chroma, chord, key } from 'pitch-detection'
let fs = 44100
let frame = new Float32Array(2048) // fill from your audio source
// pitch
let result = yin(frame, { fs })
// → { freq: 440.1, clarity: 0.97 } or null
// chroma → chord → key
let c = chroma(frame, { fs, method: 'nnls' })
let ch = chord(c)
// → { root: 0, quality: 'maj', label: 'C', confidence: 0.92 }
let k = key(c)
// → { tonic: 0, mode: 'major', label: 'C', confidence: 0.85, scores: [...] }
```
> Works in Node.js and browser. No Web Audio API needed — operates on raw `Float32Array` samples.
**Sliding windows** — call repeatedly as new samples arrive:
```js
let hop = 512
for (let i = 0; i + 2048 <= samples.length; i += hop) {
let frame = samples.subarray(i, i + 2048)
let result = yin(frame, { fs })
if (result) console.log(i / fs, result.freq.toFixed(1))
}
```
**Full pipeline** — pitch → chroma → chord → key on a sequence of frames:
```js
import { chroma, chord, smoothChords, key } from 'pitch-detection'
let frames = []
for (let i = 0; i + 4096 <= samples.length; i += 2048) {
frames.push(chroma(samples.subarray(i, i + 4096), { fs, method: 'nnls' }))
}
let chords = smoothChords(frames, { selfProb: 0.5 })
// → [{ root: 0, quality: 'maj', label: 'C' }, ...]
let k = key(frames)
// → { tonic: 0, mode: 'major', label: 'C', confidence: 0.85, scores: [...] }
```
## API
All pitch algorithms return `{ freq, clarity } | null`:
- `freq` — fundamental frequency in Hz
- `clarity` — algorithm-specific confidence in `[0, 1]`
- `null` — no periodic structure found (silence, noise, polyphony)
Time-domain algorithms (YIN, McLeod, pYIN, autocorrelation, AMDF) accept any buffer length. Spectral algorithms (HPS, cepstrum, SWIPE, chroma) require power-of-2 length.
---
## YIN
**de Cheveigné & Kawahara, 2002.** The reference algorithm for monophonic pitch estimation. Most cited, most tested, most robust.
```js
import yin from 'pitch-detection/yin.js'
let result = yin(samples, { fs: 44100 })
```
| Param | Default | |
|---|---|---|
| `fs` | `44100` | Sample rate (Hz) |
| `threshold` | `0.15` | CMND threshold — lower = stricter, fewer detections |
**Use when:** General-purpose monophonic pitch tracking — speech, singing, solo instruments. The most reliable choice when in doubt.
**Not for:** Polyphonic audio (returns dominant or null), real-time with hard latency budgets (needs full window).
**Ref:** de Cheveigné & Kawahara, ["YIN, a fundamental frequency estimator for speech and music"](https://doi.org/10.1121/1.1458024), JASA 2002.
**Complexity:** $O(N^2/4)$ — two nested passes over half the window.
## McLeod
**McLeod & Wyvill, 2005.** Normalized square difference with smarter peak picking. Handles smaller windows — good for vibrato and fast pitch changes.
```js
import mcleod from 'pitch-detection/mcleod.js'
let result = mcleod(samples, { fs: 44100 })
```
| Param | Default | |
|---|---|---|
| `fs` | `44100` | Sample rate (Hz) |
| `threshold` | `0.9` | Peak selection threshold as fraction of global max |
**Use when:** Vibrato tracking, small hop sizes, singing voice where YIN occasionally double-triggers.
**Not for:** Highly noisy signals (NSDF is less thresholded than YIN's CMND).
**Ref:** McLeod & Wyvill, ["A smarter way to find pitch"](https://www.cs.otago.ac.nz/research/publications/oucs-2008-03.pdf), ICMC 2005.
**Complexity:** $O(N^2/4)$ — same asymptotic cost as YIN.
## pYIN
**Mauch & Dixon, 2014.** Probabilistic YIN — runs YIN at multiple thresholds weighted by a Beta(2, 18) prior, producing a distribution over candidate pitches instead of a single hard pick. More robust than YIN on ambiguous frames.
```js
import pyin from 'pitch-detection/pyin.js'
let result = pyin(samples, { fs: 44100 })
// → { freq: 440.1, clarity: 0.92, candidates: [{ freq: 440.1, prob: 0.85 }, ...] }
```
| Param | Default | |
|---|---|---|
| `fs` | `44100` | Sample rate (Hz) |
| `minFreq` | `50` | Minimum detectable frequency (Hz) |
| `maxFreq` | `2000` | Maximum detectable frequency (Hz) |
**Use when:** Ambiguous pitched content — breathy vocals, noisy recordings, or when you need a pitch posterior for downstream HMM tracking.
**Not for:** Clean signals where YIN already works well (pYIN is ~10× slower due to multi-threshold sweep).
**Ref:** Mauch & Dixon, ["pYIN: A Fundamental Frequency Estimator Using Probabilistic Threshold Distributions"](https://doi.org/10.1109/ICASSP.2014.6853678), ICASSP 2014.
## Autocorrelation
Normalized autocorrelation — the simplest pitch estimator. Educational baseline.
```js
import autocorrelation from 'pitch-detection/autocorrelation.js'
let result = autocorrelation(samples, { fs: 44100 })
```
| Param | Default | |
|---|---|---|
| `fs` | `44100` | Sample rate (Hz) |
| `threshold` | `0.5` | Minimum normalized autocorrelation value to accept |
**Use when:** Learning, quick prototypes, signals with strong dominant periodicity and low noise.
**Not for:** Production — octave errors are common without additional heuristics.
**Ref:** Rabiner, ["Use of autocorrelation analysis for pitch detection"](https://doi.org/10.1109/TASSP.1977.1162905), IEEE TASSP 1977.
**Complexity:** $O(N^2/4)$.
## AMDF
**Ross et al., 1974.** Average Magnitude Difference Function — the classical predecessor to YIN. Measures average absolute difference between a signal and its delayed copy; minima indicate periodicity.
```js
import amdf from 'pitch-detection/amdf.js'
let result = amdf(samples, { fs: 44100 })
```
| Param | Default | |
|---|---|---|
| `fs` | `44100` | Sample rate (Hz) |
| `minFreq` | `50` | Minimum detectable frequency (Hz) |
| `maxFreq` | `2000` | Maximum detectable frequency (Hz) |
| `threshold` | `0.3` | Normalized AMDF dip threshold |
**Use when:** Low-complexity environments, embedded systems. Simpler and cheaper than YIN (no squaring, no cumulative normalization).
**Not for:** Noisy signals — lacks YIN's cumulative normalization that suppresses octave errors.
**Ref:** Ross et al., ["Average magnitude difference function pitch extractor"](https://doi.org/10.1109/TASSP.1974.1162598), IEEE TASSP 1974.
**Complexity:** $O(N^2/4)$.
---
## HPS
**Schroeder, 1968.** Harmonic Product Spectrum — multiplies the spectrum by its downsampled copies so that harmonic peaks align at the fundamental. Robust to the missing-fundamental problem.
```js
import hps from 'pitch-detection/hps.js'
let result = hps(samples, { fs: 44100 })
```
| Param | Default | |
|---|---|---|
| `fs` | `44100` | Sample rate (Hz) |
| `harmonics` | `5` | Number of harmonic products |
| `minFreq` | `50` | Minimum detectable frequency (Hz) |
| `maxFreq` | `4000` | Maximum detectable frequency (Hz) |
| `cents` | `10` | Candidate spacing in cents |
| `threshold` | `0.1` | Minimum clarity to accept |
**Use when:** Harmonic-rich signals (guitar, piano, brass). Naturally handles missing fundamentals.
**Not for:** Pure sinusoids (only one harmonic), very noisy signals.
**Ref:** Schroeder, ["Period histogram and product spectrum"](https://doi.org/10.1121/1.1910902), JASA 1968.
**Requires:** Power-of-2 window length.
## Cepstrum
**Noll, 1967.** Real cepstrum — $c(\tau) = \text{IFFT}(\log |\text{FFT}(x)|)$. A peak at quefrency $\tau$ corresponds to period $\tau$ in the time domain.
```js
import cepstrum from 'pitch-detection/cepstrum.js'
let result = cepstrum(samples, { fs: 44100 })
```
| Param | Default | |
|---|---|---|
| `fs` | `44100` | Sample rate (Hz) |
| `minFreq` | `50` | Minimum detectable frequency (Hz) |
| `maxFreq` | `2000` | Maximum detectable frequency (Hz) |
| `threshold` | `0.3` | Minimum clarity to accept |
**Use when:** Harmonic signals where you want a clean spectral-domain method. Good pedagogical complement to time-domain algorithms.
**Not for:** Low-pitched signals (quefrency resolution is limited by window length).
**Ref:** Noll, ["Cepstrum pitch determination"](https://doi.org/10.1121/1.1911537), JASA 1967.
**Requires:** Power-of-2 window length.
## SWIPE
**Camacho & Harris, 2008.** SWIPE' (Sawtooth Waveform Inspired Pitch Estimator, prime harmonics). Measures spectral similarity between the window and a sawtooth template whose lobes sit at prime harmonics. More accurate than HPS on clean instrumental signals; robust against octave errors because only prime harmonics contribute.
Simplified single-window form: uses one FFT instead of the multi-resolution loudness pyramid of the original paper — sufficient for stationary windows.
```js
import swipe from 'pitch-detection/swipe.js'
let result = swipe(samples, { fs: 44100 })
```
| Param | Default | |
|---|---|---|
| `fs` | `44100` | Sample rate (Hz) |
| `minFreq` | `60` | Minimum detectable frequency (Hz) |
| `maxFreq` | `4000` | Maximum detectable frequency (Hz) |
| `cents` | `10` | Candidate spacing in cents |
| `threshold` | `0.15` | Minimum clarity to accept |
**Use when:** Clean instrumental signals, studio recordings, where sub-Hz accuracy matters.
**Not for:** Very noisy or reverberant signals (single-window form lacks multi-resolution robustness of the full SWIPE').
**Ref:** Camacho & Harris, ["A sawtooth waveform inspired pitch estimator for speech and music"](https://doi.org/10.1121/1.2951592), JASA 2008.
**Requires:** Power-of-2 window length.
---
## Chroma
**Fujishima, 1999 (PCP) / Mauch & Dixon, 2010 (NNLS).** Chroma feature — a 12-D vector where each bin holds the energy attributed to one pitch class (C, C#, ..., B).
```js
import chroma from 'pitch-detection/chroma.js'
// PCP — classical spectral folding
let c = chroma(samples, { fs: 44100 })
// NNLS — nonnegative least squares (cleaner for polyphonic audio)
let c2 = chroma(samples, { fs: 44100, method: 'nnls' })
```
### PCP (default)
Each spectral bin is mapped to its nearest pitch class and squared magnitudes are accumulated. Simple and fast.
### NNLS
Fits the observed $\sqrt{\text{spectrum}}$ as a nonnegative combination of synthetic pitch-tone profiles (fundamental plus geometrically decaying overtones, Gaussian lobes in log-frequency with σ = 0.5 semitones). Uses multiplicative NMF updates: $a \leftarrow a \cdot (D^\top s) / (D^\top D a + \varepsilon)$. Suppresses octave and harmonic confusion on polyphonic audio.
Pitch dictionary covers MIDI 24–96 (C1–C7) with configurable harmonics per tone.
| Param | Default | |
|---|---|---|
| `fs` | `44100` | Sample rate (Hz) |
| `method` | `'pcp'` | `'pcp'` or `'nnls'` |
| `minFreq` | `65` | Min frequency for PCP mapping (~C2) |
| `maxFreq` | `2093` | Max frequency for PCP mapping (~C7) |
| `harmonics` | `8` | Overtones per pitch (NNLS only) |
| `iterations` | `30` | NMF iterations (NNLS only) |
**Returns:** `Float64Array(12)`, L1-normalized.
**Use when:** Building chord/key detectors, music information retrieval, audio fingerprinting. NNLS for polyphonic; PCP for speed.
**Ref (PCP):** Fujishima, ["Realtime chord recognition of musical sound"](https://doi.org/10.1109/ICMC.1999.318003), ICMC 1999.
**Ref (NNLS):** Mauch & Dixon, ["Approximate Note Transcription for the Improved Identification of Difficult Chords"](https://doi.org/10.5281/zenodo.1416026), ISMIR 2010.
**Requires:** Power-of-2 window length.
## Chord
**Fujishima, 1999 (templates) / Viterbi smoothing.** Classifies chroma frames as one of 24 major/minor triads via cosine similarity with binary templates.
```js
import chord, { TEMPLATES, smooth as smoothChords } from 'pitch-detection/chord.js'
// single frame
let c = chord(chromaVec)
// → { root: 0, quality: 'maj', label: 'C', confidence: 0.92 }
// smoothed sequence
let chords = smoothChords(chromaFrames, { selfProb: 0.5 })
// → [{ root: 0, quality: 'maj', label: 'C' }, ...]
```
### `chord(chromaVec, opts)`
Cosine similarity against 24 binary templates (12 major + 12 minor triads). Returns the best match with confidence score.
| Param | Default | |
|---|---|---|
| `minConfidence` | `0.3` | Below this, returns quality `'N'` (no chord) |
**Returns:** `{ root, quality, label, confidence }` where quality is `'maj'`, `'min'`, or `'N'`.
### `smooth(frames, opts)`
Viterbi decoding with a sticky self-transition prior. Observation log-likelihood = 8 × cosine similarity (temperature 8 gives reasonably sharp distributions).
| Param | Default | |
|---|---|---|
| `selfProb` | `0.5` | Self-transition probability (higher = smoother) |
**Returns:** `{ root, quality, label }[]` — one chord per frame.
### `TEMPLATES`
Exported array of 24 chord templates: `{ root, quality, label, vec }` where `vec` is a `Float64Array(12)` with 1 on chord tones.
**Use when:** Quick chord labeling from chroma features. Combine with NNLS chroma for best results.
**Ref:** Fujishima, ["Realtime chord recognition of musical sound"](https://doi.org/10.1109/ICMC.1999.318003), ICMC 1999.
## Key
**Krumhansl & Schmuckler.** Detects musical key from chroma via Pearson correlation against 24 rotated major/minor key profiles (Krumhansl-Kessler probe-tone ratings).
```js
import key, { KK_MAJOR, KK_MINOR } from 'pitch-detection/key.js'
let k = key(chromaVec)
// → { tonic: 0, mode: 'major', label: 'C', confidence: 0.85, scores: [...] }
// from multiple frames (averages internally)
let k2 = key(chromaFrames)
```
| Param | Default | |
|---|---|---|
| `profile` | `{ major: KK_MAJOR, minor: KK_MINOR }` | Custom key profiles |
**Returns:** `{ tonic, mode, label, confidence, scores }` where `scores` is all 24 keys sorted descending.
### Exported profiles
- `KK_MAJOR` — Krumhansl-Kessler major profile: `[6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]`
- `KK_MINOR` — Krumhansl-Kessler minor profile: `[6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]`
**Use when:** Key detection for music analysis, automatic transposition, music information retrieval.
**Ref:** Krumhansl, [*Cognitive Foundations of Musical Pitch*](https://doi.org/10.1093/acprof:oso/9780195148367.001.0001), Oxford 1990.
**Ref:** Temperley, ["What's Key for Key?"](https://doi.org/10.1525/mp.1999.17.1.65), Music Perception 1999.
---
## Comparison
### Pitch algorithms
| | YIN | McLeod | pYIN | AMDF | HPS | Cepstrum | SWIPE |
|---|---|---|---|---|---|---|---|
| **Domain** | time | time | time | time | spectral | spectral | spectral |
| **Accuracy** | ★★★★★ | ★★★★ | ★★★★★ | ★★★ | ★★★★ | ★★★ | ★★★★★ |
| **Noise robustness** | ★★★★★ | ★★★★ | ★★★★★ | ★★★ | ★★★ | ★★★ | ★★★★ |
| **Octave errors** | rare | rare | rare | common | rare | occasional | rare |
| **Missing fundamental** | no | no | no | no | yes | yes | yes |
| **Min window** | ~4 periods | ~2 periods | ~4 periods | ~4 periods | power of 2 | power of 2 | power of 2 |
| **Best for** | general | vibrato | ambiguous | embedded | harmonic-rich | pedagogical | studio |
## See also
- [fourier-transform](https://github.com/scijs/fourier-transform) — FFT used by spectral algorithms
- [beat-detection](https://github.com/audiojs/beat-detection) — onset detection, tempo estimation, beat tracking
- [digital-filter](https://github.com/audiojs/digital-filter) — filter design and processing
- [time-stretch](https://github.com/audiojs/time-stretch) — time stretching and pitch shifting
- [pitch-shift](https://github.com/audiojs/pitch-shift) — pitch shifting algorithms