An open API service indexing awesome lists of open source software.

https://github.com/finite-sample/analytic-bw

Analytic Bandwidth Selector for NW and KDE Based on CV Hessian
https://github.com/finite-sample/analytic-bw

Last synced: 8 months ago
JSON representation

Analytic Bandwidth Selector for NW and KDE Based on CV Hessian

Awesome Lists containing this project

README

          

# Analytic‑Hessian Bandwidth Selection

A *single* Newton step can pick the optimal kernel bandwidth if you hand it the right derivatives. We derive **closed‑form gradients *and* Hessians** of the leave‑one‑out cross‑validation (LOOCV) risk for

* univariate **kernel‑density estimation** (KDE), and
* univariate **Nadaraya–Watson** (NW) kernel regression,

covering both the Gaussian and Epanechnikov kernels. The result is a bandwidth selector that reaches the same minimum as an exhaustive grid scan while using *an order of magnitude* fewer evaluations.

---

## 1  Problem & Prior Practice

### 1.1  KDE bandwidth

For density estimation one minimises finite‑sample risk proxies such as

* **LSCV** (least‑squares CV)
* **LCV** (likelihood CV)

Common optimisers

| Optimiser | Typical calls | Notes |
| --------------------------------------- | ------------- | --------------------- |
| Grid (50–100 \$h\$’s) | 50–100 | textbook default |
| Golden‑section | 20–25 | still bracketing |
| Plug‑in / Pilot (Sheather–Jones, Botev) | 1 | relies on asymptotics |

### 1.2  NW bandwidth

For regression one often minimises the LOOCV mean‑squared‑error (MSE) surface. Again, the standard choice is a grid over 40–60 bandwidths.

> **Gap** All prior work optimises by *searching* the CV surface. Very little exploits its analytic structure beyond a first derivative.

---

## 2  Newton–Armijo with Analytic Hessian

Let \$L(h)\$ denote the CV score (LSCV for KDE, LOOCV‑MSE for NW). In log‑bandwidth space \$u=\log h\$ we compute
$g(u)=\frac{dL}{du},\qquad H(u)=\frac{d^2L}{du^2}.$
With those we run

```pseudo
repeat until |Δu| < 1e‑6 or max_iter:
step ← −g/H # Newton direction
u ← Armijo(u,step) # back‑track to guarantee descent
```

* **Analytic derivatives** for both kernels avoid numerical differencing.
* **Armijo line search** keeps stability when \$n\$ is tiny (non‑convex wiggles).
* **Cost** = one score evaluation per back‑track (6–12 total).

Closed‑form expressions are given in `derivatives.py` – two lines each.

---

## 3  Simulation Design

| Component | KDE | Nadaraya–Watson |
| -------------------- | --------------------------------------------------------- | -------------------------------------------------------------------------------------------- |
| True function | 50‑50 mix of \$\mathcal N(-2,0.5)\$ & \$\mathcal N(2,1)\$ | \$y=f(x)+\varepsilon\$, same \$f\$ as mixture CDF, \$\varepsilon\sim\mathcal N(0,\sigma^2)\$ |
| Sample sizes | \$n\in{100,200,500}\$ | same |
| Noise std \$\sigma\$ | \${0.5,1,2}\$ | same |
| Kernels | Gaussian & Epanechnikov | Gaussian & Epanechnikov |
| Replicates | \$R=20\$ | \$R=20\$ |
| Risk metric | ISE on \[\$-8,8\$] | test‑set MSE (10 000 pts) |
| Methods | Grid, Golden, Newton–Armijo, Silverman | Grid, Golden, Newton–Armijo, Plug‑in |

Scripts: `kde_analytic_hessian.py`, `nw_analytic_hessian.py`.

---

## 4  Results Summary

See notebooks for results

---

## 5  Take‑aways

* **Same optimum, fewer calls.** Newton reaches the exact grid minimum for both problems with 4–12× fewer evaluations.
* **Kernel generality.** The derivation is only two lines per kernel; extending to other polynomial kernels is trivial.
* **Small‑sample stability.** Armijo back‑tracking prevents the overshoot hiccups often seen with Epanechnikov near tiny \$h\$.

---

## 6  Related Work & Novelty

| Reference | Setting | Optimiser | Criterion | Notes |
| ------------------------------ | ---------- | ----------------- | -------------------- | ------------------------------------ |
| Loader 1999; Wand & Jones 1995 | KDE | Grid / golden | LSCV | Textbook standard |
| Chiu 1992 | KDE | Newton | **GCV** only | No Hessian; Gaussian kernel only |
| Fan & Gijbels 1995 | Local poly | Iterative plug‑in | Asymptotic MISE | Different objective |
| Härdle (LOLVC) | NW | Grid | LOOCV | Widely cited |
| **This work** | KDE & NW | **Newton–Armijo** | **Exact LOOCV/LSCV** | First analytic Hessian; 4–12× faster |