https://github.com/finite-sample/analytic-bw
Analytic Bandwidth Selector for NW and KDE Based on CV Hessian
https://github.com/finite-sample/analytic-bw
Last synced: 8 months ago
JSON representation
Analytic Bandwidth Selector for NW and KDE Based on CV Hessian
- Host: GitHub
- URL: https://github.com/finite-sample/analytic-bw
- Owner: finite-sample
- Created: 2025-07-05T21:50:59.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-08-17T19:53:23.000Z (10 months ago)
- Last Synced: 2025-09-26T23:38:27.912Z (9 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 58.6 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Analytic‑Hessian Bandwidth Selection
A *single* Newton step can pick the optimal kernel bandwidth if you hand it the right derivatives. We derive **closed‑form gradients *and* Hessians** of the leave‑one‑out cross‑validation (LOOCV) risk for
* univariate **kernel‑density estimation** (KDE), and
* univariate **Nadaraya–Watson** (NW) kernel regression,
covering both the Gaussian and Epanechnikov kernels. The result is a bandwidth selector that reaches the same minimum as an exhaustive grid scan while using *an order of magnitude* fewer evaluations.
---
## 1 Problem & Prior Practice
### 1.1 KDE bandwidth
For density estimation one minimises finite‑sample risk proxies such as
* **LSCV** (least‑squares CV)
* **LCV** (likelihood CV)
Common optimisers
| Optimiser | Typical calls | Notes |
| --------------------------------------- | ------------- | --------------------- |
| Grid (50–100 \$h\$’s) | 50–100 | textbook default |
| Golden‑section | 20–25 | still bracketing |
| Plug‑in / Pilot (Sheather–Jones, Botev) | 1 | relies on asymptotics |
### 1.2 NW bandwidth
For regression one often minimises the LOOCV mean‑squared‑error (MSE) surface. Again, the standard choice is a grid over 40–60 bandwidths.
> **Gap** All prior work optimises by *searching* the CV surface. Very little exploits its analytic structure beyond a first derivative.
---
## 2 Newton–Armijo with Analytic Hessian
Let \$L(h)\$ denote the CV score (LSCV for KDE, LOOCV‑MSE for NW). In log‑bandwidth space \$u=\log h\$ we compute
$g(u)=\frac{dL}{du},\qquad H(u)=\frac{d^2L}{du^2}.$
With those we run
```pseudo
repeat until |Δu| < 1e‑6 or max_iter:
step ← −g/H # Newton direction
u ← Armijo(u,step) # back‑track to guarantee descent
```
* **Analytic derivatives** for both kernels avoid numerical differencing.
* **Armijo line search** keeps stability when \$n\$ is tiny (non‑convex wiggles).
* **Cost** = one score evaluation per back‑track (6–12 total).
Closed‑form expressions are given in `derivatives.py` – two lines each.
---
## 3 Simulation Design
| Component | KDE | Nadaraya–Watson |
| -------------------- | --------------------------------------------------------- | -------------------------------------------------------------------------------------------- |
| True function | 50‑50 mix of \$\mathcal N(-2,0.5)\$ & \$\mathcal N(2,1)\$ | \$y=f(x)+\varepsilon\$, same \$f\$ as mixture CDF, \$\varepsilon\sim\mathcal N(0,\sigma^2)\$ |
| Sample sizes | \$n\in{100,200,500}\$ | same |
| Noise std \$\sigma\$ | \${0.5,1,2}\$ | same |
| Kernels | Gaussian & Epanechnikov | Gaussian & Epanechnikov |
| Replicates | \$R=20\$ | \$R=20\$ |
| Risk metric | ISE on \[\$-8,8\$] | test‑set MSE (10 000 pts) |
| Methods | Grid, Golden, Newton–Armijo, Silverman | Grid, Golden, Newton–Armijo, Plug‑in |
Scripts: `kde_analytic_hessian.py`, `nw_analytic_hessian.py`.
---
## 4 Results Summary
See notebooks for results
---
## 5 Take‑aways
* **Same optimum, fewer calls.** Newton reaches the exact grid minimum for both problems with 4–12× fewer evaluations.
* **Kernel generality.** The derivation is only two lines per kernel; extending to other polynomial kernels is trivial.
* **Small‑sample stability.** Armijo back‑tracking prevents the overshoot hiccups often seen with Epanechnikov near tiny \$h\$.
---
## 6 Related Work & Novelty
| Reference | Setting | Optimiser | Criterion | Notes |
| ------------------------------ | ---------- | ----------------- | -------------------- | ------------------------------------ |
| Loader 1999; Wand & Jones 1995 | KDE | Grid / golden | LSCV | Textbook standard |
| Chiu 1992 | KDE | Newton | **GCV** only | No Hessian; Gaussian kernel only |
| Fan & Gijbels 1995 | Local poly | Iterative plug‑in | Asymptotic MISE | Different objective |
| Härdle (LOLVC) | NW | Grid | LOOCV | Widely cited |
| **This work** | KDE & NW | **Newton–Armijo** | **Exact LOOCV/LSCV** | First analytic Hessian; 4–12× faster |