https://github.com/aditya1707/forcekernel-eabf
Force-kernel eABF: a PLUMED enhanced-sampling plugin that delivers smooth mean-force estimates and free-energy landscapes from the earliest stages of sampling.
abf aimd biophysics collective-variables computational-chemistry eabf enhanced-sampling free-energy-calculations kernel-methods metadynamics molecular-dynamics nadaraya-watson-regression opes plumed rare-events
- Host: GitHub
- URL: https://github.com/aditya1707/forcekernel-eabf
- Owner: aditya1707
- License: gpl-3.0
- Created: 2026-03-09T17:21:24.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-06-22T22:56:09.000Z (13 days ago)
- Last Synced: 2026-06-23T00:22:59.070Z (13 days ago)
- Topics: abf, aimd, biophysics, collective-variables, computational-chemistry, eabf, enhanced-sampling, free-energy-calculations, kernel-methods, metadynamics, molecular-dynamics, nadaraya-watson-regression, opes, plumed, rare-events
- Language: C++
- Homepage:
- Size: 3.78 MB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# FK-eABF
A toolkit for running FK-eABF (Force-Kernel eABF) enhanced sampling simulations in PLUMED and recovering free energy landscapes from the results.
FK-eABF is an adaptive biasing force method that uses an extended Lagrangian (fictitious particle λ coupled to the real collective variable z) and a kernel-based mean force estimator. The CZAR estimator on the real CV z provides an unbiased free energy gradient that is integrated into a free energy landscape in post-processing.
---
## Citation
If you use FK-eABF in your work, please cite:
> Kang, C.; Verma, R.; Sonpal, A.; Shoji, A.; Chipot, C.; Pfaendtner, J. *A Force-Kernel Reformulation of the Extended-System Adaptive Biasing Force for Free-Energy Calculations*. *J. Chem. Theory Comput.*, **submitted (2026)**.
> DOI: *to be added upon acceptance*.
---
## Requirements
- PLUMED (with `forcekernel.cpp` compiled as a plugin via `LOAD`)
- An MD engine supported by PLUMED, or the built-in `pesmd` toy integrator for 2D potentials
- Python 3 with NumPy and SciPy for post-processing
---
## Workflow
### 1. Setting up an FK-eABF simulation
The plugin is loaded at runtime via the PLUMED `LOAD` directive — no recompilation of PLUMED is required. Configure your `plumed.dat` to load the plugin and define the `FKERNELABF` action:
```plumed
LOAD FILE=./forcekernel.cpp
cv:
fk: FKERNELABF ...
# CV Definition
ARG=cv
# Extended Lagrangian Options
KAPPA=3000.0 # coupling spring constant (kJ/mol/nm^2)
TAU=0.5 # fictitious particle time constant (ps)
FRICTION=8.0 # Langevin friction (ps^-1)
TEMP=300 # temperature (K)
# FK-eABF Options
GRIDMIN=-1.5 # CV domain lower bound
GRIDMAX=1.5 # CV domain upper bound
SIGMA=0.05 # initial kernel width
SIGMA_MIN=0.01 # minimum kernel width
GRIDSIZE=100
# Data Accumulation and Biasing Force Update options
PACE=5 # steps between data accumulation
GRIDPACE=1000 # steps between biasing force updates
# Output options
CZARSTRIDE=50000 # steps between CZAR kernel file writes
KERNELINFOSTRIDE=500 # match this to your PRINT STRIDE
...
PRINT FILE=COLVAR STRIDE=500 ARG=*
```
#### Parameter selection at a glance
If you're setting up a new system, the following decision tree walks through the main parameter choices in roughly the order they should be made:
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 25, 'rankSpacing': 35}}}%%
flowchart TD
Start([Configure FKERNELABF]) --> CV{Periodic CV?}
CV -->|No| CVN[Set GRIDMIN, GRIDMAX]
CV -->|Yes| Sigma
CVN --> Sigma{Bin width<br/>known?}
Sigma -->|Yes| SS[SIGMA = bin width<br/>SIGMA_MIN ≈ SIGMA / 2]
Sigma -->|No| SA[Omit SIGMA, set SIGMA_MIN<br/>adaptive σ₀ warmup]
SS --> Kappa["KAPPA so √(kT/κ) ≪ SIGMA_MIN<br/>typically 1000–5000"]
SA --> Kappa
Kappa --> Engine{Engine?}
Engine -->|Classical| MD[GRIDPACE 500–1000]
Engine -->|AIMD| AIMD[PACE = 1, GRIDPACE 50–200]
MD --> Bias[BIASFACTOR 2–10]
AIMD --> Bias
Bias --> Out[CZARSTRIDE as needed<br/>KERNELINFOSTRIDE = PRINT STRIDE]
Out --> End([Run])
```
**TL;DR: set `SIGMA`, `SIGMA_MIN`, and `GRIDSIZE` (plus `GRIDMIN`/`GRIDMAX` for non-periodic CVs). Apart from `ARG` and `KAPPA`, which have no default, everything else can be left at its default.** These three parameters are technically optional but should be treated as mandatory in practice: getting them right is the difference between a simulation that converges efficiently and one that wastes compute time.
**SIGMA and SIGMA_MIN.** `SIGMA` is the initial kernel bandwidth; `SIGMA_MIN` is the floor below which adaptive Silverman rescaling cannot shrink it. Omitting `SIGMA_MIN` lets the kernel population grow without bound as the bandwidth contracts, wasting memory and slowing the kernel search. Use the bin width you would adopt for eABF as a guide: set `SIGMA` to that value and `SIGMA_MIN` to half of it (e.g., a 5° dihedral bin → `SIGMA` ≈ 0.087 rad, `SIGMA_MIN` ≈ 0.044 rad).
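As a quick check of the rule of thumb above, the following sketch converts a dihedral bin width in degrees to `SIGMA` and `SIGMA_MIN` in radians. The helper name `bandwidths_from_bin_width` is hypothetical and not part of the plugin; it only reproduces the arithmetic.

```python
import math

def bandwidths_from_bin_width(bin_width_deg):
    """Return (SIGMA, SIGMA_MIN) in radians for a bin width given in degrees.

    Hypothetical helper: applies the README rule of thumb
    SIGMA = bin width, SIGMA_MIN = SIGMA / 2.
    """
    sigma = math.radians(bin_width_deg)
    sigma_min = sigma / 2.0
    return sigma, sigma_min

sigma, sigma_min = bandwidths_from_bin_width(5.0)
print(f"SIGMA={sigma:.3f} SIGMA_MIN={sigma_min:.3f}")  # SIGMA=0.087 SIGMA_MIN=0.044
```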
**GRIDSIZE.** The mean-force grid is where NW regression is evaluated and multilinearly interpolated between rebuilds. The grid resolution does not affect kernel accumulation or the recovered free energy — it controls how faithfully the cancellation force is applied between updates. By default (`GRIDSIZE=0`), FK-eABF auto-sizes the grid so that spacing equals `2 × SIGMA_MIN` (the effective kernel diameter), with a floor of 72 points per dimension. An explicit `GRIDSIZE` producing coarser spacing triggers a warning but does not abort the run.
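The auto-sizing rule can be reproduced outside PLUMED to see what grid a given input would get. This is a minimal sketch of the rule as stated here, not the plugin's code: the function names are hypothetical, and the spacing is assumed to be `range / N`.

```python
import math

def auto_gridsize(grid_min, grid_max, sigma_min=None, floor=72):
    """Grid points per dimension under the documented auto-size rule."""
    if sigma_min is None:
        return floor  # documented default when SIGMA_MIN is unset
    # Small tolerance guards against floating-point round-off in the division.
    n = math.ceil((grid_max - grid_min) / (2.0 * sigma_min) - 1e-9)
    return max(n, floor)

def is_coarser_than_auto(gridsize, grid_min, grid_max, sigma_min):
    """True if an explicit GRIDSIZE gives spacing wider than 2 * SIGMA_MIN.

    Assumption: spacing is taken as range / gridsize.
    """
    spacing = (grid_max - grid_min) / gridsize
    return spacing > 2.0 * sigma_min * (1.0 + 1e-9)

print(auto_gridsize(-1.5, 1.5, sigma_min=0.01))            # 150
print(auto_gridsize(-1.5, 1.5, sigma_min=0.05))            # 72 (floor applies)
print(is_coarser_than_auto(100, -1.5, 1.5, sigma_min=0.01))  # True
```

Under this reading, an explicit `GRIDSIZE=100` on [-1.5, 1.5] with `SIGMA_MIN=0.01` gives a spacing of 0.03, which is coarser than 2 × `SIGMA_MIN` = 0.02 and would be expected to trigger the warning.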
#### Compulsory Keywords
| Keyword | Default | Description |
|---------|---------|-------------|
| `ARG` | — | Collective variables (1–3 supported). |
| `KAPPA` | — | Spring constant(s) for z–λ coupling (kJ/mol/unit²). Larger κ → tighter coupling, smaller σ = √(kT/κ). One value or one per CV. |
| `TAU` | `0.5` | Oscillation period(s) of λ (time units). Sets the fictitious mass m = κτ²/(4π²). |
| `FRICTION` | `10.0` | Langevin friction on λ (1/time_unit). One value or one per CV. |
| `TEMP` | `300.0` | Temperature (K). |
| `PACE` | `5` | Force-sample deposition interval (MD steps). |
| `THRESH` | `1.0` | Kernel merge threshold in σ-normalised distance. OPES standard; lower → more compression, higher → more kernels. |
| `NSIGMACUT` | `4.0` | Kernel cutoff in σ per dimension for NW regression. 4.0 gives <2% contribution at the boundary. |
| `BIASFACTOR` | `1.0` | Exploration factor γ. `1.0` = pure ABF. `>1.0` adds density-based exploration on λ via V_ex = c·ln(1 + Z/Z₀) where c = kT(γ−1). The CZAR estimator on z is unaffected. |
| `EXPLORSCALE` | `1.0` | Per-CV scaling of the exploration force. `0.0` disables exploration on that CV (e.g., `1.0, 0.0` to drive only the first of two CVs). |
| `MUXCLAMP` | `500.0` | Per-kernel mean-force clamp on absorption (kJ/mol/unit). |
| `MAXFORCE` | `500.0` | Grid mean-force clamp per node before interpolation (kJ/mol/unit). |
| `GRIDSIZE` | `0` (auto) | Grid points per dimension. Auto-size: N = ceil(range / (2 × SIGMA_MIN)), floor 72. Defaults to 72 when `SIGMA_MIN` is unset. |
| `GRIDPACE` | `500` | Mean-force grid rebuild interval. Reduce for AIMD. |
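The formulas quoted in the table can be evaluated directly to sanity-check a parameter set before running. The sketch below assumes PLUMED's default units (kJ/mol, ps) and uses hypothetical helper names; it only evaluates √(kT/κ), m = κτ²/(4π²), and c = kT(γ−1) as written above.

```python
import math

KB = 0.0083144626  # Boltzmann constant in kJ/mol/K (assumes default PLUMED energy units)

def coupling_width(kappa, temp):
    """Thermal width of z - lambda, sqrt(kT / kappa), in CV units."""
    return math.sqrt(KB * temp / kappa)

def fictitious_mass(kappa, tau):
    """Fictitious mass m = kappa * tau^2 / (4 pi^2)."""
    return kappa * tau ** 2 / (4.0 * math.pi ** 2)

def exploration_prefactor(temp, biasfactor):
    """Prefactor c = kT (gamma - 1) of V_ex; zero for BIASFACTOR=1 (pure ABF)."""
    return KB * temp * (biasfactor - 1.0)

# Values from the example input: KAPPA=3000, TAU=0.5, TEMP=300
print(f"{coupling_width(3000.0, 300.0):.4f}")        # 0.0288
print(f"{fictitious_mass(3000.0, 0.5):.2f}")         # 19.00
print(f"{exploration_prefactor(300.0, 1.0):.1f}")    # 0.0
```

Comparing the first number against the chosen `SIGMA_MIN` is a quick way to judge whether `KAPPA` is stiff enough for the intended resolution.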
#### Optional Keywords — Bandwidth
| Keyword | Default | Description |
|---------|---------|-------------|
| `SIGMA` | *(auto)* | Initial kernel bandwidth σ₀. Omit entirely for adaptive mode (CV variance measured during an unbiased warmup). |
| `SIGMA_MIN` | *(none)* | Bandwidth floor. Set to roughly half `SIGMA` so free-energy resolution can sharpen with more sampling. |
| `ADAPTIVE_SIGMA_STRIDE` | `10 × PACE` | Length of the unbiased warmup for automatic σ₀ determination. Used only when `SIGMA` is omitted. |
| `FIXED_SIGMA` | `false` | Disable Silverman rescaling — all kernels use σ₀ permanently. |
#### Optional Keywords — Grid Bounds
| Keyword | Default | Description |
|---------|---------|-------------|
| `GRIDMIN` | *(from CV)* | Lower grid bound(s) for non-periodic CVs. Reflecting walls applied automatically. |
| `GRIDMAX` | *(from CV)* | Upper grid bound(s) for non-periodic CVs. |
#### Optional Keywords — Neighbor List
| Keyword | Default | Description |
|---------|---------|-------------|
| `NONLIST` | `false` | Disable the neighbor list (brute-force kernel search). |
| `NLIST_PARAMETERS` | `3.0 0.5` | Cutoff factor and skin factor. Includes kernels within `cutoff × NSIGMACUT × σ`; rebuilds when the query point drifts by `skin × dev²`. |
#### Optional Keywords — Output Files
All filenames are derived from the action label (e.g., `fk: FKERNELABF ...` → `fk.*`).
| Keyword | Default | Description |
|---------|---------|-------------|
| `CZARSTRIDE` | *(off)* | Step-stamped CZAR z-kernel snapshots → `{label}.czar_kernels_{step:08d}.dat`. Feed to `czar_integrate` to recover A(z). |
| `KERNELSTRIDE` | *(off)* | Step-stamped λ-kernel snapshots → `{label}.kernels_{step:08d}.dat`. |
| `LAMBDAGRIDSTRIDE` | *(off)* | NW mean-force debug grid every N steps → `{label}.lambda_grid_{step:08d}.dat`. Bias force on the λ grid, **not** the free energy. |
| `STATESTRIDE` | `CZARSTRIDE`, else `10 × GRIDPACE` | Restart state cadence → `{label}.state.dat` (overwritten in place). State is also written automatically whenever the MD engine writes its own checkpoint. See [Restarts](#restarts) below. |
| `KERNELINFOSTRIDE` | `PACE` | Kernel diagnostics line every N steps → `{label}.kernelinfo.dat`. **Set this to match your `PRINT STRIDE` (e.g. 500); the default of `PACE` writes at every kernel deposition and adds significant I/O overhead.** |
---
### 2. Running the simulation
For the included Müller-Brown benchmark, run with PLUMED's built-in 2D toy integrator:
```bash
plumed pesmd < pesmd.in
```
This executes the simulation defined in `pesmd.in` (10M steps on the 2D Müller-Brown potential) driven by `plumed.dat`, and writes CZAR kernel snapshots at the configured stride.
#### Restarts
FK-eABF writes a complete restart state to `{label}.state.dat` at `STATESTRIDE` intervals, and additionally whenever the host MD engine writes its own checkpoint (e.g., a GROMACS `.cpt`). Coupling to the engine checkpoint keeps the PLUMED state coherent with the trajectory frame; without it, a restart can pick up a state from a slightly different step than the trajectory and the `|z − s_fict|` diagnostic will flag the mismatch.
The state file contains everything needed to resume bit-for-bit: kernel populations (with stable IDs), σ₀ and adaptive-warmup status, fictitious-particle position and velocity, exploration density Z₀, ID counters, and the full mt19937 RNG state. The mean-force grid itself is *not* serialized — it is rebuilt from the kernels on restart.
To resume, add `RESTART` to your `plumed.dat` (or pass `--restart` to the MD engine):
```plumed
RESTART
LOAD FILE=./forcekernel.cpp
fk: FKERNELABF ...
```
On restart, FK-eABF prints a banner summarising what was loaded (kernel counts, totalN, σ₀, adaptive status, fictitious particle, RNG), reconstructs the mean-force grid from the kernels, and reports rebuild statistics (populated fraction, |F_abf| max). On the first MD step it also logs `|z − s_fict|` per CV; this should be small relative to √(kT/κ). A large value usually means the trajectory checkpoint and state file are out of sync.
**Reliability features.** State writes use a write-to-tmp + atomic rename strategy with `fsync()` for on-disk durability, falling back to direct overwrite if rename fails (common on Lustre/GPFS/NFS). On a successful read, the loaded file is copied to `bck.{label}.state.dat.{N}` (DRR-style backup-on-load) so the next overwrite cannot clobber a known-good restart point. Dimensionality or temperature mismatches between the state file and the current input abort with an error; a missing state file under `RESTART` warns and starts from scratch.
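For readers unfamiliar with the write-to-tmp + rename pattern, here is a minimal Python sketch of the idea. It is an illustration only, not the plugin's C++ implementation; the function name and the file contents are hypothetical.

```python
import os
import tempfile

def write_state_atomically(path, text):
    """Write text to path via a temporary file, fsync, then rename over the target."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        fh.write(text)
        fh.flush()
        os.fsync(fh.fileno())  # push data to disk before the rename
    try:
        os.replace(tmp, path)  # atomic on POSIX when both paths share a filesystem
    except OSError:
        # Fallback in the spirit of the documented behaviour: overwrite in place.
        with open(path, "w") as fh:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())
        os.remove(tmp)

# Usage in a throwaway directory; the contents are placeholders, not the real state format.
workdir = tempfile.mkdtemp()
state_path = os.path.join(workdir, "fk.state.dat")
write_state_atomically(state_path, "step 1000\n")
write_state_atomically(state_path, "step 2000\n")  # replaces the previous state
```

A reader opening the file during the second call sees either the old or the new contents, never a partially written file, as long as the rename succeeds.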
---
### 3. Recover the free energy landscape
Compile `czar_integrate.cpp`:
```bash
g++ -O2 -o czar_integrate czar_integrate.cpp -lm
```
Use the executable to process CZAR kernel files:
```bash
./czar_integrate FEL_snapshots -d /path/to/scan
```
The only required argument is the output directory for PMFs. By default `czar_integrate` scans the current directory; use `-d` to point elsewhere. All files matching `*czar_kernels_XXXXXXXX.dat` are integrated and written as `FEL_XXXXXXXX.dat`.
For 1D systems, integration uses the trapezoidal rule. For 2D and higher, integration uses an MC random walk (same conventions as `abf_integrate`). The `sigma0` and `sigma_min` headers in the kernel files enable proper KDE normalization (α_k = ∏ σ₀/σ_k) for variable-bandwidth kernels and automatic grid sizing.
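The 1D case is simple enough to illustrate in a few lines. The sketch below is not the `czar_integrate` source: it applies a cumulative trapezoidal rule to a gradient sampled on a grid and shifts the result to zero at the minimum. It assumes every grid point is populated (no `nan` handling).

```python
def integrate_gradient_1d(z, grad):
    """Cumulative trapezoidal integral of dA/dz on grid z, shifted so min(A) = 0."""
    a = [0.0]
    for i in range(1, len(z)):
        dz = z[i] - z[i - 1]
        a.append(a[-1] + 0.5 * (grad[i] + grad[i - 1]) * dz)
    a_min = min(a)
    return [v - a_min for v in a]

# Check on A(z) = z^2 (gradient 2z): the trapezoidal rule is exact for a linear gradient.
z = [-1.0 + 0.1 * i for i in range(21)]
A = integrate_gradient_1d(z, [2.0 * x for x in z])
print(f"{A[0]:.3f} {A[10]:.3f} {A[20]:.3f}")  # 1.000 0.000 1.000
```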
#### Options
| Flag | Argument | Default | Description |
|------|----------|---------|-------------|
| `-n` | `` | `0` | MC integration steps. `0` = auto-converge on RMSD. |
| `-h` | `` | `0.01` | Initial MC hill height. |
| `-f` | `` | `0.5` | Hill reduction factor (applied after warmup). |
| `-t` | `` | *(from file)* | Override kT (kJ/mol). |
| `-g` | `` | `0` (auto) | Integration grid points. Auto-sized from `sigma_min` header (default 100 if absent). |
| `-s` | `` | `4.0` | Kernel cutoff in σ units. |
| `-m` | `` | `1e-3` | Minimum density fraction for the allowed region (below → NaN). |
| `-d` | `` | `.` | Directory to scan (batch mode). |
| `-i` | `` | — | Process a single kernel file. |
| `-o` | `` | `FEL_czar.dat` | Output filename (single-file mode). |
| `-v` | — | off | Verbose progress and convergence diagnostics. |
| `-S` | `` | `0` | Skip kernel files before this step. |
#### Examples
```bash
# Batch: scan current directory, write FEL snapshots
./czar_integrate FEL_snapshots
# Batch with fixed MC steps and user-specified height
./czar_integrate FEL_snapshots -n 5000000 -h 0.2
# Skip files before step 5M
./czar_integrate FEL_snapshots -n 5000000 -h 0.2 -S 5000000
# Scan a different directory
./czar_integrate FEL_snapshots -d /path/to/run
# Single file
./czar_integrate -i fk.czar_kernels_10000000.dat -o PMF.dat
# Fine grid, verbose
./czar_integrate FEL_snapshots -g 150 -v
```
#### Output Format
**Single-file mode** (`-i`): space-separated columns `z0, z1, …, czar_grad0, czar_grad1, …, ptilde, A_czar`, where `ptilde` is the biased density (NW denominator) and `A_czar` is the free energy in kJ/mol shifted to zero at the minimum.
**Batch mode** (default): a simpler format with columns `z0, z1, …, A`.
In both modes, points below the population threshold are written as `nan`. For 2D+ grids, blank lines separate slices along the first dimension (gnuplot `pm3d` compatible).
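A small reader for the single-file layout might look like the sketch below. It relies only on the column order described above; the function name is hypothetical, and skipping lines that start with `#` is an assumption about possible header lines.

```python
import math

def parse_fel(lines, ndim):
    """Parse single-file output rows: z (ndim), czar_grad (ndim), ptilde, A_czar."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines separate slices in 2D+; '#' headers are an assumption
        vals = [float(tok) for tok in line.split()]  # float() accepts "nan"
        if len(vals) != 2 * ndim + 2:
            raise ValueError(f"expected {2 * ndim + 2} columns, got {len(vals)}")
        rows.append({
            "z": vals[:ndim],
            "grad": vals[ndim:2 * ndim],
            "ptilde": vals[2 * ndim],
            "A": vals[2 * ndim + 1],
        })
    return rows

def populated(rows):
    """Keep only grid points whose free energy is not nan."""
    return [r for r in rows if not math.isnan(r["A"])]

# Made-up 1D rows, for illustration only
sample = ["0.0 1.5 0.20 0.0", "", "0.1 nan 0.00 nan"]
rows = parse_fel(sample, ndim=1)
print(len(rows), len(populated(rows)))  # 2 1
```

For a real file, pass the open file object instead of a list: `parse_fel(open("PMF.dat"), ndim=2)`.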
---
### 4. Additional diagnostics
`fkabf_diagnostics.py` processes the `COLVAR` and `{label}.kernelinfo.dat` files in the current directory to produce summary plots:
```bash
python fkabf_diagnostics.py
```
#### Options
| Flag | Argument | Default | Description |
|------|----------|---------|-------------|
| `--colvar` | `` | `COLVAR` | PLUMED COLVAR file. |
| `--kernelinfo` | `` | *(auto)* | `{label}.kernelinfo.dat`. Skipped if absent. |
| `--prefix` | `` | *(auto)* | Action label prefix. Auto-detected from `_fict` columns. |
| `--dt` | `` | `0.001` | MD timestep (for converting time → steps). |
| `--thinning` | `` | `10` | Plot every Nth point in scatter / trajectory plots. |
| `--periodic` | `` | *(none)* | Periodic CV spec for minimum-image z−λ. Format: `"cv1:min:max,cv2:min:max"` or `"cv1:period"`. Supports `pi`. |
| `--outdir` | `` | `.` | Figure output directory. |
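The minimum-image convention used for z−λ on periodic CVs can be sketched as follows. This illustrates the standard wrapping formula rather than the code in `fkabf_diagnostics.py`; the function name is hypothetical.

```python
import math

def min_image(delta, period):
    """Wrap a difference into the interval [-period/2, period/2]."""
    return delta - period * round(delta / period)

# z = 3.0 rad and lambda = -3.0 rad are close across the +/- pi boundary.
d = min_image(3.0 - (-3.0), 2.0 * math.pi)
print(f"{d:.3f}")  # -0.283
```

Without the wrap, the raw difference of 6.0 rad would appear as a spurious outlier in the z−λ histogram.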
#### Examples
```bash
# Auto-detect everything in current directory
python fkabf_diagnostics.py
# Specify files and output location
python fkabf_diagnostics.py --colvar COLVAR --kernelinfo fk.kernelinfo.dat --outdir plots/
# Alanine dipeptide with periodic CVs
python fkabf_diagnostics.py --periodic "phi:-pi:pi,psi:-pi:pi" --dt 0.002
# Dense trajectory, less thinning
python fkabf_diagnostics.py --thinning 2
```
#### Output Figures
| File | Contents |
|------|----------|
| `fig_trajectory.pdf` | Per-CV: z and λ time series, z−λ over time, and z−λ histogram (minimum-image for periodic CVs). |
| `fig_bias.pdf` | \|F_bias\| and V_ex over time. |
| `fig_kernels.pdf` | Kernel counts M and M_z, n_eff, compression N/M, Silverman σ per CV, Z₀ and Z(λ) if present. |
| `fig_exploration.pdf` | 2D scatter of z and λ trajectories side-by-side, colored by time (2+ CVs only). |
| `fig_phase.pdf` | z vs λ scatter per CV, colored by time. Spread indicates coupling width √(kT/κ). |
| `fig_nlist.pdf` | Neighbor list size and nlker/M fraction over time. |
A text summary (CV ranges, z−λ standard deviation, kernel counts, compression ratio, convergence metrics) is printed to stdout before figure generation.
---
### 5. Validating your results
FK-eABF is designed to converge quickly, but fast convergence does not absolve the practitioner of proving that convergence has actually been achieved. A free-energy surface that looks reasonable is not the same as one that is correct. The following checks should be treated as mandatory.
**Verify extended-system synchronization.** All extended-system ABF methods rely on λ remaining well coupled to z; if they desynchronize, CZAR receives corrupted force samples. Confirm that the z − λ distribution is centered at zero with a width consistent with σ ≈ √(kT/κ). `fkabf_diagnostics.py` produces this histogram automatically in `fig_trajectory.pdf`, and the z vs λ scatter in `fig_phase.pdf` shows the same spread. A bimodal, skewed, or excessively broad distribution means κ or τ should be adjusted before trusting the result.
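The same check can be scripted once the z and λ columns have been extracted from `COLVAR`. The sketch below is a rough numerical version of the visual test: the function name and the tolerance `tol` are hypothetical choices, kT assumes kJ/mol units, and periodic CVs would need the difference wrapped to the minimum image first.

```python
import math
import statistics

KB = 0.0083144626  # kJ/mol/K (assumes default PLUMED energy units)

def check_coupling(z, lam, kappa, temp, tol=0.5):
    """Compare mean and width of z - lambda with the expected sqrt(kT / kappa).

    tol is an arbitrary illustrative threshold, not a value from the plugin.
    """
    diffs = [a - b for a, b in zip(z, lam)]
    mean = statistics.fmean(diffs)
    width = statistics.pstdev(diffs)
    expected = math.sqrt(KB * temp / kappa)
    ok = abs(mean) < tol * expected and abs(width / expected - 1.0) < tol
    return mean, width, expected, ok

# Synthetic data: z - lambda alternates between +w and -w, so the width is exactly w.
w = math.sqrt(KB * 300.0 / 3000.0)
lam = [0.0] * 200
z_good = [w, -w] * 100
z_shifted = [v + 5.0 * w for v in z_good]  # systematic offset between z and lambda
print(check_coupling(z_good, lam, 3000.0, 300.0)[3])     # True
print(check_coupling(z_shifted, lam, 3000.0, 300.0)[3])  # False
```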
**Run multiple independent replicas.** A single trajectory that appears converged may have settled into a local minimum of the estimator without sampling all relevant basins. Run at least two — preferably three — independent replicas from different initial conditions and compare the resulting FELs. Agreement between replicas, not internal smoothness of a single run, is the minimum standard for convergence.
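One simple way to quantify replica agreement is an RMSD between two free-energy profiles evaluated on the same grid. The sketch below is an illustrative metric, not a tool shipped with FK-eABF. It skips points that are `nan` in either profile and removes the arbitrary constant offset between the two surfaces before comparing.

```python
import math

def fel_rmsd(a, b):
    """RMSD between two free-energy profiles over points populated in both."""
    pairs = [(x, y) for x, y in zip(a, b)
             if not (math.isnan(x) or math.isnan(y))]
    if not pairs:
        raise ValueError("no commonly populated grid points")
    # Free energies are defined up to a constant: remove the mean offset.
    shift = sum(x - y for x, y in pairs) / len(pairs)
    return math.sqrt(sum((x - y - shift) ** 2 for x, y in pairs) / len(pairs))

nan = float("nan")
print(fel_rmsd([0.0, 1.0, 2.0, nan], [5.0, 6.0, 7.0, 3.0]))  # 0.0 (identical up to a shift)
print(fel_rmsd([0.0, 1.0, nan], [0.0, 3.0, 1.0]))            # 1.0
```

What counts as acceptable agreement depends on the system and on the barrier heights of interest, so any threshold on this number is a judgment call.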
**Cross-method validation.** Self-consistency within a single method is necessary but not sufficient: simulations can satisfy every standard self-convergence criterion while producing quantitatively incorrect free-energy profiles. For at least one system in any study, run a parallel calculation with an independent method (OPES, WTM-eABF, REUS) and compare. Cross-method agreement is the only reliable criterion currently available for validating free-energy calculations on systems where the true answer is unknown.