https://github.com/jaydu1/fdfi
Disentangled feature importance
- Host: GitHub
- URL: https://github.com/jaydu1/fdfi
- Owner: jaydu1
- License: mit
- Created: 2025-12-21T08:10:56.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2026-04-29T00:17:22.000Z (about 1 month ago)
- Last Synced: 2026-04-29T01:24:47.603Z (about 1 month ago)
- Language: Python
- Homepage: http://fdfi.readthedocs.io
- Size: 1.02 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
# FDFI - Flow-Disentangled Feature Importance
[License: MIT](https://opensource.org/licenses/MIT)
[Python](https://www.python.org/downloads/)
[PyPI](https://pypi.org/project/fdfi)
[Downloads](https://pepy.tech/projects/fdfi)
A Python library for computing feature importance using disentangled methods, inspired by SHAP.
Current release: `0.0.5`
## Overview
FDFI (Flow-Disentangled Feature Importance) is a Python module that provides interpretable machine learning explanations through disentangled feature importance methods. This package implements both DFI (Disentangled Feature Importance) and FDFI (Flow-DFI) methods. Similar to SHAP, FDFI helps you understand which features are driving your model's predictions.
## Features
- **Multiple Explainer Types**: Tree, Linear, and Kernel explainers for different model types
- **OT-Based DFI**: Gaussian OT (`OTExplainer`) and Entropic OT (`EOTExplainer`)
- **Flow-DFI**: `FlowExplainer` with CPI and SCPI methods for non-Gaussian data
- **Rich Visualizations**: Summary, waterfall, force, and dependence plots
- **Easy to Use**: Simple API similar to SHAP
- **Statistical Inference**: Confidence intervals and multiple testing correction (FDR/FWER)
- **Extensible**: Built with modularity in mind for future enhancements
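The Gaussian OT idea behind `OTExplainer` can be illustrated in closed form. Between $N(\mu, \Sigma)$ and $N(0, I)$ the optimal transport (Brenier) map is $z = \Sigma^{-1/2}(x - \mu)$, which decorrelates the coordinates. The sketch below applies that map to correlated 2-D data using the closed-form square root of a 2×2 SPD matrix. It assumes, but the README does not state, that `OTExplainer` fits a map of this kind; `mean_cov_2d`, `inv_sqrt_2x2`, and `gaussian_ot_to_standard` are hypothetical helpers, not fdfi functions.

```python
import math
import random

def mean_cov_2d(xs):
    """Sample mean and (1/n) covariance of a list of 2-D points."""
    n = len(xs)
    mx = sum(p[0] for p in xs) / n
    my = sum(p[1] for p in xs) / n
    sxx = sum((p[0] - mx) ** 2 for p in xs) / n
    syy = sum((p[1] - my) ** 2 for p in xs) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in xs) / n
    return (mx, my), ((sxx, sxy), (sxy, syy))

def inv_sqrt_2x2(cov):
    """Inverse square root of a 2x2 symmetric positive-definite matrix."""
    (a, b), (_, d) = cov
    s = math.sqrt(a * d - b * b)   # sqrt(det M)
    t = math.sqrt(a + d + 2 * s)   # sqrt(trace M + 2 sqrt(det M))
    # Closed form for 2x2 SPD matrices: sqrt(M) = (M + s I) / t
    r11, r12, r22 = (a + s) / t, b / t, (d + s) / t
    det_r = r11 * r22 - r12 * r12
    return ((r22 / det_r, -r12 / det_r), (-r12 / det_r, r11 / det_r))

def gaussian_ot_to_standard(xs):
    """Hypothetical sketch: z = Sigma^{-1/2} (x - mu), the Gaussian OT map to N(0, I)."""
    (mx, my), cov = mean_cov_2d(xs)
    (w11, w12), (w21, w22) = inv_sqrt_2x2(cov)
    return [(w11 * (x - mx) + w12 * (y - my), w21 * (x - mx) + w22 * (y - my))
            for x, y in xs]

random.seed(0)
# Correlated 2-D data: x2 = 0.8 * x1 + noise
data = []
for _ in range(500):
    x1 = random.gauss(0, 1)
    data.append((x1, 0.8 * x1 + 0.6 * random.gauss(0, 1)))

z = gaussian_ot_to_standard(data)
_, cov_z = mean_cov_2d(z)
print(cov_z)  # numerically the 2x2 identity
```

Because the map is built from the same sample it is applied to, the transformed covariance is the identity up to rounding; with real data, fdfi's estimator and its handling of higher dimensions may differ.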
## Installation
### From Source
```bash
git clone https://github.com/jaydu1/FDFI.git
cd FDFI
pip install -e .
```
### Dependencies
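Optional dependencies are declared as extras in `pyproject.toml`:
```bash
pip install -e ".[dev]"    # development and testing tools
pip install -e ".[plots]"  # plotting dependencies
pip install -e ".[flow]"   # flow models used by FlowExplainer
```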
## Quick Start
```python
import numpy as np
from fdfi.explainers import OTExplainer
# Define your model
def model(X):
    return X.sum(axis=1)
# Create background data
X_background = np.random.randn(100, 10)
# Create an explainer
explainer = OTExplainer(model, data=X_background, nsamples=50)
# Explain test instances
X_test = np.random.randn(10, 10)
results = explainer(X_test)
# Confidence intervals (post-hoc)
ci = explainer.conf_int(alpha=0.05, target="X", alternative="two-sided")
# With multiple testing correction (e.g., FDR control)
ci_fdr = explainer.conf_int(multitest_method="fdr_bh")
explainer.summary(multitest_method="fdr_bh")
```
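The name `multitest_method="fdr_bh"` appears to follow the statsmodels naming for the Benjamini-Hochberg step-up procedure. As a reference point, the sketch below implements standard BH adjusted p-values in plain Python; it is not taken from fdfi, and `bh_adjust` is a hypothetical helper.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.001, 0.04, 0.03, 0.5]
adj = bh_adjust(pvals)
rejected = [q <= 0.05 for q in adj]  # reject H0 where the adjusted p-value <= FDR level
print(adj)
print(rejected)
```

Here only the first feature survives at FDR level 0.05: the raw p-values 0.03 and 0.04 both adjust to about 0.053.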
### CI Defaults (since v0.0.2)
Since v0.0.2, `conf_int()` uses the following defaults:
- `var_floor_method="mixture"`
- `margin_method="mixture"`
This improves stability for weak effects and avoids ad hoc thresholding in many use cases.
You can still override both methods explicitly if needed.
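For intuition about what a variance floor guards against: a normal-approximation (Wald) interval for a mean importance score uses the standard error of per-sample scores, and when those scores are nearly constant the standard error collapses toward zero and the interval becomes spuriously narrow. The sketch below uses a simple fixed floor, i.e. the kind of hand-picked threshold the `"mixture"` default is described as avoiding. It does not reproduce fdfi's mixture method, and `wald_ci` and its `var_floor` argument are hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance

def wald_ci(scores, alpha=0.05, var_floor=0.0):
    """Two-sided Wald CI and p-value (H0: mean = 0) for per-sample importance scores.

    Hypothetical helper: `var_floor` is a fixed lower bound on the sample variance,
    a simple stand-in for fdfi's var_floor_method, not a reproduction of it.
    """
    n = len(scores)
    est = fmean(scores)
    var = max(variance(scores), var_floor)   # floored sample variance
    se = math.sqrt(var / n)                  # standard error of the mean
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    if se == 0:                              # degenerate case: zero-width interval
        return est, est, (0.0 if est != 0 else 1.0)
    pval = 2 * (1 - NormalDist().cdf(abs(est) / se))
    return est - z * se, est + z * se, pval

# Near-constant scores: without a floor the interval has zero width.
flat = [0.01] * 8
print(wald_ci(flat))                   # degenerate interval
print(wald_ci(flat, var_floor=1e-4))   # the floor restores a nonzero width
```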
## EOT Options (Entropic OT)
`EOTExplainer` supports adaptive epsilon, stochastic transport sampling, and
Gaussian/empirical targets:
```python
from fdfi.explainers import EOTExplainer
explainer = EOTExplainer(
    model.predict,            # a fitted model exposing .predict(X)
    X_background,             # background data, as in the Quick Start
    auto_epsilon=True,
    stochastic_transport=True,
    n_transport_samples=10,
    target="gaussian",        # or "empirical"
)
results = explainer(X_test)
```
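Entropic OT replaces the exact transport plan with one regularized by $\varepsilon$ times its entropy, which can be computed with Sinkhorn iterations. The sketch below runs plain Sinkhorn between two small discrete distributions with a fixed `epsilon` and then maps each source point to the plan-weighted average of its targets (barycentric projection). It does not implement `auto_epsilon`, stochastic transport sampling, or the Gaussian/empirical target options, and `sinkhorn` is a hypothetical helper rather than part of fdfi.

```python
import math

def sinkhorn(a, b, cost, epsilon=1.0, n_iter=1000):
    """Entropic OT plan between discrete weights a (rows) and b (columns).

    Returns P with P[i][j] = u[i] * K[i][j] * v[j], where K = exp(-cost / epsilon).
    """
    n, m = len(a), len(b)
    K = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale so that row sums match a and column sums match b
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Two small 1-D point clouds with uniform weights
xs = [0.0, 1.0, 2.0]
ys = [0.5, 1.5, 2.5, 3.5]
a = [1 / len(xs)] * len(xs)
b = [1 / len(ys)] * len(ys)
cost = [[(x - y) ** 2 for y in ys] for x in xs]  # squared Euclidean cost

P = sinkhorn(a, b, cost, epsilon=1.0)
# Barycentric projection: map each x_i to the P-weighted average of the y_j
mapped = [sum(P[i][j] * ys[j] for j in range(len(ys))) / a[i] for i in range(len(xs))]
print(mapped)
```

Smaller `epsilon` brings the plan closer to unregularized OT but makes the kernel entries more extreme and convergence slower, which is presumably the trade-off an adaptive epsilon aims to manage.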
## Flow-DFI with FlowExplainer
`FlowExplainer` uses normalizing flows for non-Gaussian data, supporting both CPI (Conditional Permutation Importance) and SCPI (Sobol-CPI):
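- **CPI**: average the counterfactual predictions first, then take the squared difference: $\big(Y - \mathbb{E}_b[f(\tilde{X}_b)]\big)^2$
- **SCPI**: take the squared differences first, then average: $\mathbb{E}_b\big[(Y - f(\tilde{X}_b))^2\big]$

Here $\tilde{X}_b$ denotes the $b$-th counterfactual sample and $\mathbb{E}_b$ averages over these samples.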
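The two aggregation orders differ by a variance term. With $\tilde{X}_b$ the $b$-th counterfactual sample, for a single observation
$\mathbb{E}_b[(Y - f(\tilde{X}_b))^2] = (Y - \mathbb{E}_b[f(\tilde{X}_b)])^2 + \mathrm{Var}_b[f(\tilde{X}_b)]$,
so the SCPI term is never smaller than the CPI term. The sketch below checks this numerically on simulated predictions; `cpi_term` and `scpi_term` are hypothetical helpers, not fdfi functions.

```python
import random
from statistics import fmean, pvariance

def cpi_term(y, preds):
    """CPI: average the counterfactual predictions first, then square the residual."""
    return (y - fmean(preds)) ** 2

def scpi_term(y, preds):
    """SCPI: square each counterfactual residual first, then average."""
    return fmean((y - p) ** 2 for p in preds)

random.seed(1)
y = 1.3                                              # observed response
preds = [random.gauss(1.0, 0.5) for _ in range(50)]  # simulated f(X~_b), b = 1..50

cpi, scpi = cpi_term(y, preds), scpi_term(y, preds)
gap = pvariance(preds)  # population variance over the counterfactual predictions
print(cpi, scpi, gap)   # scpi equals cpi + gap up to rounding
```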
```python
from fdfi.explainers import FlowExplainer
# Create explainer with CPI (default)
explainer = FlowExplainer(
    model.predict,
    X_background,
    fit_flow=True,
    method='cpi',                # 'cpi', 'scpi', or 'both'
    num_steps=200,               # flow training steps
    nsamples=50,                 # counterfactual samples
    sampling_method='resample',  # 'resample', 'permutation', 'normal', 'condperm'
)
results = explainer(X_test)
# results['phi_Z']: Z-space importance
# results['phi_X']: same as phi_Z (Z-space methods)
# Confidence intervals
ci = explainer.conf_int(alpha=0.05, target="Z", alternative="two-sided")
```
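In Z space, the underlying idea is to replace one latent coordinate at a time with fresh draws and measure how far the predictions move. The sketch below illustrates a CPI-style score under heavy assumptions: an identity map stands in for the trained flow (so Z equals X), the model is a toy function, and coordinate $j$ is resampled from the background sample. It is not FlowExplainer's algorithm, and `z_importance` is a hypothetical name.

```python
import random
from statistics import fmean

def model(z):
    """Toy model: depends on z[0] only; z[1] is irrelevant."""
    return 2.0 * z[0]

def z_importance(model, Z_test, Z_background, j, n_samples=50, seed=0):
    """Hypothetical CPI-style score for latent coordinate j.

    For each test point, coordinate j is replaced by values resampled from the
    background; the score is the mean squared gap between the original
    prediction and the average counterfactual prediction.
    """
    rng = random.Random(seed)
    gaps = []
    for z in Z_test:
        preds = []
        for _ in range(n_samples):
            z_tilde = list(z)
            z_tilde[j] = rng.choice(Z_background)[j]  # resample coordinate j only
            preds.append(model(z_tilde))
        gaps.append((model(z) - fmean(preds)) ** 2)
    return fmean(gaps)

rng = random.Random(42)
# Identity "flow": the latent coordinates are the independent features themselves
Z_background = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
Z_test = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]

phi = [z_importance(model, Z_test, Z_background, j) for j in range(2)]
print(phi)  # phi[0] is clearly positive; phi[1] is zero up to rounding
```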
### Explainer diagnostics (new in v0.0.2)
Disentangled explainers (`OTExplainer`, `EOTExplainer`, and `FlowExplainer`) report two diagnostics, each with a qualitative label (GOOD / MODERATE / POOR), and log them with a consistent `[FDFI][DIAG]` prefix:
- **Latent independence (median dCor)**: lower is better (GOOD below 0.10, MODERATE below 0.25, POOR otherwise).
- **Distribution fidelity (MMD)**: lower is better (GOOD below 0.05, MODERATE below 0.15, POOR otherwise).
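The independence diagnostic can be approximated with a textbook estimator. The sketch below computes the sample distance correlation (Székely et al., biased V-statistic version) between pairs of latent coordinates, takes the median over pairs, and labels it with the dCor thresholds listed above. Taking the median over coordinate pairs is our reading of "median dCor", and `distance_correlation` and `label` are hypothetical helpers, not fdfi functions. Note that this biased estimator stays positive even under independence (roughly of order $1/\sqrt{n}$), so absolute values depend on the estimator and the sample size.

```python
import math
import random
from statistics import median

def _centered_distances(x):
    """Double-centered pairwise distance matrix |x_i - x_j| for 1-D data."""
    n = len(x)
    d = [[abs(x[i] - x[j]) for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    # d is symmetric, so column means equal row means
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]

def distance_correlation(x, y):
    """Sample distance correlation (V-statistic version) of two 1-D samples."""
    A, B = _centered_distances(x), _centered_distances(y)
    n = len(x)
    dcov2 = sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n**2
    dvar2_x = sum(v * v for r in A for v in r) / n**2
    dvar2_y = sum(v * v for r in B for v in r) / n**2
    if dvar2_x * dvar2_y == 0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar2_x * dvar2_y))

def label(value, good, moderate):
    """GOOD below `good`, MODERATE below `moderate`, otherwise POOR."""
    return "GOOD" if value < good else "MODERATE" if value < moderate else "POOR"

rng = random.Random(0)
n = 200
z = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(3)]  # 3 independent latents
pairs = [(i, j) for i in range(3) for j in range(i + 1, 3)]
med = median(distance_correlation(z[i], z[j]) for i, j in pairs)
print(med, label(med, good=0.10, moderate=0.25))
```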
Example log:
```
[FDFI][DIAG] Flow Model Diagnostics
[FDFI][DIAG] Latent independence (median dCor): 0.0421 [GOOD] - lower is better
[FDFI][DIAG] Distribution fidelity (MMD): 0.0187 [GOOD] - lower is better
```
Access diagnostics directly:
```python
diag = explainer.diagnostics
print(diag["latent_independence_median"], diag["latent_independence_label"])
print(diag["distribution_fidelity_mmd"], diag["distribution_fidelity_label"])
```
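Distribution fidelity compares samples produced by the fitted model with the real data. A common estimator is the squared maximum mean discrepancy with a Gaussian (RBF) kernel; the sketch below computes its biased V-statistic version for 1-D samples. The kernel, bandwidth, estimator, and whether fdfi reports MMD or squared MMD are all assumptions here, and `mmd2_rbf` is a hypothetical helper, so its numbers are not directly comparable with `diag["distribution_fidelity_mmd"]`.

```python
import math
import random

def rbf(x, y, bandwidth=1.0):
    """Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 * bandwidth^2))."""
    return math.exp(-((x - y) ** 2) / (2 * bandwidth ** 2))

def mmd2_rbf(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two 1-D samples."""
    kxx = sum(rbf(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy

rng = random.Random(0)
real = [rng.gauss(0, 1) for _ in range(200)]
good_fit = [rng.gauss(0, 1) for _ in range(200)]    # same distribution as `real`
poor_fit = [rng.gauss(1.5, 1) for _ in range(200)]  # shifted distribution
print(mmd2_rbf(real, good_fit), mmd2_rbf(real, poor_fit))
```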
### Training the flow separately
For advanced users, flow models can be trained separately:
```python
from fdfi.models import FlowMatchingModel
# Train flow model externally
flow_model = FlowMatchingModel(X_background, dim=X_background.shape[1])
flow_model.fit(num_steps=500, verbose='final')
# Set pre-trained flow
explainer = FlowExplainer(model.predict, X_background, fit_flow=False)
explainer.set_flow(flow_model)
```
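The README does not describe how `FlowMatchingModel` is trained. Assuming it follows the standard conditional flow-matching recipe, the one-dimensional sketch below shows the idea: draw noise $x_0 \sim N(0, 1)$ and data $x_1$, form $x_t = (1 - t)x_0 + t x_1$, regress the velocity target $x_1 - x_0$ on $x_t$ at each time $t$, then integrate the learned velocity from $t = 0$ to $t = 1$. A per-$t$ linear least-squares fit stands in for the neural network; it is exact here only because both endpoints are Gaussian. `fit_velocity` and `push_forward` are hypothetical names.

```python
import random
from statistics import fmean, pstdev

def fit_velocity(data, n_times=40, n_pairs=2000, seed=0):
    """Fit v(x, t) ~ a(t) * x + b(t) on a time grid (hypothetical sketch).

    Conditional flow matching: x_t = (1 - t) * x0 + t * x1 with noise x0 ~ N(0, 1)
    and data x1; the regression target is the velocity x1 - x0.
    """
    rng = random.Random(seed)
    coefs = []
    for k in range(n_times):
        t = (k + 0.5) / n_times  # midpoint of each time cell
        xs, us = [], []
        for _ in range(n_pairs):
            x0, x1 = rng.gauss(0, 1), rng.choice(data)
            xs.append((1 - t) * x0 + t * x1)
            us.append(x1 - x0)
        mx, mu = fmean(xs), fmean(us)
        cov = fmean((x - mx) * (u - mu) for x, u in zip(xs, us))
        var = fmean((x - mx) ** 2 for x in xs)
        a = cov / var
        coefs.append((a, mu - a * mx))  # least-squares line at time t
    return coefs

def push_forward(x0, coefs):
    """Euler-integrate dx/dt = a(t) * x + b(t) from t = 0 to t = 1."""
    dt = 1.0 / len(coefs)
    x = x0
    for a, b in coefs:
        x += dt * (a * x + b)
    return x

rng = random.Random(1)
data = [rng.gauss(3.0, 0.5) for _ in range(5000)]  # toy "data": N(3, 0.5^2)
coefs = fit_velocity(data)
samples = [push_forward(rng.gauss(0, 1), coefs) for _ in range(2000)]
print(fmean(samples), pstdev(samples))  # close to 3.0 and 0.5
```

Integrating the same velocity field backward in time would map data points to the noise (latent) side, which is the direction a latent-space explainer needs.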
## Project Structure
```
FDFI/
├── fdfi/                   # Main package directory
│   ├── __init__.py         # Package initialization
│   ├── explainers.py       # Explainer classes
│   ├── plots.py            # Visualization functions
│   └── utils.py            # Utility functions
├── tests/                  # Test suite
│   ├── test_explainers.py
│   ├── test_plots.py
│   └── test_utils.py
├── docs/                   # Documentation & tutorials
│   └── tutorials/          # Jupyter notebook tutorials
├── pyproject.toml          # Package configuration
└── README.md               # This file
```
## Development Status
🚧 **This is starter code for DFI development.** The core structure and API are in place, but full implementations are coming soon.
Current status:
- ✅ Package structure established
- ✅ Base classes and interfaces defined
- ✅ Testing framework set up
- ✅ Documentation structure created
- 🚧 Core algorithms (in development)
- 🚧 Visualization functions (in development)
## Testing
Run the test suite:
```bash
# Install development dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run tests with coverage
pytest --cov=fdfi --cov-report=html
```
## Documentation
Full documentation and tutorials are available in the `docs/` directory:
- [Quickstart Tutorial](docs/tutorials/quickstart.ipynb)
- [OT Explainer Tutorial](docs/tutorials/ot_explainer.ipynb)
- [EOT Explainer Tutorial](docs/tutorials/eot_explainer.ipynb)
- [Flow Explainer Tutorial](docs/tutorials/flow_explainer.ipynb)
- [Confidence Intervals](docs/tutorials/confidence_intervals.ipynb)
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## References
FDFI is based on:
- Du, J.-H., Roeder, K., & Wasserman, L. (2025). Disentangled Feature Importance. *arXiv preprint arXiv:2507.00260*.
- Chen, X., Guo, Y., & Du, J.-H. (2026). Flow-Disentangled Feature Importance. In *The Fourteenth International Conference on Learning Representations (ICLR)*.
Related work:
- [SHAP](https://github.com/slundberg/shap): A game theoretic approach to explain machine learning models
## Citation
If you use DFI in your research, please cite:
```bibtex
@software{dfi2026,
  title={DFI: Python Library for Disentangled Feature Importance},
  author={DFI Team},
  year={2026},
  url={https://github.com/jaydu1/FDFI}
}
@article{du2025disentangled,
  title={Disentangled Feature Importance},
  author={Du, Jin-Hong and Roeder, Kathryn and Wasserman, Larry},
  journal={arXiv preprint arXiv:2507.00260},
  year={2025}
}
@inproceedings{chen2026flow,
  title={Flow-Disentangled Feature Importance},
  author={Chen, Xin and Guo, Yifan and Du, Jin-Hong},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}
```
## Contact
For questions and issues, please use the [GitHub issue tracker](https://github.com/jaydu1/FDFI/issues).