An open API service indexing awesome lists of open source software.

https://github.com/dataiku/kiji-inspector


https://github.com/dataiku/kiji-inspector

Last synced: 7 days ago
JSON representation

Awesome Lists containing this project

README

          

# Kiji Inspector: Mechanistic Interpretability for AI Agent Tool Selection


Kiji Inspector


CI Core
CI Extras
License: Apache 2.0
GitHub Stars
GitHub Issues


Python Version


Responsible AI
Contributions Welcome
PRs Welcome


## Status
This project is **under heavy active development**. We are planning to release a stable version of the framework in the coming weeks.

In the meantime, join our [Slack Community](https://join.slack.com/t/dataiku-opensource/shared_invite/zt-3o6yq14rp-FTtAHZYhyru~jLZ~S6xPLA)

Learn more about our approach and early results:

* [Paper](paper/Opening%20the%20Black%20Box%20Mechanistic%20Interpretability%20of%20Agent%20Tool%20Selection%20with%20Sparse%20Autoencoders.pdf)
* [Presentation](presentation/Opening%20the%20Black%20Box%20Mechanistic%20Interpretability%20of%20Agent%20Tool%20Selection%20with%20Sparse%20Autoencoders.pdf)

---

## What This Project Does

This project trains **Sparse Autoencoders (SAEs)** on the internal activations of an AI agent to understand *why* it selects specific tools. Given a user request like "Search our docs for API limits," the agent must choose between tools (e.g., `internal_search` vs `web_search`). We extract the model's hidden representations at the moment of that decision, decompose them into interpretable features using a JumpReLU SAE, and validate the resulting explanations through automated fuzzing and causal ablation experiments.

The key insight: train the SAE on **raw activations** (not difference vectors), then use **contrastive pairs** post-hoc to identify which learned features correspond to specific tool-selection decisions. This preserves the SAE's general feature dictionary while enabling targeted analysis of decision-relevant features.

## Install

For loading and running pretrained SAEs:

```bash
pip install kiji-inspector
```

For the full extraction, training, and analysis workflow:

```bash
pip install 'kiji-inspector[train]'
```

`kiji-inspector[full]` is also available as an alias for the same full stack.

## Quick Start

```python
from kiji_inspector import SAE

sae, feature_descriptions = SAE.from_pretrained(
base_model="nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
layer=20,
)

features = sae.encode(activations)
reconstruction = sae.decode(features)
```

Training and data-generation entrypoints live under the package namespace:

```bash
python -m kiji_inspector.generate_pairs 1300
python -m kiji_inspector.pipeline --layers 10 20 30
```

## Local vLLM patches

For local experiments that require the custom `vllm` extraction changes, rebuild the environment and apply the patch set from the repository root:

```bash
uv sync --no-cache --refresh --extra full --group dev
./patches/apply-patch.sh
```

The apply script installs every `*.patch` file under [patches](patches/) in lexical order:

- `01_allow_extract_hidden_states.patch`
- `02_support_nemotron_models.patch`
- `03_support_gemma3_models.patch`

Additional workflow details live in [patches/README_PATCH.md](patches/README_PATCH.md).

---

## 🤝 Contributing

We welcome contributions! Whether you're fixing a bug, improving documentation, or proposing a new feature, your help is appreciated.

### Ways to Contribute

- **Report Bugs** - [Open an issue](https://github.com/dataiku/kiji-inspector/issues) with steps to reproduce
- **Improve Docs** - Documentation PRs are always welcome
- **Submit Features** - Open an issue to discuss your idea before submitting a PR
- **Share Feedback** - [Start a discussion](https://github.com/dataiku/kiji-inspector/discussions)

### Community

- **Slack** - [Join our community](https://join.slack.com/t/dataiku-opensource/shared_invite/zt-3o6yq14rp-FTtAHZYhyru~jLZ~S6xPLA) to ask questions and connect with other contributors
- **Contributors** - See [CONTRIBUTORS.md](CONTRIBUTORS.md) for the list of people who have contributed

---

## 📄 License

Copyright (c) 2026 Dataiku SAS

This project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.