#+TITLE: Attribution Graphs Explorer
#+AUTHOR: AYGP-DR Research Team
#+OPTIONS: toc:3 num:t

* Attribution Graphs Explorer

[[https://img.shields.io/badge/Version-0.1.0-blue.svg][https://img.shields.io/badge/Version-0.1.0-blue.svg]]
[[https://img.shields.io/badge/Guile-3.0+-blue.svg][https://img.shields.io/badge/Guile-3.0+-blue.svg]]
[[https://img.shields.io/badge/License-MIT-green.svg][https://img.shields.io/badge/License-MIT-green.svg]]
[[https://img.shields.io/badge/Status-Alpha-orange.svg][https://img.shields.io/badge/Status-Alpha-orange.svg]]

A toolkit for exploring attribution graphs and circuit tracing in transformer models, implemented in Guile Scheme.

** Overview

Attribution Graphs Explorer is a framework for mechanistic interpretability of transformer models based on circuit tracing methods. The toolkit allows researchers to:

- Extract computational circuits from neural networks
- Trace linear paths of information flow
- Visualize attribution graphs
- Test causal hypotheses through perturbation

Our implementation builds on the methods described in the [[https://transformer-circuits.pub/2025/attribution-graphs/methods.html][Attribution Graphs]] research, allowing for programmatic analysis of neural network internals.

[[file:docs/images/overview.png]]

** Architecture

The toolkit is organized around several key components:

#+begin_src mermaid :file docs/architecture.png :mkdirp t
graph TD
A[Input Tokens] --> B[Token Embeddings]
B --> C[Cross-Layer Transcoder]
C --> D[Feature Activations]
D --> E[Attribution Graph]
E --> F[Output Logits]

style C fill:#f9f,stroke:#333,stroke-width:4px
style E fill:#bbf,stroke:#333,stroke-width:4px
#+end_src

*** Cross-Layer Transcoders (CLT)

Cross-Layer Transcoders provide a way to bypass MLP nonlinearities, creating linear feature-to-feature interactions that can be traced through the network. The CLT modules:

- Read from the residual stream at one layer
- Contribute to the reconstructed outputs of all subsequent MLP layers
- Maintain sparse feature representations
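
The read/write pattern above can be sketched in a few lines. This is a minimal illustration only, assuming a feature holds one encoder vector and one decoder vector per target layer; the names =make-feature=, =feature-activation=, and =feature-contribution= are hypothetical and are not the toolkit's API.

#+begin_src scheme
(use-modules (srfi srfi-1) (srfi srfi-9))

;; Hypothetical record: one CLT feature reads at READ-LAYER and owns one
;; decoder vector per target layer (an alist of layer -> list of numbers).
(define-record-type <feature>
  (make-feature read-layer encoder decoders)
  feature?
  (read-layer feature-read-layer)
  (encoder feature-encoder)
  (decoders feature-decoders))

;; Dot product of two equal-length lists.
(define (dot xs ys)
  (fold + 0 (map * xs ys)))

;; Activation = ReLU(encoder . residual). It is zero whenever the
;; projection is not positive, which is what keeps features sparse.
(define (feature-activation feature residual)
  (max 0 (dot (feature-encoder feature) residual)))

;; Contribution written to the MLP output at LAYER: activation * decoder.
;; Returns #f when the feature does not write to that layer.
(define (feature-contribution feature residual layer)
  (let ((decoder (assv-ref (feature-decoders feature) layer))
        (act (feature-activation feature residual)))
    (and decoder (map (lambda (w) (* act w)) decoder))))

;; Toy feature: reads at layer 5, writes to layers 6 and 7.
(define f (make-feature 5
                        '(1 -1 0)
                        '((6 . (2 0 0)) (7 . (0 3 0)))))

(display (feature-activation f '(3 1 4)))      ; 2
(newline)
(display (feature-contribution f '(3 1 4) 7))  ; (0 6 0)
(newline)
#+end_src

Once the activation is known, each contribution is a plain scalar multiple of a fixed decoder vector, which is why downstream effects can be attributed linearly.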

*** Attribution Graphs

Attribution graphs represent the computational flow as a directed graph:

- *Nodes*: Features, tokens, and logits
- *Edges*: Attribution weights between features
- *Paths*: Computational circuits through the network
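
One possible in-memory shape for such a graph is sketched below. The record names, node labels, and weights are made up for illustration and are not taken from the toolkit's source.

#+begin_src scheme
(use-modules (srfi srfi-1) (srfi srfi-9))

;; Hypothetical node: KIND is one of 'token, 'feature, or 'logit.
(define-record-type <node>
  (make-node id kind label)
  node?
  (id node-id)
  (kind node-kind)
  (label node-label))

;; Hypothetical edge: attribution WEIGHT from SOURCE to TARGET (node ids).
(define-record-type <edge>
  (make-edge source target weight)
  edge?
  (source edge-source)
  (target edge-target)
  (weight edge-weight))

(define nodes
  (list (make-node 't0 'token "Dallas")
        (make-node 'f1 'feature "Texas")
        (make-node 'f2 'feature "say a capital")
        (make-node 'l0 'logit "Austin")))

(define edges
  (list (make-edge 't0 'f1 0.8)
        (make-edge 'f1 'l0 0.6)
        (make-edge 'f2 'l0 0.3)))

;; All edges leaving the node with id ID.
(define (outgoing-edges edges id)
  (filter (lambda (e) (eq? (edge-source e) id)) edges))

;; Total attribution arriving at a node, summed over its incoming edges.
(define (incoming-weight edges id)
  (fold + 0
        (map edge-weight
             (filter (lambda (e) (eq? (edge-target e) id)) edges))))

(display (map edge-target (outgoing-edges edges 't0))) ; (f1)
(newline)
#+end_src

Keeping edges in a flat list makes filtering by source or target a one-liner, at the cost of linear-time lookups; an adjacency table would be the natural next step for larger graphs.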

*** Circuit Discovery

The toolkit provides algorithms for finding interpretable circuits:

- Path tracing algorithms
- Circuit motif identification
- Circuit visualization
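
As a rough illustration of path tracing, the sketch below enumerates paths depth-first, scores each path by the product of its edge weights, and prunes weak branches. It is one simple way such tracing could work, not necessarily the algorithm the toolkit implements; =trace-paths= and =rank-paths= are hypothetical names.

#+begin_src scheme
(use-modules (srfi srfi-1))

;; Edges as (source target weight) triples forming a small acyclic graph.
(define edges
  '((t0 f1 0.5) (t0 f2 0.5) (f1 f3 0.5)
    (f1 l0 0.25) (f2 l0 0.125) (f3 l0 1.0)))

(define (edge-source e) (first e))
(define (edge-target e) (second e))
(define (edge-weight e) (third e))

;; Depth-first search from SOURCE to TARGET. Each result is
;; (score . path), where score is the product of edge weights on the path.
;; Branches whose running |score| falls below THRESHOLD are pruned.
;; Assumes an acyclic graph and |weight| <= 1, so a pruned branch
;; could never climb back above the threshold.
(define (trace-paths edges source target threshold)
  (let walk ((node source) (score 1.0) (path (list source)))
    (cond
     ((< (abs score) threshold) '())
     ((eq? node target) (list (cons score (reverse path))))
     (else
      (append-map
       (lambda (e)
         (walk (edge-target e)
               (* score (edge-weight e))
               (cons (edge-target e) path)))
       (filter (lambda (e) (eq? (edge-source e) node)) edges))))))

;; Strongest paths first.
(define (rank-paths paths)
  (sort paths (lambda (a b) (> (abs (car a)) (abs (car b))))))

(for-each (lambda (p) (display p) (newline))
          (rank-paths (trace-paths edges 't0 'l0 0.1)))
#+end_src

With a threshold of 0.1 the route through =f2= (score 0.0625) is dropped, leaving the two stronger paths through =f1=.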

*** Validation Framework

Test hypotheses about discovered circuits:

- Perturbation experiments
- Causal validation
- Sparsity and concentration metrics
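
The metrics and perturbation test can be made concrete with a small sketch. The definitions below are assumptions chosen for illustration (sparsity as the fraction of near-zero weights, concentration as the share of absolute weight held by the top /k/ edges, perturbation as zero-ablation on a toy linear readout); the toolkit's own definitions may differ.

#+begin_src scheme
(use-modules (srfi srfi-1))

;; Fraction of weights whose magnitude is at most EPSILON.
(define (sparsity weights epsilon)
  (/ (count (lambda (w) (<= (abs w) epsilon)) weights)
     (length weights)))

;; Share of total absolute weight carried by the K largest-magnitude weights.
(define (top-k-concentration weights k)
  (let* ((mags (sort (map abs weights) >))
         (total (fold + 0 mags)))
    (if (zero? total)
        0
        (/ (fold + 0 (take mags (min k (length mags)))) total))))

;; Toy linear readout: logit = sum of activation * weight.
(define (logit activations weights)
  (fold + 0 (map * activations weights)))

;; Zero-ablate the feature at INDEX and return the resulting drop in the
;; logit, i.e. the causal effect that its attribution should predict.
(define (ablation-effect activations weights index)
  (let ((ablated (map (lambda (a i) (if (= i index) 0 a))
                      activations
                      (iota (length activations)))))
    (- (logit activations weights) (logit ablated weights))))

(define weights '(4 -3 0 0 1 0 0 0))
(display (sparsity weights 0))                      ; 5/8
(newline)
(display (top-k-concentration weights 2))           ; 7/8
(newline)
(display (ablation-effect '(1 2 3) '(2 1/2 -1) 1))  ; 1
(newline)
#+end_src

In a purely linear readout the ablation effect equals activation times weight, so any gap between the two in a real model signals nonlinearity that the attribution graph does not capture.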

** Getting Started

*** Installation

#+begin_src shell
# Clone the repository
git clone https://github.com/aygp-dr/attribution-graphs-explorer.git
cd attribution-graphs-explorer

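# Configure and build (GNU Make is installed as gmake on FreeBSD;
# on systems where GNU Make is the default, use make instead)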
./configure
gmake

# Run tests to verify installation
gmake test

# Run examples
gmake run
#+end_src

*** Requirements

This project has been developed and tested with the following environment:

#+CAPTION: Development Environment
#+ATTR_HTML: :border 2 :rules all :frame border
| *Component*      | *Version*    | *Notes*                                   |
|------------------+--------------+-------------------------------------------|
| Operating System | FreeBSD 14.2 | Should work on most Unix-like systems     |
| Guile            | 3.0.10       | *Minimum 3.0 required*                    |
| GNU Make         | 4.4.1        | gmake on FreeBSD                          |
| GNU Grep         | 3.11         | ggrep on FreeBSD                          |
| GNU Awk          | 5.3.2        | gawk on FreeBSD                           |
| Direnv           | 2.35.0       | For environment management                |
| Emacs            | 30.1         | For org-mode processing and documentation |

**** Required Guile Modules

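- SRFI libraries: =(srfi srfi-1)=, =(srfi srfi-9)=, =(srfi srfi-43)=
- =(ice-9 regex)=

All of these modules ship with Guile itself, so no separate installation is needed.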
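
A quick way to confirm that the modules listed above are available is to load them in a short script. The file name =check-modules.scm= is only a suggestion; run it with =guile check-modules.scm=.

#+begin_src scheme
;; Load every required module; Guile signals an error if one is missing.
(use-modules (srfi srfi-1) (srfi srfi-9) (srfi srfi-43) (ice-9 regex))

;; Report the running Guile version alongside the result.
(display (string-append "Guile " (version) ": required modules loaded"))
(newline)
#+end_src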

*** Basic Usage

#+begin_src scheme
;; Load the framework
(add-to-load-path "/path/to/attribution-graphs-explorer")
(use-modules (attribution-graphs clt transcoder)
             (attribution-graphs graph attribution)
             (attribution-graphs circuits discovery))

;; Create a cross-layer transcoder
(define my-clt (make-clt 5 '(6 7 8) 768 128 768))

;; Generate attribution graph
(define graph (compute-attribution-graph my-clt "Example input" 'last-token))

;; Find and visualize circuits
(define circuits (find-circuits graph))
(display (circuit->mermaid circuits graph))
#+end_src
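
The last step prints Mermaid text. To show roughly what that involves, here is a minimal sketch that renders a list of weighted edges as a Mermaid flowchart. The function =edges->mermaid= and the triple-based edge format are hypothetical and separate from the toolkit's =circuit->mermaid=.

#+begin_src scheme
(use-modules (srfi srfi-1))

;; Render (source target weight) triples as a Mermaid flowchart string.
;; Each edge label carries the attribution weight.
(define (edges->mermaid edges)
  (string-join
   (cons "graph TD"
         (map (lambda (e)
                (format #f "    ~a -->|~a| ~a"
                        (first e) (third e) (second e)))
              edges))
   "\n"))

(display (edges->mermaid '((t0 f1 0.5) (f1 l0 0.25))))
(newline)
#+end_src

The output can be pasted into any Mermaid renderer, or into an Org =mermaid= source block like the one in the Architecture section.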

** Examples

The repository includes example applications:

*** Poetry Generation Circuit

Analyzes how transformer models plan rhyming in poetry:

#+begin_src scheme
(use-modules (attribution-graphs examples poetry-circuit))
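;; Assumes `model' is already bound (see Limitations: only mock CLT
;; instances are available at present)
(analyze-poetry-planning model "Roses are red\nViolets are ")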
#+end_src

*** Multi-hop Reasoning Circuit

Traces factual recall with intermediate reasoning steps:

#+begin_src scheme
(use-modules (attribution-graphs examples reasoning-circuit))
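;; Assumes `model' is already bound (see Limitations: only mock CLT
;; instances are available at present)
(analyze-multihop-reasoning model "The capital of the state containing Dallas is")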
#+end_src

** Development Status

This project is currently in *alpha* status (version 0.1.0). The core framework is implemented and functional, but several components are placeholders for demonstration purposes:

*** Implemented Features
- Core data structures (CLT, attribution graphs, nodes, edges)
- Basic mathematical operations (activation functions, matrix operations)
- Graph construction and manipulation
- Circuit discovery algorithms
- Visualization generation (Mermaid diagrams)
- Example applications (poetry and reasoning circuits)
- Test framework

*** Limitations
- Matrix operations use simplified random generation
- Some functions are placeholders (e.g., =embed-tokens=, =find-edge=)
- No actual model integration (uses mock CLT instances)
- Limited validation framework
- No real-world model examples

*** Future Development
- Integration with actual transformer models
- Improved matrix operations and numerical methods
- Enhanced visualization capabilities
- More comprehensive test suite
- Performance optimizations
- Additional circuit discovery algorithms

** Research Context

This toolkit builds on recent work in mechanistic interpretability of large language models:

- [[https://transformer-circuits.pub/2025/attribution-graphs/methods.html][Attribution Graphs Methods]] - The core technical approach
- [[https://transformer-circuits.pub/2025/attribution-graphs/biology.html][On the Biology of a Large Language Model]] - Case studies applying attribution graphs to a production language model
- [[https://transformer-circuits.pub/][Transformer Circuits]] - Broader context of circuit analysis
- [[https://distill.pub/2020/circuits/][Thread: Circuits]] - Foundational work on circuit analysis in vision models, including the article "Zoom In: An Introduction to Circuits"

** License

MIT License

** Citation

If you use this toolkit in your research, please cite:

#+begin_src bibtex
@software{attribution_graphs_explorer,
  author = {AYGP-DR Research Team},
  title  = {Attribution Graphs Explorer: A Toolkit for Circuit Tracing in Transformer Models},
  url    = {https://github.com/aygp-dr/attribution-graphs-explorer},
  year   = {2025},
}
#+end_src