https://github.com/aygp-dr/attribution-graphs-explorer

A toolkit for exploring attribution graphs and circuit tracing in transformer models, implemented in Guile Scheme
https://github.com/aygp-dr/attribution-graphs-explorer

attribution-graphs circuit-tracing machine-learning mechanistic-interpretability scheme transformer-interpretability

Last synced: 4 months ago
JSON representation

A toolkit for exploring attribution graphs and circuit tracing in transformer models, implemented in Guile Scheme

Host: GitHub
URL: https://github.com/aygp-dr/attribution-graphs-explorer
Owner: aygp-dr
Created: 2025-05-31T22:25:43.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-06-06T23:15:44.000Z (about 1 year ago)
Last Synced: 2025-06-19T20:11:39.440Z (12 months ago)
Topics: attribution-graphs, circuit-tracing, machine-learning, mechanistic-interpretability, scheme, transformer-interpretability
Language: Scheme
Size: 182 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 8
Metadata Files:
- Readme: README.org

Awesome Lists containing this project

README

          #+TITLE: Attribution Graphs Explorer

#+AUTHOR: AYGP-DR Research Team

#+OPTIONS: toc:3 num:t

* Attribution Graphs Explorer

[[https://img.shields.io/badge/Version-0.1.0-blue.svg][https://img.shields.io/badge/Version-0.1.0-blue.svg]]

[[https://img.shields.io/badge/Guile-3.0+-blue.svg][https://img.shields.io/badge/Guile-3.0+-blue.svg]]

[[https://img.shields.io/badge/License-MIT-green.svg][https://img.shields.io/badge/License-MIT-green.svg]]

[[https://img.shields.io/badge/Status-Alpha-orange.svg][https://img.shields.io/badge/Status-Alpha-orange.svg]]

A toolkit for exploring attribution graphs and circuit tracing in transformer models, implemented in Guile Scheme.

** Overview

Attribution Graphs Explorer is a framework for mechanistic interpretability of transformer models based on circuit tracing methods. The toolkit allows researchers to:

- Extract computational circuits from neural networks

- Trace linear paths of information flow

- Visualize attribution graphs

- Test causal hypotheses through perturbation

Our implementation builds on the methods described in the [[https://transformer-circuits.pub/2025/attribution-graphs/methods.html][Attribution Graphs]] research, allowing for programmatic analysis of neural network internals.

[[file:docs/images/overview.png]]

** Architecture

The toolkit is organized around several key components:

#+begin_src mermaid :file docs/architecture.png :mkdirp t

graph TD

    A[Input Tokens] --> B[Token Embeddings]

    B --> C[Cross-Layer Transcoder]

    C --> D[Feature Activations]

    D --> E[Attribution Graph]

    E --> F[Output Logits]

    

    style C fill:#f9f,stroke:#333,stroke-width:4px

    style E fill:#bbf,stroke:#333,stroke-width:4px

#+end_src

*** Cross-Layer Transcoders (CLT)

Cross-Layer Transcoders provide a way to bypass MLP nonlinearities, creating linear feature-to-feature interactions that can be traced through the network. The CLT modules:

- Read from the residual stream at one layer

- Contribute to all subsequent MLP layers

- Maintain sparse feature representations

*** Attribution Graphs

Attribution graphs represent the computational flow as a directed graph:

- *Nodes*: Features, tokens, and logits

- *Edges*: Attribution weights between features

- *Paths*: Computational circuits through the network

*** Circuit Discovery

The toolkit provides algorithms for finding interpretable circuits:

- Path tracing algorithms

- Circuit motif identification

- Circuit visualization

*** Validation Framework

Test hypotheses about discovered circuits:

- Perturbation experiments

- Causal validation

- Sparsity and concentration metrics

** Getting Started

*** Installation

#+begin_src shell

# Clone the repository

git clone https://github.com/aygp-dr/attribution-graphs-explorer.git

cd attribution-graphs-explorer

# Configure and build

./configure

gmake

# Run tests to verify installation

gmake test

# Run examples

gmake run

#+end_src

*** Requirements

This project has been developed and tested with the following environment:

#+CAPTION: Development Environment

#+ATTR_HTML: :border 2 :rules all :frame border

| *Component*      | *Version*       | *Notes*                                  |

|------------------+-----------------+------------------------------------------|

| Operating System | FreeBSD 14.2    | Should work on most Unix-like systems    |

| Guile            | 3.0.10          | *Minimum 3.0 required*                   |

| GNU Make         | 4.4.1           | gmake on FreeBSD                         |

| GNU Grep         | 3.11            | ggrep on FreeBSD                         |

| GNU Awk          | 5.3.2           | gawk on FreeBSD                          |

| Direnv           | 2.35.0          | For environment management               |

| Emacs            | 30.1            | For org-mode processing and documentation |

**** Required Guile Modules

- SRFI libraries: srfi-1, srfi-9, srfi-43

- ice-9 regex

*** Basic Usage

#+begin_src scheme

;; Load the framework

(add-to-load-path "/path/to/attribution-graphs-explorer")

(use-modules (attribution-graphs clt transcoder)

             (attribution-graphs graph attribution)

             (attribution-graphs circuits discovery))

;; Create a cross-layer transcoder

(define my-clt (make-clt 5 '(6 7 8) 768 128 768))

;; Generate attribution graph

(define graph (compute-attribution-graph my-clt "Example input" 'last-token))

;; Find and visualize circuits

(define circuits (find-circuits graph))

(display (circuit->mermaid circuits graph))

#+end_src

** Examples

The repository includes example applications:

*** Poetry Generation Circuit

Analyzes how transformer models plan rhyming in poetry:

#+begin_src scheme

(use-modules (attribution-graphs examples poetry-circuit))

(analyze-poetry-planning model "Roses are red\nViolets are ")

#+end_src

*** Multi-hop Reasoning Circuit

Traces factual recall with intermediate reasoning steps:

#+begin_src scheme

(use-modules (attribution-graphs examples reasoning-circuit))

(analyze-multihop-reasoning model "The capital of the state containing Dallas is")

#+end_src

** Development Status

This project is currently in *alpha* status (version 0.1.0). The core framework is implemented and functional, but several components are placeholders for demonstration purposes:

*** Implemented Features

- Core data structures (CLT, attribution graphs, nodes, edges)

- Basic mathematical operations (activation functions, matrix operations)

- Graph construction and manipulation

- Circuit discovery algorithms

- Visualization generation (Mermaid diagrams)

- Example applications (poetry and reasoning circuits)

- Test framework

*** Limitations

- Matrix operations use simplified random generation

- Some functions are placeholders (e.g., =embed-tokens=, =find-edge=)

- No actual model integration (uses mock CLT instances)

- Limited validation framework

- No real-world model examples

*** Future Development

- Integration with actual transformer models

- Improved matrix operations and numerical methods

- Enhanced visualization capabilities

- More comprehensive test suite

- Performance optimizations

- Additional circuit discovery algorithms

** Research Context

This toolkit builds on recent work in mechanistic interpretability of large language models:

- [[https://transformer-circuits.pub/2025/attribution-graphs/methods.html][Attribution Graphs Methods]] - The core technical approach

- [[https://transformer-circuits.pub/2025/attribution-graphs/biology.html][Attribution Graphs Biology]] - Application to biological knowledge

- [[https://transformer-circuits.pub/][Transformer Circuits]] - Broader context of circuit analysis

- [[https://distill.pub/2020/circuits/][Circuits: Zoom In on Neurons]] - Foundational work on circuit analysis in vision models

** License

MIT License

** Citation

If you use this toolkit in your research, please cite:

#+begin_src bibtex

@software{attribution_graphs_explorer,

  author = {AYGP-DR Research Team},

  title = {Attribution Graphs Explorer: A Toolkit for Circuit Tracing in Transformer Models},

  url = {https://github.com/aygp-dr/attribution-graphs-explorer},

  year = {2025},

}

#+end_src

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/aygp-dr/attribution-graphs-explorer

Awesome Lists containing this project

README