A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations

Paper
Feb 6, 2023
#ComputerScience

Bilal Chughtai (@bilalchughtai_), Author

Neel Nanda (at ICLR) (@NeelNanda5), Author

Read on arxiv.org

Universality is a key hypothesis in mechanistic interpretability -- that different models learn similar features and circuits when trained on similar tasks. In this work, we study t...
