A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations
- Paper
- Feb 6, 2023
- #ComputerScience
Universality is a key hypothesis in mechanistic interpretability -- that different models learn similar features and circuits when trained on similar tasks. In this work, we study t...