Miles Turpin
Community-curated profile
Language model alignment @nyuniversity, @CohereAI
Featured content
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
by Miles Turpin
⚡️New paper!⚡️ It’s tempting to interpret chain-of-thought explanations as the LLM's process for solving a task. In this new work, we show that CoT explanations can systematically misrepresent the true reason for model predictions. arxiv.org/abs/2305.
by Miles Turpin