Thanks for this excellent thread. Rich Shiffrin and I recently wrote a short commentary on this topic as well: [link] Doing psychological testing on AI systems is a tricky business!
Great thread discussing common issues with LLM evaluation and how to do better using methods from the behavioral sciences. #LLMs #evaluation