Andrej Karpathy @karpathy
·
Jul 18, 2022
Great post on the technical challenges of training a 176B Transformer Language Model. ~10 years ago you'd train neural nets on your CPU workstation with Matlab. Now you need a compute cluster and very careful orchestration of its GPU memory w.r.t. both limits and access patterns.
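A rough sketch of why the memory limits bite at this scale: the per-parameter byte counts below assume a common fp16/fp32 mixed-precision Adam recipe and are an illustration, not figures from the post.

```python
# Back-of-envelope memory for training a 176B-parameter model with
# mixed-precision Adam. Byte counts per parameter are an assumed
# standard recipe (fp16 weights/grads + fp32 master copy + Adam moments).
params = 176e9

bytes_per_param = (
    2    # fp16 weights
    + 2  # fp16 gradients
    + 4  # fp32 master weights
    + 4  # fp32 Adam first moment
    + 4  # fp32 Adam second moment
)  # = 16 bytes/param, before activations and buffers

total_bytes = params * bytes_per_param
print(f"{total_bytes / 1e12:.1f} TB of training state")  # 2.8 TB

gpu_mem_bytes = 80e9  # e.g. one 80 GB GPU
print(f"~{total_bytes / gpu_mem_bytes:.0f} GPUs just to hold it")  # ~35
```

Activations, communication buffers, and fragmentation push the real requirement well past this, which is why the careful orchestration the tweet mentions is unavoidable.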
