8mo ago

A Little Bit of Reinforcement Learning from Human Feedback -- Nathan Lambert

Anyone interested in learning about RLHF? This text isn't complete yet, but it already looks like a pretty useful resource.
