- 3SAT is the best example of this: it is much easier to check whether a proposed answer is correct than to generate a correct answer.
- [[Editor vs Writer]]
- Therefore, if we train a model with [[RLHF]], the AI can become much better than anyone who trained it, because the trainers only need to judge outputs, not produce them. [[Generator-Discriminator gap]] #halfbaked
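
A rough sketch of the 3SAT point, to make the gap concrete. The names `verify` and `solve`, the DIMACS-style clause encoding, and the sample formula are all assumptions made for illustration. The brute-force search stands in for "generating" only; real SAT solvers are far smarter, though no polynomial-time algorithm is known.

```python
from itertools import product

# Assumed encoding: a formula is a list of 3-literal clauses.
# Literal k means "variable k is true"; -k means "variable k is false".


def verify(formula, assignment):
    """Discriminator: check a candidate assignment.

    One pass over the clauses, so the cost is linear in the formula size.
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )


def solve(formula, num_vars):
    """Generator: naive search over up to 2**num_vars assignments.

    Returns a satisfying assignment as a dict, or None if there is none.
    """
    for bits in product([False, True], repeat=num_vars):
        assignment = {i + 1: bit for i, bit in enumerate(bits)}
        if verify(formula, assignment):
            return assignment
    return None


# Hypothetical example formula over variables 1..3.
formula = [(1, 2, -3), (-1, 2, 3), (1, -2, 3)]

candidate = {1: True, 2: True, 3: True}
print(verify(formula, candidate))  # cheap: three clause checks
print(solve(formula, num_vars=3))  # expensive in general: exponential search
```

Note that `solve` leans on `verify` for every candidate, which is the same shape as the RLHF argument: a cheap judge is enough to steer a much more expensive search for good outputs.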