- 3SAT is the best example of this: it's much easier to check whether a response is correct than to generate a correct response.
- [[Editor vs Writer]]
- Therefore, if we train a model with [[RLHF]], it can in principle become much better than anyone who trained it, because the trainers only have to judge outputs, not produce them.
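
A rough sketch of the 3SAT asymmetry, to make the gap concrete. The function names `verify` and `solve_brute_force` are made up for this note, and brute-force search is only a stand-in for generation (real SAT solvers are far smarter, but no polynomial-time algorithm is known). Checking is a single pass over the clauses; generating may have to try up to 2^n assignments.

```python
from itertools import product

# A clause is a tuple of non-zero ints: k means variable k, -k means NOT k.
# A formula is a list of clauses (AND of ORs).

def verify(clauses, assignment):
    """Discriminator side: check a candidate in one pass over the clauses."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def solve_brute_force(clauses, num_vars):
    """Generator side: try assignments until one passes verify().

    Worst case this enumerates all 2**num_vars assignments.
    Returns a dict {variable: bool}, or None if the formula is unsatisfiable.
    """
    for bits in product([False, True], repeat=num_vars):
        assignment = {i + 1: bit for i, bit in enumerate(bits)}
        if verify(clauses, assignment):
            return assignment
    return None

# (x1 or x2 or not x3) and (not x1 or x3 or x2) and (not x2 or not x3 or x1)
clauses = [(1, 2, -3), (-1, 3, 2), (-2, -3, 1)]
solution = solve_brute_force(clauses, num_vars=3)
print(solution, verify(clauses, solution))
```

The generator calls the verifier as a subroutine, which is the same shape as the RLHF point: a cheap judge can steer a much more expensive search.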
[[Generator-Discriminator gap]]
#halfbaked