Speaker
Description
Current reinforcement learning methods train Large Language Models to generate outputs that satisfy an automated judge. While this drives impressive feats of reasoning, it inadvertently incentivises the superficial appearance of correctness. Models may learn to "reward hack" by glossing over logical flaws or confidently making false claims.
In this talk, I will explore how some AI researchers are turning to formal verification to dispel this illusion of competence. By pairing LLMs with proof assistants, we can shift AI training from adversarial reward-maximisation to a cooperative process in which reward hacking becomes impossible. I will also examine the broader implications of this emerging capability, discussing how "formalisation on demand" can serve as a substitute for human social credibility and lay the groundwork for fully autonomous AI mathematical research.