Mathematics for and by Large Language Models -- 2025 Edition
jeudi 22 mai 2025 -
09:00
lundi 19 mai 2025
mardi 20 mai 2025
mercredi 21 mai 2025
jeudi 22 mai 2025
09:00
Café d'accueil
Café d'accueil
09:00 - 09:30
Room: Centre de Conférences Marilyn et James Simons
09:30
Project Numina and AI for Theorem Proving
-
Yann Fleureau
(
Numina
)
Project Numina and AI for Theorem Proving
Yann Fleureau
(
Numina
)
09:30 - 10:20
Room: Centre de Conférences Marilyn et James Simons
Project Numina develops open-source open-dataset AI for mathematics. Numina won the AIMO progress prize in 2024 and released NuminaMath, the largest open dataset of math problem and solutions. Numina released the KiminaProver last month, a SOTA formal math prover. We will present the past, present and future developments of Numina along with our AI for theorem proving approaches.
10:20
Deciding What Game to Play, What Mathematics Problem to Solve
-
Katie Collins
(
Cambridge University
)
Deciding What Game to Play, What Mathematics Problem to Solve
Katie Collins
(
Cambridge University
)
10:20 - 11:10
Room: Centre de Conférences Marilyn et James Simons
Mathematicians have limited lifetimes. A hallmark of a good mathematician is not just solving new problems well, but deciding what problems are worth solving in the first place. How do we decide what problems are worth our energy? In this talk, I will present some of our recent work aimed at better understanding how people decide what problems are worthwhile. We focus first on characterizing and computationally modeling peoples’ judgments about games they have never played before, and compare these judgments to LLMs among other alternate models. We then extend this work to mathematics and present early results and thinking about modeling evaluations about whether mathematics problems are worth pursuing (in both “everyday” and expert mathematics contexts). We close by discussing some translational potential in benchmarking LLMs and informing the design of more human-compatible AI thought partners for mathematics.
11:10
Pause Café
Pause Café
11:10 - 11:40
Room: Centre de Conférences Marilyn et James Simons
11:40
A Roadmap for Math Copilots
-
Simon Frieder
(
Oxford University
)
A Roadmap for Math Copilots
Simon Frieder
(
Oxford University
)
11:40 - 12:30
Room: Centre de Conférences Marilyn et James Simons
Math Copilots are the envisioned successors to the current generation of LLMs that assist mathematicians in various mathematical activities - from devising proofs to finding counterexamples. In this talk, I will survey and categorize the existing landscape of AI-driven mathematical software tools and benchmarks (including the AIMO series of competitions) and show which benchmarks are missing towards pointing LLMs to hill-climb in the direction of Math Copilots. I will conclude with a call to participate in a new type of benchmark, the BeyondProof benchmark.
12:30
Déjeuner-Buffet
Déjeuner-Buffet
12:30 - 14:00
Room: Centre de Conférences Marilyn et James Simons
14:00
Why are LLMs not Better at Finding Proofs? (visio conference)
-
Timothy Gowers
(
Collège de France
)
Why are LLMs not Better at Finding Proofs? (visio conference)
Timothy Gowers
(
Collège de France
)
14:00 - 14:50
Room: Centre de Conférences Marilyn et James Simons
TBA
14:50
Machine Learning in PDE: Discovering New, Unstable Solutions
-
Javier Gómez-Serrano
(
Brown University
)
Machine Learning in PDE: Discovering New, Unstable Solutions
Javier Gómez-Serrano
(
Brown University
)
14:50 - 15:40
Room: Centre de Conférences Marilyn et James Simons
In this talk I will explain several recent results combining machine learning techniques and more traditional mathematics. The overarching theme is the interplay between modern (ML) and classical methods in order to discover new solutions of certain PDE with low or very low numerical error. I will also outline how to turn the numerical approximate solutions into a rigorous proof via computer-assisted methods, leading in some cases to singularity formation, or to the existence of certain special solutions (e.g. traveling waves). In particular, unstable solutions are now amenable to be discovered. While the motivation is originally coming from fluid mechanics' equations such as Euler or Navier-Stokes some of the methods can be tuned to other PDE.
15:40
Pause Café
Pause Café
15:40 - 16:10
Room: Centre de Conférences Marilyn et James Simons
16:10
Universal Optimal Control, Reinforcement Learning, and Reaching Goals in LLMs
-
Yann Ollivier
(
FAIR at Meta, Paris
)
Universal Optimal Control, Reinforcement Learning, and Reaching Goals in LLMs
Yann Ollivier
(
FAIR at Meta, Paris
)
16:10 - 17:00
Room: Centre de Conférences Marilyn et James Simons
TBA
17:00
Search, Reason or Recombine? Paradigms for Scaling Formal Proving
-
Fabian Glöckle
(
Ecole Nationale des Ponts et Chaussées et FAIR at Meta, Paris
)
Search, Reason or Recombine? Paradigms for Scaling Formal Proving
Fabian Glöckle
(
Ecole Nationale des Ponts et Chaussées et FAIR at Meta, Paris
)
17:00 - 17:50
Room: Centre de Conférences Marilyn et James Simons
In the effort to scale test-time computation for language models on mathematical benchmarks, two prominent paradigms have emerged: large-scale search with reinforcement learning, exemplified by methods like AlphaProof, and long chain-of-thought reasoning with emergent self-verification, as seen in models like o1. For the future of reinforcement learning in formal theorem proving, this opens up a spectrum of hybrid methods. These range from line-level tree search with environment feedback to multi-turn iterative whole proof generation, with and without integrated informal reasoning, to hierarchical problem decompositions and recombination of partial proofs. I will explain these methods as inference methods and discuss the challenges faced when applying reinforcement learning to them.