Mathematics for and by Large Language Models

Centre de Conférences Marilyn et James Simons (Le Bois-Marie)

35, route de Chartres, CS 40001, 91893 Bures-sur-Yvette Cedex
Description

The goal of this conference is to advance the dialogue and interaction between the LLM community and the broader world of mathematics, in order to further the mathematical understanding of LLMs and to contribute to solving some of the outstanding problems in this new field.

In particular, we intend to investigate mathematical structures that can be used to understand LLMs in terms of what they implicitly learn and how.

At the same time, in the opposite direction, the use of LLMs to do mathematics will also be investigated.

Registration is free and open until May 16, 2024.

Invited speakers:
François Charton (Meta AI Research)
Andrew Dudzik (Google DeepMind)
Amaury Hayat (École des Ponts ParisTech & CERMICS)
Julia Kempe (NYU Center for Data Science & CIMS)
Gabriel Synnaeve (Meta AI Research)
Yiannis Vlassopoulos (Athena Research Center & IHES)

Organizers: 
François Charton (Meta AI Research), Michael Douglas (Harvard University & IHES) & Yiannis Vlassopoulos (Athena Research Center & IHES)


Contact: Cécile Gourgues
    • 09:00
      Welcome coffee
    • 1
      How can Machine Learning Help Mathematicians?

      Large language models have achieved remarkable success in recent years. This naturally raises the question: can AI assist mathematicians in solving open problems in mathematics? We will explore how a language model can be trained to learn mathematical intuition about open problems and to guess candidate solutions, with a focus on a few examples. We will also explore the application of LLMs to automated theorem proving with an online training procedure, and discuss new perspectives in the area.

      Speaker: Amaury Hayat (École des Ponts ParisTech & CERMICS)
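
      The following is a minimal, self-contained sketch of the guess-and-verify idea evoked in the abstract, with a random sampler standing in for a trained language model and a trivial checker standing in for a formal verifier; it is an illustration only, not the speaker's actual pipeline.

      import random

      # Toy guess-and-verify loop: a "model" proposes candidate solutions, a
      # verifier checks them, and verified guesses could be fed back as
      # training data (the "online" part). The model below is a random
      # sampler standing in for a trained LLM; the problem is a toy quadratic.

      def toy_model_guess():
          """Stand-in for an LLM proposing a candidate solution (an integer)."""
          return random.randint(-50, 50)

      def verifier(problem, candidate):
          """Stand-in for a symbolic or formal checker."""
          a, b, c = problem
          return a * candidate**2 + b * candidate + c == 0

      problem = (1, -5, 6)      # x^2 - 5x + 6 = 0
      verified_pairs = []       # would be appended to the training set

      random.seed(0)
      for _ in range(1000):
          guess = toy_model_guess()
          if verifier(problem, guess):
              verified_pairs.append((problem, guess))

      print(sorted({g for _, g in verified_pairs}))   # expected: [2, 3]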
    • 2
      Synthetic Data – Friend or Foe in the Age of Scaling?

      As AI and LLM model sizes grow, neural scaling laws have become a crucial tool to predict the improvement of large models as capacity and the size of the original (human or natural) training data increase. Yet, the widespread use of popular models means that the ecosystem of online data and text will co-evolve to progressively contain increasing amounts of synthesized data.
      In this talk we ask: how will the scaling laws change in the inevitable regime where synthetic data makes its way into the training corpus? Will future models still improve, or are they doomed to degenerate, up to total (model) collapse? We develop a theoretical framework of model collapse through the lens of scaling laws. We discover a wide range of decay phenomena, analyzing loss of scaling, shifted scaling with the number of generations, the "un-learning" of skills, and grokking when mixing human and synthesized data. Our theory is validated by large-scale experiments with a transformer on an arithmetic task and text generation using the LLM Llama 2.

      Speaker: Julia Kempe (NYU Center for Data Science and Courant Institute of Mathematical Sciences)
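
      As a hedged, self-contained illustration of the collapse phenomenon (not the theoretical framework of the talk), the toy loop below repeatedly fits a Gaussian to its training data and then trains the next generation only on samples drawn from the fitted model; estimation bias and sampling noise compound across generations, and the fitted distribution progressively loses its tails.

      import random
      import statistics

      # Toy "model collapse" loop: generation 0 sees real data; every later
      # generation is trained purely on synthetic samples from the previous
      # generation's fitted Gaussian. On average, the fitted sigma shrinks.

      random.seed(0)
      n_samples = 100
      data = [random.gauss(0.0, 1.0) for _ in range(n_samples)]  # "human" data

      for gen in range(301):
          mu = statistics.fmean(data)
          sigma = statistics.pstdev(data)
          if gen % 50 == 0:
              print(f"generation {gen:3d}: fitted sigma = {sigma:.3f}")
          # next generation: synthetic data only
          data = [random.gauss(mu, sigma) for _ in range(n_samples)]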
    • 11:30
      Coffee break
    • 3
      A First Approximation to the Mathematical Structure Computed by Large Language Models

      Large language models are transformer neural networks trained to produce a probability distribution over the possible next words of given texts in a corpus, in such a way that the most likely predicted word is the actual next word in the training text.

      We will explain the mathematical structure defined by such conditional probability distributions of text extensions. Changing the viewpoint from probabilities to $-\log$ probabilities, we observe that the data of text extensions are encoded in a directed (non-symmetric) metric structure defined on the space of texts ${\mathcal L}$. We then construct a directed metric polyhedron $P({\mathcal L})$, in which ${\mathcal L}$ is isometrically embedded as generators of certain special extremal rays. Each such generator encodes extensions of a text along with the corresponding probabilities.

      Moreover $P({\mathcal L})$ is $(\min, +)$ (i.e. tropically) generated by the text extremal rays and is the image of a $(\min,+)$ projector (given by the metric on ${\mathcal L}$). This leads to a duality theorem relating the polyhedron $P({\mathcal L})$ defined by text extensions to one defined by text restrictions. We also explain that the generator of the extremal ray corresponding to a text is approximated by a Boltzmann weighted linear combination of generators of extremal rays corresponding to the words making up that text.

      The metric space ${\mathcal L}$ can equivalently be considered as an enriched category, and then the embedding into $P({\mathcal L})$ is the Yoneda embedding into the category of presheaves. In fact, all constructions have a categorical meaning (in particular, generalizing the familiar view of language as a monoid or as a poset with the subtext order). The categorical interpretations will be explained in parallel.
      This is joint work with Stéphane Gaubert.

      Speaker: Yiannis Vlassopoulos (Athena Research Center & IHES)
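
      As a hedged illustration of the passage from probabilities to a directed metric mentioned in the abstract (one plausible reading, not necessarily the speaker's exact definitions): for a text $x$ and an extension $xy$ one may set $d(x, xy) = -\log P(y \mid x) \ge 0$, with $d(x, x) = 0$. Since probabilities multiply along successive extensions, $P(yz \mid x) = P(y \mid x)\, P(z \mid xy)$, these quantities add, $d(x, xyz) = d(x, xy) + d(xy, xyz)$, so $-\log$ probabilities behave like lengths of directed paths. This is the non-symmetric (directed) metric viewpoint, and taking minima over alternative extensions is where the $(\min, +)$, i.e. tropical, operations enter.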
    • 13:00
      Buffet lunch
    • 4
      Mathematics as a Translation Task - the Importance of Training Distributions

      Many problems of mathematics can be set up as translation tasks: problems, represented as sentences in some language, are translated into their solutions by language models trained on synthetic examples. In this setting, we can choose the distribution of problems and solutions used to train the model. I present examples from three different experiments which suggest that this choice can make a large difference in model performance, and which provide intuition about the inner workings of transformer models.

      Speaker: François Charton (Meta AI Research)
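
      The snippet below is a minimal sketch of the "translation" setup: synthetic (problem, solution) text pairs are generated under a training distribution that is chosen explicitly. The GCD task and the two samplers are illustrative assumptions, not necessarily the experiments discussed in the talk.

      import math
      import random

      # Encode problems and solutions as source/target "sentences" for a
      # seq2seq model, and make the training distribution an explicit choice.

      def sample_uniform(max_n=10**4):
          return random.randint(1, max_n), random.randint(1, max_n)

      def sample_log_uniform(max_n=10**4):
          # operands spread roughly evenly across orders of magnitude
          draw = lambda: int(math.exp(random.uniform(0.0, math.log(max_n))))
          return draw(), draw()

      def make_pair(sampler):
          a, b = sampler()
          problem = f"gcd {a} {b}"         # source sentence
          solution = str(math.gcd(a, b))   # target sentence
          return problem, solution

      random.seed(0)
      for src, tgt in (make_pair(sample_log_uniform) for _ in range(5)):
          print(src, "->", tgt)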
    • 5
      Three Problems in the Mathematics of Deep Learning

      Neural networks, particularly LLMs, are notoriously poor at algorithmic tasks, such as sorting, shortest path, and even basic arithmetic. Across three papers, we explored the problem of "aligning" architectures to classical computer programs, and showed that this question relates to familiar mathematical concepts: polynomial functors, cohomology, and higher categories.

      Speaker: Andrew Dudzik (Google DeepMind)
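
      As a concrete instance of the kind of classical program that such alignment arguments target (an illustrative choice of example, not the categorical constructions of the talk), the sketch below computes single-source shortest paths by (min, +) message passing, i.e. Bellman-Ford, whose aggregate-and-update rounds have the same shape as layers of a message-passing network.

      INF = float("inf")

      def bellman_ford(num_nodes, edges, source):
          """Shortest-path distances via rounds of (min, +) message passing."""
          dist = [INF] * num_nodes
          dist[source] = 0.0
          for _ in range(num_nodes - 1):           # one "layer" per round
              messages = [[] for _ in range(num_nodes)]
              for u, v, w in edges:
                  messages[v].append(dist[u] + w)  # message along edge u -> v
              dist = [min([dist[v]] + messages[v]) for v in range(num_nodes)]
          return dist

      edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0), (2, 3, 5.0)]
      print(bellman_ford(4, edges, source=0))      # expected: [0.0, 3.0, 1.0, 4.0]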
    • 16:30
      Coffee break
    • 6
      TBA
      Speaker: Gabriel Synnaeve (Meta AI Research)