Yann Ollivier (Paris-Sud University)
The optimization methods used to learn models of data are often not invariant under simple changes in the representation of data or of intermediate variables. For instance, for neural networks, using neural activities in [0;1] or in [-1;1] can lead to very different final performance even though the two representations are isomorphic. Here we show how information theory, together with a Riemannian geometric viewpoint emphasizing independence from the details of data representation, leads to new, scalable algorithms for training models of sequential data, which detect more complex patterns and use fewer training samples. For the talk, no familiarity will be assumed with Riemannian geometry, neural networks, information theory, or statistical learning.