Gilbert Saporta (CNAM Paris)
With massive data , there is no sampling errors : statistical tests and confidence intervals become useless. Generative models are often less important than predictive models. Closed form and parcimonious models are replaced by algorithms. Statistical Learning Theory initiated by V.Vapnik and the late A.Chervonenkis provides the conceptual framework for machine learning algorithms. The use of blackbox models including ensemble models is a challenge for scientific users since their interpretability is quite difficult. We will conclude by the necessity of combining statistical and machine learning tools with causal inference to get better predictions and avoid the confussion between correlation and causality.