Probability and statistics

Stabilizing black-box model selection

by Rebecca Willett

Description

Model selection is the process of choosing from a class of candidate models given data. For instance, we may wish to select which set of features best predicts a label or response, or to select an equation that hypothesizes a model of a dynamic biological process. However, absent strong assumptions, typical approaches to these problems are highly unstable: if a single data point is removed from the training set, a different model may be selected. In this talk, I will present a new approach to stabilizing model selection with theoretical stability guarantees that leverages a combination of bagging and an "inflated" argmax operation. Our method selects a small collection of models that all fit the data, and it is stable in that, with high probability, the removal of any training point will result in a collection of selected models that overlaps with the original collection. We illustrate this method in a model selection problem focused on identifying how competition in an ecosystem influences species' abundances, and in a graph estimation problem using cell-signaling data from proteomics. In these settings, the proposed method yields stable, compact, and accurate collections of selected models, outperforming a variety of benchmarks. This is joint work with Melissa Adrian and Jake Soloff.
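To give a flavor of the two ingredients mentioned above, here is a minimal sketch in Python. It is not the speaker's implementation: the bagging step estimates how often each candidate model is selected across bootstrap resamples, and the `inflated_argmax` shown here is a simplified surrogate (returning every candidate whose selection frequency is within `eps` of the maximum) rather than the precise inflated-argmax operation from the talk. The candidate scoring functions, `eps`, and all names are illustrative assumptions.

```python
import numpy as np

def bagged_selection_frequencies(X, y, candidate_scores, n_bags=100, rng=None):
    """Estimate how often each candidate model wins across bootstrap bags.

    candidate_scores: list of functions, each returning a goodness-of-fit
    score for one candidate model on resampled data (lower is better).
    These stand in for whatever base model-selection criterion is used.
    """
    rng = np.random.default_rng(rng)
    n = len(y)
    counts = np.zeros(len(candidate_scores))
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)          # bootstrap resample
        scores = [s(X[idx], y[idx]) for s in candidate_scores]
        counts[int(np.argmin(scores))] += 1       # winner of this bag
    return counts / n_bags

def inflated_argmax(freqs, eps=0.1):
    """Simplified 'inflated' argmax: keep every candidate whose bagged
    selection frequency is within eps of the best, so near-ties are all
    retained rather than forcing a single, unstable choice."""
    top = float(np.max(freqs))
    return [j for j, f in enumerate(freqs) if f >= top - eps]
```

The key design point reflected here is that the output is a *set* of models, not a single winner: when two candidates explain the data nearly equally well, both are returned, which is what makes the selection robust to removing one training point.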