An often-used argument against applying kernel machines to large datasets is that their computational cost scales with the dataset size. The theme of this talk is to question whether it is actually necessary to process the whole dataset in the first place. To that end, we look at a quantity many kernel machines rely on: the log-determinant of a kernel matrix.
For the exact computation, the default algorithm is the Cholesky decomposition, whose computational cost scales cubically in the dataset size. It turns out that, under mild assumptions, the log-determinant can be estimated from intermediate steps of the Cholesky decomposition, with a probabilistic guarantee on the relative error. The main result of this talk is an augmentation of the Cholesky decomposition that stops the algorithm before processing the whole matrix once a desired precision can be guaranteed. Experiments demonstrate that this can save a considerable amount of time, while rarely incurring an overhead of more than 5% when the algorithm does not stop early.
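To illustrate the idea, here is a minimal sketch of a column-wise Cholesky decomposition augmented with a running log-determinant and an early-stopping check. It relies on the standard identity log det K = 2·Σ_j log L[j, j]; the callback `stopping_bound`, its signature, and the tolerance `rel_tol` are hypothetical placeholders for the probabilistic bound developed in the talk, not its actual form.

```python
import numpy as np

def stopped_cholesky_logdet(K, stopping_bound, rel_tol=1e-2):
    """Column-wise Cholesky of an SPD kernel matrix K that accumulates
    log det K = 2 * sum(log(diag(L))) and may stop before processing all
    columns.  `stopping_bound` is a hypothetical callback standing in for
    the talk's probabilistic error bound; here it is assumed to return
    (estimate_of_logdet, certified_relative_error)."""
    n = K.shape[0]
    L = np.zeros_like(K, dtype=float)
    logdet = 0.0
    for j in range(n):
        # Standard Cholesky update of column j.
        L[j, j] = np.sqrt(K[j, j] - L[j, :j] @ L[j, :j])
        L[j + 1:, j] = (K[j + 1:, j] - L[j + 1:, :j] @ L[j, :j]) / L[j, j]
        # Contribution of column j to the log-determinant.
        logdet += 2.0 * np.log(L[j, j])
        # Placeholder early-stopping check after j + 1 of n columns.
        estimate, rel_err = stopping_bound(logdet, j + 1, n)
        if rel_err <= rel_tol:
            return estimate, j + 1   # stopped early
    return logdet, n                 # processed the whole matrix
```

When no early stop is triggered, the only extra work compared to a plain Cholesky is accumulating the diagonal logarithms and evaluating the bound, which is consistent with the small overhead reported in the experiments.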
More generally, this talk presents a probabilistic stopping strategy for approximating a sum of known length whose addends are revealed sequentially. We do not assume that the addends are independent, only that they form a super-martingale and are bounded from below.
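As a rough formalization of this setting (the exact statement of the guarantee in the talk may differ; the accuracy parameter ε and confidence parameter δ are generic placeholders):

\[
  S = \sum_{i=1}^{n} x_i, \qquad x_1, x_2, \dots \text{ revealed one at a time},
\]
and after observing $x_1, \dots, x_k$ the procedure either continues or returns an estimate $\hat S_k$ with
\[
  \Pr\bigl( |\hat S_k - S| \le \varepsilon \, |S| \bigr) \ge 1 - \delta,
\]
assuming only that the addends form a super-martingale and are bounded from below, not that they are independent.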