In this paper, we present a comprehensive analysis of the Lipschitz constant for integral kernels [1] and random feature methods [2]. Integral kernels form a broad class of kernels, including those with Mercer decompositions or shift-invariant properties, while random features provide a scalable and computationally efficient way to approximate these kernels on large datasets. Random feature methods also demonstrate competitive performance with low training times and no optimization issues [2,8,9].
Controlling the Lipschitz constant is a crucial task in the study of robustness against adversarial attacks [3,4,5], as well as in control applications [6] and in the design of robust virtual sensors [7].
Our first contribution is the derivation of the optimal Lipschitz constant for the feature map of a differentiable integral kernel. Secondly, under minimal assumptions and leveraging tools from empirical process theory [10], we show that the Lipschitz constant of a regression model using random feature maps converges almost surely to the Lipschitz constant of the corresponding solution with the integral kernel. We also provide a supporting concentration inequality. These results are particularly applicable to random Fourier features and to random neural networks with ReLU activation functions where only the last layer is trained.
Finally, we validate our theoretical results through numerical experiments and demonstrate their practical relevance with an industrial application developed with LIEBHERR Aerospace.
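To make the random Fourier feature setting mentioned above concrete, the following is a minimal sketch, not the method of the paper. It assumes a Gaussian kernel, toy data, and ridge regression on the last layer. All names (`features`, `grad_norms`, `lip_lower`, `lip_upper`) and all parameter values are hypothetical. It brackets the Lipschitz constant of the fitted model between the largest gradient norm found on sampled points and a crude triangle-inequality bound; it does not compute the optimal constant derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (hypothetical, for illustration only).
n, d, D = 200, 2, 300  # samples, input dimension, number of random features
X = rng.uniform(-1.0, 1.0, size=(n, d))
y = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1]

# Random Fourier features for a Gaussian kernel with bandwidth sigma:
# phi(x) = sqrt(2/D) * cos(W x + b), rows of W ~ N(0, sigma^-2 I), b ~ U[0, 2*pi].
sigma = 0.5
W = rng.normal(0.0, 1.0 / sigma, size=(D, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)


def features(X):
    """Map inputs of shape (m, d) to random features of shape (m, D)."""
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)


# Only the last (linear) layer is trained: ridge regression on the features.
lam = 1e-3
Phi = features(X)
theta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ y)


def grad_norms(X):
    """Euclidean norm of grad f(x) for f(x) = theta . phi(x), one value per row."""
    # grad f(x) = -sqrt(2/D) * sum_j theta_j * sin(w_j . x + b_j) * w_j
    S = np.sin(X @ W.T + b) * theta      # shape (m, D)
    G = -np.sqrt(2.0 / D) * (S @ W)      # shape (m, d)
    return np.linalg.norm(G, axis=1)


# Lower estimate: largest gradient norm over randomly sampled probe points.
X_probe = rng.uniform(-1.0, 1.0, size=(5000, d))
lip_lower = grad_norms(X_probe).max()

# Crude upper bound: |sin| <= 1 combined with the triangle inequality.
lip_upper = np.sqrt(2.0 / D) * np.sum(np.abs(theta) * np.linalg.norm(W, axis=1))

print(f"Lipschitz constant of the fitted model lies in [{lip_lower:.3f}, {lip_upper:.3f}]")
```

The gap between the two values is usually wide, which is one reason a sharp constant is of interest for this model class.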
[1] T. Hotz, F. Telschow, Representation by Integrating Reproducing Kernels, arXiv, 2012.
[2] A. Rahimi, B. Recht, Random features for large-scale kernel machines, Advances in Neural Information Processing Systems, 2007, pp. 1177–1184.
[3] T. Zhang, K. Li, Understanding Overfitting in Adversarial Training via Kernel Regression, arXiv, 2023.
[4] T. Weng, H. Zhang, P. Chen, J. Yi, D. Su, Y. Gao, C. Hsieh, L. Daniel, Evaluating the Robustness of Neural Networks: An Extreme Value Theory Approach, ICLR, 2018.
[5] A. Blaas, S. J. Roberts, The Effect of Prior Lipschitz Continuity on the Adversarial Robustness of Bayesian Neural Networks, arXiv, 2021.
[6] H. J. van Waarde, R. Sepulchre, Training Lipschitz continuous operators using reproducing kernels, Proceedings of Machine Learning Research, vol. 168, 2022, pp. 1–13.
[7] M. Damour, C. Cappi, L. Gardès, F. De Grancey, E. Jenn, B. Lefevre, G. Flandin, S. Gerchinovitz, F. Mamalet, A. Albore, White paper machine learning in certified systems, IRT Saint Exupéry, 2021.
[8] Q. Le, T. Sarlós, A. Smola, Fastfood - computing Hilbert space expansions in loglinear time, 30th International Conference on Machine Learning (ICML), 2013, pp. 244–252.
[9] A. Rahimi, B. Recht, Uniform approximation of functions with random bases, 46th Annual Allerton Conference on Communication, Control, and Computing, IEEE, 2008, pp. 555–561.
[10] A. W. van der Vaart, J. A. Wellner, Weak Convergence and Empirical Processes: With Applications to Statistics, Springer, 1995.