Bolte, Jérôme (ORCID: https://orcid.org/0000-0002-1676-8407), Bertoin, David, Gerchinovitz, Sébastien and Pauwels, Edouard (ORCID: https://orcid.org/0000-0002-8180-075X) (2021) Numerical influence of ReLU'(0) on backpropagation. Advances in Neural Information Processing Systems, Vol. 34, pp. 468-479.


Abstract

In theory, the choice of ReLU'(0) in [0, 1] for a neural network has a negligible influence both on backpropagation and training. Yet, in the real world, 32-bit default precision combined with the size of deep learning problems makes it a hyperparameter of training methods. We investigate the importance of the value of ReLU'(0) for several precision levels (16, 32, 64 bits), on various networks (fully connected, VGG, ResNet) and datasets (MNIST, CIFAR10, SVHN, ImageNet). We observe considerable variations of backpropagation outputs, which occur around half of the time in 32-bit precision. The effect disappears with double precision, while it is systematic at 16 bits. For vanilla SGD training, the choice ReLU'(0) = 0 seems to be the most efficient. For our experiments on ImageNet, the gain in test accuracy over ReLU'(0) = 1 was more than 10 points (two runs). We also show that reconditioning approaches such as batch norm or ADAM tend to buffer the influence of the value of ReLU'(0). Overall, the message we convey is that algorithmic differentiation of nonsmooth problems potentially hides parameters that could be tuned advantageously.
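A minimal sketch (not the authors' code) of the phenomenon the abstract describes: if the derivative used at exactly 0 is made a tunable constant in a custom autograd function, backpropagation outputs change whenever a pre-activation is exactly zero, which finite 32-bit precision makes surprisingly common. The class name, helper, and toy weights below are illustrative assumptions.

    # Sketch: ReLU whose subgradient at exactly 0 is a tunable constant g0.
    import torch


    class ReLUWithSubgradient(torch.autograd.Function):
        """ReLU whose derivative at exactly x == 0 is the constant `g0`."""

        @staticmethod
        def forward(ctx, x, g0):
            ctx.save_for_backward(x)
            ctx.g0 = g0
            return x.clamp(min=0)

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            # Derivative: 1 for x > 0, 0 for x < 0, and g0 exactly at x == 0.
            d = (x > 0).to(grad_out.dtype)
            d = torch.where(x == 0, torch.full_like(d, ctx.g0), d)
            return grad_out * d, None  # no gradient w.r.t. g0


    def backprop_grad(g0, dtype=torch.float32):
        # Hypothetical toy layer; weights chosen so the first pre-activation
        # is exactly 0, so the choice of ReLU'(0) shows up in the gradient.
        w = torch.tensor([[1.0, -1.0], [2.0, -1.0]], dtype=dtype, requires_grad=True)
        x = torch.tensor([0.5, 0.5], dtype=dtype)
        pre = w @ x                               # pre = [0.0, 0.5]
        out = ReLUWithSubgradient.apply(pre, g0).sum()
        out.backward()
        return w.grad


    print(backprop_grad(0.0))  # gradient with ReLU'(0) = 0
    print(backprop_grad(1.0))  # gradient with ReLU'(0) = 1: first row differs

In this toy example the two gradients differ only in the row fed by the zero pre-activation; the paper's point is that, at 32-bit precision and realistic problem sizes, such exact zeros occur often enough to change whole training runs.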

Item Type: Article
Language: English
Date: 2021
Refereed: Yes
Subjects: B- Economics and Finance
Divisions: TSE-R (Toulouse)
Site: UT1
Date Deposited: 02 Feb 2026 16:05
Last Modified: 02 Feb 2026 16:08
OAI Identifier: oai:tse-fr.eu:131319
URI: https://publications.ut-capitole.fr/id/eprint/51906
