I train a neural network – one of the ResNet variants ($\approx 10^7$ parameters) – on the CIFAR-10 dataset, and after each epoch I would like to find the smallest and largest eigenvalues of its Hessian. For that I can use Hessian-vector products (i.e. $f(v) = Hv$, where $H$ is the Hessian corresponding to the batch I'm currently using; PyTorch has a built-in mechanism for this), so, for example, I can use the power method. However, the pure power method is very slow: each true Hessian-vector product requires a pass over the entire dataset.
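For concreteness, here is a minimal sketch (in PyTorch) of what I mean by mini-batch Hessian-vector products plus power iteration; `model`, `criterion`, `inputs`, and `targets` are placeholders, and the spectral shift in `power_iteration` is just the standard trick for getting the most negative eigenvalue from a top-eigenvalue routine:

```python
import torch

def batch_hvp_fn(model, criterion, inputs, targets):
    """Build a closure v -> H v for the Hessian of one mini-batch loss."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = criterion(model(inputs), targets)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])

    def hvp(v):
        # Differentiating (grad . v) w.r.t. the parameters gives H v.
        gv = torch.dot(flat_grad, v)
        hv = torch.autograd.grad(gv, params, retain_graph=True)
        return torch.cat([h.reshape(-1) for h in hv]).detach()

    dim = sum(p.numel() for p in params)
    return hvp, dim

def power_iteration(hvp, dim, device, num_iters=50, shift=0.0):
    """Power iteration on (H - shift * I).

    With shift = 0 this estimates the eigenvalue of largest magnitude
    (for loss Hessians, typically the largest positive one). With
    shift close to lambda_max it converges to the most negative
    eigenvalue of H instead.
    """
    v = torch.randn(dim, device=device)  # assumes float32 parameters
    v /= v.norm()
    eig = 0.0
    for _ in range(num_iters):
        hv = hvp(v) - shift * v
        eig = torch.dot(hv, v).item()    # Rayleigh quotient of the shifted operator
        v = hv / (hv.norm() + 1e-12)
    return eig + shift                   # undo the shift

# Usage sketch (inputs/targets come from a single mini-batch):
# hvp, dim = batch_hvp_fn(model, criterion, inputs, targets)
# lam_max = power_iteration(hvp, dim, inputs.device)
# lam_min = power_iteration(hvp, dim, inputs.device, shift=lam_max)
```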
Question: do you know of an efficient algorithm for this task? To be precise, I mean one that computes both eigenvalues to within a reasonable multiplicative error in at most 10 minutes.
Note that I’m asking about an algorithm that you know to be efficient for this (or a similar) problem. I tried the power method, the accelerated power method, the stochastic power method, Oja’s algorithm, a gradient-based algorithm and its accelerated version, and the algorithms from https://arxiv.org/abs/1707.02670. All these experiments take a lot of time, and so far I haven’t had any success, no matter how much engineering I applied. When eigenvalues are close to $0$ (e.g. of order $-\frac{1}{2}$ while the largest eigenvalue is of order $100$), either convergence takes a very long time or the results are unstable/unreliable.
Just in case, I’m aware of PyHessian (and the first version of my code is based on theirs).