Нейролента - a digest of news about neural networks and ChatGPT

More gradient descent news

"Grimmer found that the fastest sequences always had one thing in common: The middle step was always a big one. Its size depended on the number of steps in the repeating sequence."

Hooray for cyclical and large learning rates!
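To make the idea concrete, here is a minimal sketch on a toy quadratic objective. The step values are purely illustrative (they are not the certified sequences from the paper); the point is only to compare a constant step size against a repeating schedule whose middle step is deliberately large:

```python
import numpy as np

def gradient_descent(grad, x0, schedule, n_iters):
    """Gradient descent that cycles through the step sizes in `schedule`."""
    x = np.asarray(x0, dtype=float)
    for t in range(n_iters):
        x = x - schedule[t % len(schedule)] * grad(x)
    return x

# Toy smooth convex objective: f(x) = 0.5 * x^T A x, eigenvalues 1 and 9.
A = np.diag([1.0, 9.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
x0 = np.array([5.0, 5.0])

# Illustrative schedules: a constant "safe" step versus a 3-step cycle
# whose middle step is far larger than the classical stability bound.
schedules = {
    "constant 0.1":         [0.1],
    "cycle with long step": [0.1, 0.5, 0.1],
}

for name, sched in schedules.items():
    x = gradient_descent(grad, x0, sched, n_iters=72)
    print(f"{name:22s} f(x) = {f(x):.3e}")
```

In this toy run the cycle containing the long step ends at a much smaller objective value than the constant schedule after the same number of iterations, which is exactly the effect the quote describes: the occasional big step does the heavy lifting, while the small steps around it keep the iterates under control.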

https://www.quantamagazine.org/risky-giant-steps-can-solve-optimization-problems-faster-20230811/

The original paper:
Provably Faster Gradient Descent via Long Steps
https://arxiv.org/abs/2307.06324