More gradient descent news
"Grimmer found that the fastest sequences always had one thing in common: The middle step was always a big one. Its size depended on the number of steps in the repeating sequence."
Hooray for cyclical and large learning rates!
https://www.quantamagazine.org/risky-giant-steps-can-solve-optimization-problems-faster-20230811/
The original paper:
Provably Faster Gradient Descent via Long Steps
https://arxiv.org/abs/2307.06324
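To make the quoted idea concrete, here is a minimal sketch of gradient descent that cycles through a short pattern of step sizes with a big middle step. Everything in it is an assumption for illustration: the toy quadratic, the helper name `gradient_descent`, and the pattern (1.5, 4.9, 1.5), which is simply a length-3 example with a large middle step (check the paper for the patterns it actually certifies). Steps are expressed as multiples of 1/L, where L is the smoothness constant.

```python
from itertools import cycle, islice


def gradient_descent(grad, x0, pattern, L, n_steps):
    """Run n_steps of gradient descent, cycling through `pattern`.

    Each entry h of `pattern` is a normalized step size; the actual
    step taken is h / L. (Hypothetical helper, not from the paper.)
    """
    x = list(x0)
    for h in islice(cycle(pattern), n_steps):
        g = grad(x)
        x = [xi - (h / L) * gi for xi, gi in zip(x, g)]
    return x


# Toy problem (an assumption): f(x) = 0.5 * sum(c_i * x_i^2).
CURVATURES = [0.1, 0.5, 1.0]
L = max(CURVATURES)  # smoothness constant of this quadratic


def f(x):
    return 0.5 * sum(c * xi * xi for c, xi in zip(CURVATURES, x))


def grad_f(x):
    return [c * xi for c, xi in zip(CURVATURES, x)]


x0 = [1.0, 1.0, 1.0]
LONG_PATTERN = [1.5, 4.9, 1.5]  # illustrative: big step in the middle

# Objective after 1 step, after 2 steps (the big one), and after 10 cycles.
f_after_1 = f(gradient_descent(grad_f, x0, LONG_PATTERN, L, n_steps=1))
f_after_2 = f(gradient_descent(grad_f, x0, LONG_PATTERN, L, n_steps=2))
f_after_30 = f(gradient_descent(grad_f, x0, LONG_PATTERN, L, n_steps=30))

print("start:", f(x0))
print("after 1 step:", f_after_1)
print("after 2 steps:", f_after_2)
print("after 30 steps:", f_after_30)
```

On this toy problem the big middle step temporarily makes the objective worse, which is the "risky" part; the objective still ends up below its starting value after whole cycles. This sketch does not show that long steps beat a constant step on any particular function. The paper's claim, as stated in its abstract, is about provable convergence rates for smooth convex optimization, obtained by analyzing the effect of many iterations at once.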
Source: gonzo-обзоры ML статей
2023-08-11 17:09:59