New paper: Asynchronous SGD with arbitrary delays
My first ever optimization project was an ICML paper about an asynchronous gradient method. At the time, I was quite confused by the fact that no matter what I did, asynchronous gradient descent still converged. Five years later, I can finally give an answer: asynchronous SGD doesn’t care about the delays, which we proved in our new paper (https://arxiv.org/abs/2206.07638). For a short summary, you can read my Twitter thread about the paper or check my slides.