I’m a research scientist at Samsung AI Center in Cambridge, UK. I like using mathematics to make things work in practice, especially in deep learning applications. Besides doing research, I serve as an Action Editor for TMLR, tweet about interesting papers, and give talks about my research. In 2023, I was lucky to receive the ICML Outstanding Paper Award together with Aaron Defazio for our work on adaptive methods.

Before joining Samsung, I did a postdoc at Inria Sierra with Alexandre d’Aspremont and Francis Bach. I received my PhD in computer science from KAUST, where I worked under the supervision of Peter Richtárik on optimization theory and its applications in machine learning. In 2020, I interned at Google Brain, hosted by Nicolas Le Roux and Courtney Paquette. Prior to that, I obtained a double-degree MSc from École Normale Supérieure Paris-Saclay and Paris-Dauphine, and a BSc from Moscow Institute of Physics and Technology.

Note that I’m currently not taking any interns.

Interests

- Optimization
- Deep learning
- Federated and distributed learning

Education

PhD in Computer Science, 2021

KAUST

MSc in Data Science, 2017

École normale supérieure Paris-Saclay and Paris-Dauphine

BSc in Computer Science and Physics, 2016

Moscow Institute of Physics and Technology

Research Scientist

Samsung

Working on federated learning and embedded AI systems as a member of the Distributed AI team.

Postdoc

Inria

Conducted research on adaptive, second-order, and distributed optimization.

Our work on an extension of DoG with weighted gradients was accepted for presentation at NeurIPS this year! If you want to try our method, a PyTorch implementation is available on GitHub. I hope to see more papers building upon DoG, DoWG, D-Adaptation, and Prodigy; we have barely scratched the surface of what can be done, and some of these methods are already being used in practice.

I’m delighted to share that Aaron Defazio and I received the ICML Outstanding Paper Award for our work on D-Adaptation. The GitHub repository associated with our paper has been quite popular, and we are working hard on extensions that will make adaptive methods even more useful for deep learning. Our first extension, Prodigy, is also available on GitHub and has performed even better than D-Adaptation in our experiments. Expect more updates from us pretty soon!

DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate.
AISTATS (2019).

Stochastic Distributed Learning with Gradient Quantization and Double Variance Reduction.
Optimization Methods and Software (2019).
- konsta.mish @ gmail com
- Cambridge, CB1 2FG
- My Twitter