Konstantin Mishchenko

Konstantin Mishchenko

Research Scientist

Samsung

Bio

I’m a research scientist at Samsung AI Center in Cambridge, UK. Before joining Samsung, I did a postdoc at Inria Sierra with Alexandre d’Aspremont and Francis Bach. I received my PhD in computer science from KAUST, where I worked under the supervision of Peter Richtárik on optimization theory and its applications in machine learning. In 2020, I interned at Google Brain hosted by Nicolas Le Roux and Courtney Paquette. Prior to that, I obtained my double degree MSc diploma from École Normale Supérieure Paris-Saclay and Paris-Dauphine, and a BSc from Moscow Institute of Physics and Technology.

My hobbies include ultimate frisbee, squash and bouldering.

Note that I’m currently not taking any interns.

Interests
  • Optimization
  • Deep learning
  • Federated and distributed learning
Education
  • PhD in Computer Science, 2021

    KAUST

  • MSc in Data Science, 2017

    École normale supérieure Paris-Saclay and Paris-Dauphine

  • BSc in Computer Science and Physics, 2016

    Moscow Institute of Physics and Technology

Experience

 
 
 
 
 
Samsung
Research Scientist
Samsung
Jan 2023 – Present Cambridge, UK
Working on federated learning and embedded AI systems as a member of the Distributed AI team.
 
 
 
 
 
Inria Sierra
Postdoc
Dec 2021 – Dec 2022 Paris, France
Conducted research on adaptive, second-order, and distributed optimization.

Recent Papers

Quickly discover relevant content by filtering publications.
(2023). Learning-Rate-Free Learning by D-Adaptation.

PDF Cite arXiv

(2023). Convergence of First-Order Algorithms for Meta-Learning with Moreau Envelopes.

PDF Cite arXiv

(2022). Super-Universal Regularized Newton Method.

PDF Cite Slides arXiv

(2022). Adaptive Learning Rates for Faster Stochastic Gradient Methods.

PDF Cite arXiv

(2022). Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays.

PDF Cite Slides arXiv

(2022). ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!. ICML.

PDF Cite Video arXiv ICML

(2022). Server-Side Stepsizes and Sampling Without Replacement Provably Help in Federated Optimization.

PDF Cite arXiv

(2021). IntSGD: Adaptive Floatless Compression of Stochastic Gradients. ICLR.

PDF Cite Code Poster Slides arXiv ICLR

(2021). Proximal and Federated Random Reshuffling. ICML.

PDF Cite Code Slides Video arXiv ICML

(2020). Random Reshuffling: Simple Analysis with Vast Improvements. NeurIPS.

PDF Cite Code Poster Slides arXiv NeurIPS

(2020). Dualize, Split, Randomize: Toward Fast Nonsmooth Optimization Algorithms. JOTA.

PDF Cite Poster arXiv JOTA

(2019). Adaptive Gradient Descent without Descent. ICML.

PDF Cite Code Poster Slides arXiv ICML Video

(2019). First Analysis of Local GD on Heterogeneous Data.

PDF Cite Slides arXiv NeurIPS

(2019). Tighter Theory for Local SGD on Identical and Heterogeneous Data. AISTATS.

PDF Cite Slides arXiv AISTATS

(2019). MISO is Making a Comeback With Better Proofs and Rates.

PDF Cite arXiv

(2019). DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate. AISTATS.

PDF Cite arXiv AISTATS

(2019). Revisiting Stochastic Extragradient. AISTATS.

PDF Cite Slides arXiv AISTATS

(2019). Stochastic Distributed Learning with Gradient Quantization and Double Variance Reduction. Optimization Methods and Software.

PDF Cite arXiv

(2019). 99% of Worker-Master Communication in Distributed Optimization Is Not Needed. UAI.

PDF Cite arXiv UAI

(2019). Distributed Learning with Compressed Gradient Differences.

PDF Cite arXiv

(2018). SEGA: Variance Reduction via Gradient Sketching. NeurIPS.

PDF Cite arXiv NIPS

(2018). A Delay-tolerant Proximal-Gradient Algorithm for Distributed Learning. ICML.

PDF Cite ICML

(2018). A Distributed Flexible Delay-tolerant Proximal Gradient Algorithm. SIOPT.

PDF Cite arXiv SIAM