Konstantin Mishchenko

Research Scientist

Bio

Hi there, I’m Konstantin, an AI researcher at Meta and a wannabe musician. I like using mathematics to make things work in practice, especially in deep learning applications. Previously I was a research scientist at Samsung AI Center in Cambridge, UK. Besides doing research, I serve as an Action Editor for TMLR, tweet about interesting papers, and give talks about my studies. In 2023, I was lucky to receive the Outstanding Paper Award together with Aaron Defazio for our work on adaptive methods.

Prior to my work in industry, I did a postdoc at Inria Sierra with Alexandre d’Aspremont and Francis Bach. I received my PhD from KAUST, where I worked under the supervision of Peter Richtárik on optimization theory and its applications in machine learning. In 2020, I also interned at Google Brain. I obtained my double degree MSc diploma from École Normale Supérieure Paris-Saclay and Paris-Dauphine, and a BSc from Moscow Institute of Physics and Technology.

My interests and hobbies tend to change every couple of years or so. Recently, I finished 6 months of evening classes at The Institute of Contemporary Music Performance where I studied electronic music production using Ableton Live. I hope to release some music online in the future.

Feel free to shoot me an email if you want to chat in person about research or music, go to a museum, or maybe just take a walk in Paris!

Interests

Generative AI
Optimization
Deep learning

Education

PhD in Computer Science, 2021
KAUST
MSc in Data Science, 2017
École normale supérieure Paris-Saclay and Paris-Dauphine
BSc in Computer Science and Physics, 2016
Moscow Institute of Physics and Technology

Experience

Research Scientist

Recent Posts

Paper accepted at TMLR

Our work with Rustem Islamov, Eduard Gorbunov, and Samuel Horváth got accepted for publication at TMLR!

Last updated on Feb 5, 2025

Talk at Criteo in Paris

Today I gave a talk at the Criteo office in Paris on adaptive optimizers and schedule-free methods. I also talked about directions in optimization that I personally find worth exploring.

Last updated on Feb 5, 2025

New job at Meta

I’m excited to announce that I started my new job at Meta as a Research Scientist on the CodeGen team in Paris, France led by Gabriel Synnaeve.

Code generation using ML got me very excited because I was always frustrated by the amount of time it took me to translate my ideas into code, and it has become much easier in the last couple of years. I think John Carmack said in an interview that in game development, most of the code written is never read by anyone because there is just too much of it. I like to imagine systems like that, in which the code is generated on the fly for different use cases, optimized, tested, and debugged without us directly seeing any of that. More than anything, I just want a tool that would make programming more about designing elegant systems and solutions than actually writing or debugging them.

I am particularly excited to join Meta given their strong commitment to open source AI, which I believe is crucial for ensuring democratic access to this technology. I will keep writing and publishing papers as well as releasing my code.

Last updated on Oct 24, 2024

Talk at Imperial College London

I was invited by Panos Parpas to deliver a talk at Imperial College London, which I had the pleasure to do today. I talked about my perspective on optimization theory and how it can be used to approach deep learning. While training deep networks is defined by nonconvex, non-smooth losses, which are nearly impossible to minimize in general, the practical performance of optimization methods is nowhere near as pessimistic. At the same time, the smooth optimization framework seems to suggest that methods like SVRG, which do not work well in deep learning, are the best we can use. The purpose of my talk was, therefore, to outline directions where I believe useful theory can be developed and identify some promising ways of bridging theory with practice.

Last updated on Oct 9, 2024

Leaving Samsung

Today was my last day working as a research scientist at Samsung AI Center in Cambridge, UK. It was a pleasure to be there. We wrote a few papers and one patent application (currently under review), and I had a lot of fun collaborating with people at the office. However, I want to explore new directions and I no longer want to stay in the UK. I will announce my next steps later. For now, I’m taking a break to work a little bit on composing electronic music in Ableton Live, which is something I’ve been really enjoying in the last 6 months.

Last updated on Oct 9, 2024

Recent Papers

Hao Mark Chen, Wayne Luk, Ka Fai Cedric Yiu, Rui Li, Konstantin Mishchenko, Stylianos I. Venieris, Hongxiang Fan (2024). Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference.

Aaron Defazio, Xingyu Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky (2024). The Road Less Scheduled.

Aaron Defazio, Ashok Cutkosky, Harsh Mehta, Konstantin Mishchenko (2023). When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement.

Yura Malitsky, Konstantin Mishchenko (2023). Adaptive Proximal Gradient Method for Convex Optimization.

Konstantin Mishchenko, Aaron Defazio (2023). Prodigy: An Expeditiously Adaptive Parameter-Free Learner.

Konstantin Mishchenko, Rustem Islamov, Eduard Gorbunov, Samuel Horváth (2023). Partially Personalized Federated Learning: Breaking the Curse of Data Heterogeneity.

Ahmed Khaled, Konstantin Mishchenko, Chi Jin (2023). DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method.

Blake Woodworth, Konstantin Mishchenko, Francis Bach (2023). Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy.

PDF Cite arXiv ICML

Aaron Defazio, Konstantin Mishchenko (2023). Learning-Rate-Free Learning by D-Adaptation.

PDF Cite Code arXiv ICML

Konstantin Mishchenko, Slavomír Hanzely, Peter Richtárik (2023). Convergence of First-Order Algorithms for Meta-Learning with Moreau Envelopes.

Nikita Doikov, Konstantin Mishchenko, Yurii Nesterov (2022). Super-Universal Regularized Newton Method.

Samuel Horváth, Konstantin Mishchenko, Peter Richtárik (2022). Adaptive Learning Rates for Faster Stochastic Gradient Methods.

Konstantin Mishchenko, Francis Bach, Mathieu Even, Blake Woodworth (2022). Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays.

Konstantin Mishchenko, Grigory Malinovsky, Sebastian Stich, Peter Richtárik (2022). ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally! ICML.

Grigory Malinovsky, Konstantin Mishchenko, Peter Richtárik (2022). Server-Side Stepsizes and Sampling Without Replacement Provably Help in Federated Optimization.

Konstantin Mishchenko (2021). Regularized Newton Method with Global $O(1/k^2)$ Convergence.

Konstantin Mishchenko, Bokun Wang, Dmitry Kovalev, Peter Richtárik (2021). IntSGD: Adaptive Floatless Compression of Stochastic Gradients. ICLR.

Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik (2021). Proximal and Federated Random Reshuffling. ICML.

Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik (2020). Random Reshuffling: Simple Analysis with Vast Improvements. NeurIPS.

Adil Salim, Laurent Condat, Konstantin Mishchenko, Peter Richtárik (2020). Dualize, Split, Randomize: Toward Fast Nonsmooth Optimization Algorithms. JOTA.

Dmitry Kovalev, Konstantin Mishchenko, Peter Richtárik (2019). Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates.

PDF Cite Poster arXiv NeurIPS

Yura Malitsky, Konstantin Mishchenko (2019). Adaptive Gradient Descent without Descent. ICML.

Konstantin Mishchenko (2019). Sinkhorn Algorithm as a Special Case of Stochastic Mirror Descent.

PDF Cite Poster Slides arXiv NeurIPS

Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik (2019). First Analysis of Local GD on Heterogeneous Data.

PDF Cite Slides arXiv NeurIPS

Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik (2019). Tighter Theory for Local SGD on Identical and Heterogeneous Data. AISTATS.

Konstantin Mishchenko, Mallory Montgomery, Federico Vaggi (2019). A Self-supervised Approach to Hierarchical Forecasting with Applications to Groupwise Synthetic Controls.

PDF Cite arXiv ICML

Xun Qian, Alibek Sailanbayev, Konstantin Mishchenko, Peter Richtárik (2019). MISO is Making a Comeback With Better Proofs and Rates.

Saeed Soori, Konstantin Mishchenko, Aryan Mokhtari, Maryam Mehri Dehnavi, Mert Gürbüzbalaban (2019). DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate. AISTATS.

Konstantin Mishchenko, Dmitry Kovalev, Egor Shulgin, Peter Richtárik, Yura Malitsky (2019). Revisiting Stochastic Extragradient. AISTATS.

Konstantin Mishchenko, Peter Richtárik (2019). A Stochastic Decoupling Method for Minimizing the Sum of Smooth and Non-Smooth Functions.

Samuel Horváth, Dmitry Kovalev, Konstantin Mishchenko, Sebastian Stich, Peter Richtárik (2019). Stochastic Distributed Learning with Gradient Quantization and Double Variance Reduction. Optimization Methods and Software.

Konstantin Mishchenko, Filip Hanzely, Peter Richtárik (2019). 99% of Worker-Master Communication in Distributed Optimization Is Not Needed. UAI.

Konstantin Mishchenko, Eduard Gorbunov, Martin Takáč, Peter Richtárik (2019). Distributed Learning with Compressed Gradient Differences.

Konstantin Mishchenko, Peter Richtárik (2018). A Stochastic Penalty Model for Convex and Nonconvex Optimization with Big Constraints.

Filip Hanzely, Konstantin Mishchenko, Peter Richtárik (2018). SEGA: Variance Reduction via Gradient Sketching. NeurIPS.

Konstantin Mishchenko, Franck Iutzeler, Jérôme Malick, Massih-Reza Amini (2018). A Delay-tolerant Proximal-Gradient Algorithm for Distributed Learning. ICML.

Konstantin Mishchenko, Franck Iutzeler, Jérôme Malick (2018). A Distributed Flexible Delay-tolerant Proximal Gradient Algorithm. SIOPT.
