Modern machine learning systems are trained on massive amounts of data. Without special care, machine learning models are prone to regurgitating or otherwise revealing information about individual data points. I will discuss differential privacy, a rigorous notion of data privacy, and how it can be used to provably protect against such inadvertent data disclosures by machine learning models. I will focus primarily on recent approaches that pre-train the model on public data, which significantly improves the utility of privately fine-tuned models. I will also discuss pitfalls of these methods and potential paths forward for the field.
Biography:
Gautam Kamath is an Assistant Professor at the David R. Cheriton School of Computer Science at the University of Waterloo, and a Canada CIFAR AI Chair and Faculty Member at the Vector Institute. He has a B.S. in Computer Science and Electrical and Computer Engineering from Cornell University, and an M.S. and Ph.D. in Computer Science from the Massachusetts Institute of Technology. He is interested in reliable and trustworthy statistics and machine learning, including considerations such as data privacy and robustness. He was a Microsoft Research Fellow, as part of the Simons-Berkeley Research Fellowship Program at the Simons Institute for the Theory of Computing. He serves as an Editor-in-Chief of Transactions on Machine Learning Research, and is the program committee co-chair of the 36th International Conference on Algorithmic Learning Theory (ALT 2025). He is the recipient of several awards, including the Caspar Bowden Award for Outstanding Research in Privacy Enhancing Technologies, a best paper award at the Forty-first International Conference on Machine Learning (ICML 2024), and the Faculty of Math Golden Jubilee Research Excellence Award.