Mathematics Seminar
Title: Understanding the implicit bias of stochastic gradient descent: A dynamical stability perspective
Speaker: Lei Wu (Peking University)
Place: Room 1124, Houzhu Building (后主楼)
Time: Thursday, April 18, 2024, 10:00--11:00
Inviter: Yongqiang Cai
Abstract
In deep learning, models are often over-parameterized, which raises the concern that training algorithms may pick solutions that generalize poorly. Fortunately, stochastic gradient descent (SGD) always converges to solutions that generalize well, even without any explicit regularization, suggesting that certain “implicit regularization” is at work. This talk will explain this striking phenomenon from a dynamical stability perspective. Specifically, we show that a stable minimum of SGD must be flat, as measured by various norms of the local Hessian. Furthermore, these flat minima provably generalize well for two-layer neural networks and diagonal linear networks. In contrast to popular continuous-time analyses, our stability analysis respects the discrete nature of SGD and can explain the effects of finite learning rates and batch sizes, as well as why SGD often generalizes better than GD.
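As a rough illustration of the dynamical-stability viewpoint mentioned in the abstract, the LaTeX sketch below works out the classical linear-stability condition for full-batch gradient descent, which already ties stability to flatness of the local Hessian. This standard argument is included only for orientation and is not the speaker's SGD result; the notation $L$, $\theta^*$, $H$, and $\eta$ is introduced here rather than taken from the abstract.

```latex
% A minimal, self-contained sketch of the classical linear-stability
% condition for full-batch gradient descent (GD). Standard material,
% shown only for illustration; the talk's SGD condition is stronger
% and also depends on batch size and gradient noise.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}

Let $L$ denote the training loss, $\theta^*$ a minimum of $L$, and
$H = \nabla^2 L(\theta^*)$ the local Hessian. Linearizing the GD update
$\theta_{t+1} = \theta_t - \eta\,\nabla L(\theta_t)$ around $\theta^*$ gives
\[
  \theta_{t+1} - \theta^* \approx (I - \eta H)\,(\theta_t - \theta^*),
\]
so the iterates remain near $\theta^*$ only if every eigenvalue of
$I - \eta H$ has modulus at most one, i.e.
\[
  \lambda_{\max}(H) \le \frac{2}{\eta}.
\]
Thus a larger learning rate $\eta$ can only keep flatter minima stable
(those with small spectral norm of $H$); the talk extends this type of
discrete-time argument to SGD.

\end{document}
```

The full-batch condition above constrains only the spectral norm of $H$; the abstract's point is that, once mini-batch noise is taken into account, stability at a minimum of SGD forces flatness measured by other norms of the Hessian as well.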
About the Speaker
Lei Wu joined the School of Mathematical Sciences at Peking University in 2021 as an assistant professor. His research focuses on deep learning theory, in particular understanding the approximation power of neural networks and the implicit regularization of SGD. He received his B.S. in Mathematics and Applied Mathematics from Nankai University in 2012 and his Ph.D. in Computational Mathematics from Peking University in 2018. From November 2018 to October 2021, he was a postdoctoral researcher, first at Princeton University and then at the University of Pennsylvania.