The Staircase Phenomenon and Its Applications in the Initialization of Neural Network Training Dynamics
Lecture Series Celebrating the 110th Anniversary of the Founding of the Mathematics Discipline
Title: The Staircase Phenomenon and Its Applications in the Initialization of Neural Network Training Dynamics
Speaker: 杨将 (Jiang Yang), Southern University of Science and Technology
Place: Room 1124, 后主楼
Time: 10:00-11:00, Friday, November 14, 2025
Inviter: 蔡勇勇 (Yongyong Cai)
Abstract
Understanding the training dynamics of deep neural networks (DNNs), in particular how they evolve low-dimensional features from high-dimensional data, remains a central challenge in deep learning theory. In this work, we introduce the ε-rank, a novel metric that quantifies the effective features of the neuron functions in the terminal hidden layer. Through extensive experiments across diverse tasks, we observe a universal staircase phenomenon: during training with standard stochastic gradient descent, the decrease of the loss function is accompanied by an increase in the ε-rank and follows a staircase pattern. Theoretically, we rigorously prove a negative correlation between the lower bound of the loss and the ε-rank, showing that a high ε-rank is essential for a significant reduction of the loss. Moreover, numerical evidence shows that, within the same deep neural network, the ε-rank of a later hidden layer is higher than that of an earlier one. Based on these observations, and in order to eliminate the staircase phenomenon, we propose a novel pre-training strategy for the initial hidden layer that elevates the ε-rank of the terminal hidden layer. Numerical experiments validate its effectiveness in reducing training time and improving accuracy across various tasks. The newly introduced ε-rank is therefore a computable quantity that serves as an intrinsic, effective metric for deep neural networks, providing a novel perspective for understanding the training dynamics of neural networks and a theoretical basis for designing efficient training strategies in practical applications.
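The abstract describes the ε-rank only informally, as a computable quantity attached to the terminal hidden layer. One plausible reading, sketched below purely as an illustration and not as the speaker's definition, is to count the singular values of the terminal-hidden-layer feature matrix (samples by neurons) that exceed the threshold ε; the function name, the threshold value, and the toy untrained MLP are all assumptions made for this sketch.

```python
# Illustrative sketch only: the abstract does not define the epsilon-rank
# precisely. Here we ASSUME it counts the singular values of the terminal
# hidden layer's feature matrix (samples x neurons) that exceed epsilon.
import numpy as np

def epsilon_rank(features: np.ndarray, eps: float = 1e-2) -> int:
    """Number of singular values of the feature matrix exceeding eps."""
    singular_values = np.linalg.svd(features, compute_uv=False)
    return int(np.sum(singular_values > eps))

if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Toy two-hidden-layer MLP with random (untrained) weights,
    # standing in for a network early in training.
    x = rng.normal(size=(256, 10))              # 256 samples, 10 input features
    w1 = rng.normal(size=(10, 64)) / np.sqrt(10)
    w2 = rng.normal(size=(64, 64)) / np.sqrt(64)

    h1 = np.tanh(x @ w1)                        # first hidden layer features
    h2 = np.tanh(h1 @ w2)                       # terminal hidden layer features

    # Under the staircase picture described in the abstract, a quantity of
    # this kind would rise in steps as standard SGD drives the loss down.
    print("epsilon-rank of terminal hidden layer:", epsilon_rank(h2, eps=1e-2))
```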
About the Speaker
Jiang Yang (杨将) is a tenured Associate Professor in the Department of Mathematics at the Southern University of Science and Technology. He received his B.S. from Zhejiang University in 2010 and his Ph.D. from Hong Kong Baptist University in 2014. From 2014 to 2017 he held postdoctoral positions at Pennsylvania State University and Columbia University, and he has been on the faculty of the Southern University of Science and Technology since 2017. His research is in computational mathematics, with main interests in the modeling, numerical methods, and applications of phase-field and nonlocal models, as well as the design and theory of deep learning algorithms. His results have appeared in journals including SIAM Review, SIAM Journal on Numerical Analysis, Mathematics of Computation, M3AS, SIAM Journal on Scientific Computing, and the Journal of Computational Physics. His honors include the EASIAM Student Paper Prize (second prize, 2014), the ICCM Distinguished Paper Award (2024), the Frontiers of Science Award of the International Congress of Basic Science (2025), and inclusion in the Stanford-Elsevier list of the world's top 2% scientists (2025 annual impact list). He has been selected for the Young Scholars category of the national high-level talent program and the Shenzhen Outstanding Young Scholars program, and he is the principal investigator of one Tianyuan Mathematics key interdisciplinary project, two NSFC General Programs, and one Guangdong Natural Science Foundation project.