The Staircase Phenomenon and Its Applications in the Initialization of Neural Network Training Dynamics
Lecture Series Celebrating the 110th Anniversary of the Founding of the Mathematics Discipline
Title: The Staircase Phenomenon and Its Applications in the Initialization of Neural Network Training Dynamics
Speaker: 杨将 (Jiang Yang), Southern University of Science and Technology
Place: Room 1124, 后主楼
Time: 10:00-11:00, Friday, November 14, 2025
Inviter: 蔡勇勇 (Yongyong Cai)
Abstract
Understanding the training dynamics of deep neural networks (DNNs), in particular how they evolve low-dimensional features from high-dimensional data, remains a central challenge in deep learning theory. In this work, we introduce the ε-rank, a novel metric that quantifies the effective features of the neuron functions in the terminal hidden layer. Through extensive experiments across diverse tasks, we observe a universal staircase phenomenon: during training with standard stochastic gradient descent, the decrease of the loss function is accompanied by an increase in the ε-rank and follows a staircase pattern. Theoretically, we rigorously prove a negative correlation between the lower bound of the loss and the ε-rank, showing that a high ε-rank is essential for a significant reduction of the loss. Moreover, numerical evidence shows that, within the same deep neural network, the ε-rank of a later hidden layer is higher than that of an earlier one. Based on these observations, and in order to eliminate the staircase phenomenon, we propose a novel pre-training strategy for the initial hidden layer that elevates the ε-rank of the terminal hidden layer. Numerical experiments validate its effectiveness in reducing training time and improving accuracy across various tasks. The newly introduced ε-rank is therefore a computable quantity that serves as an intrinsic, effective metric for deep neural networks, providing a novel perspective for understanding the training dynamics of neural networks and a theoretical basis for designing efficient training strategies in practical applications.
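The abstract describes the ε-rank only informally, as a computable quantity attached to the terminal hidden layer. One plausible reading, sketched below purely as an illustration and not as the speaker's definition, is to count the singular values of the terminal-hidden-layer feature matrix (samples by neurons) that exceed the threshold ε; the function name, the threshold value, and the toy untrained MLP are all assumptions made for this sketch.

```python
# Illustrative sketch only: the abstract does not define the epsilon-rank
# precisely. Here we ASSUME it counts the singular values of the terminal
# hidden layer's feature matrix (samples x neurons) that exceed epsilon.
import numpy as np

def epsilon_rank(features: np.ndarray, eps: float = 1e-2) -> int:
    """Number of singular values of the feature matrix exceeding eps."""
    singular_values = np.linalg.svd(features, compute_uv=False)
    return int(np.sum(singular_values > eps))

if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Toy two-hidden-layer MLP with random (untrained) weights,
    # standing in for a network early in training.
    x = rng.normal(size=(256, 10))              # 256 samples, 10 input features
    w1 = rng.normal(size=(10, 64)) / np.sqrt(10)
    w2 = rng.normal(size=(64, 64)) / np.sqrt(64)

    h1 = np.tanh(x @ w1)                        # first hidden layer features
    h2 = np.tanh(h1 @ w2)                       # terminal hidden layer features

    # Under the staircase picture described in the abstract, a quantity of
    # this kind would rise in steps as standard SGD drives the loss down.
    print("epsilon-rank of terminal hidden layer:", epsilon_rank(h2, eps=1e-2))
```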
About the Speaker
Jiang Yang (杨将) is a tenured Associate Professor in the Department of Mathematics at the Southern University of Science and Technology. He received his B.S. from Zhejiang University in 2010 and his Ph.D. from Hong Kong Baptist University in 2014. From 2014 to 2017 he held postdoctoral positions at Pennsylvania State University and Columbia University, and he has been on the faculty of the Southern University of Science and Technology since 2017. His research is in computational mathematics, with main interests in the modeling, numerical methods, and applications of phase-field and nonlocal models, as well as the design and theory of deep learning algorithms. His results have appeared in journals including SIAM Review, SIAM Journal on Numerical Analysis, Mathematics of Computation, M3AS, SIAM Journal on Scientific Computing, and the Journal of Computational Physics. His honors include the EASIAM Student Paper Prize (second prize, 2014), the ICCM Distinguished Paper Award (2024), the Frontiers of Science Award of the International Congress of Basic Science (2025), and inclusion in the Stanford-Elsevier list of the world's top 2% scientists (2025 annual impact list). He has been selected for the Young Scholars category of the national high-level talent program and the Shenzhen Outstanding Young Scholars program, and he is the principal investigator of one Tianyuan Mathematics key interdisciplinary project, two NSFC General Programs, and one Guangdong Natural Science Foundation project.