Improving Generalization Ability of Deep Wide Residual Network with Suitable Scaling Factor
The author argues that choosing a suitable scaling factor on the residual branch of deep wide ResNets is crucial for good generalization, and supports this claim with both theoretical and empirical evidence.
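To make "scaling factor on the residual branch" concrete, the following is a minimal sketch, not the author's code. It assumes the common form of a scaled residual block, x <- x + alpha * f(x), with f a single ReLU layer. The helper names, the He-style initialization, and the choice alpha = 1/depth are all illustrative assumptions and are not taken from the summary above.

```python
import math
import random


def relu(v):
    # Elementwise ReLU on a plain Python list.
    return [max(0.0, x) for x in v]


def matvec(w, v):
    # Dense matrix-vector product; w is a list of rows.
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def residual_forward(x, weights, alpha):
    # Apply x <- x + alpha * relu(W x) once per weight matrix.
    # alpha is the scaling factor on the residual branch.
    for w in weights:
        branch = relu(matvec(w, x))
        x = [x_i + alpha * b_i for x_i, b_i in zip(x, branch)]
    return x


def make_weights(depth, width, seed=0):
    # Hypothetical setup: Gaussian weights with He-style variance 2/width.
    rng = random.Random(seed)
    std = math.sqrt(2.0 / width)
    return [
        [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
        for _ in range(depth)
    ]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


depth, width = 32, 16
weights = make_weights(depth, width)
x0 = [1.0 / math.sqrt(width)] * width  # input with unit Euclidean norm

# Same network, two scalings of the residual branch.
out_unscaled = residual_forward(x0, weights, alpha=1.0)
# alpha = 1/depth is only an illustrative depth-dependent choice.
out_scaled = residual_forward(x0, weights, alpha=1.0 / depth)

print("output norm, alpha = 1      :", norm(out_unscaled))
print("output norm, alpha = 1/depth:", norm(out_scaled))
```

Running the sketch compares how strongly the stacked residual branches amplify the input under the two scalings. It only illustrates where the factor enters the forward pass and says nothing about generalization, which is the subject of the author's theoretical and empirical analysis.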