Core Concepts
Diffusion models with different initializations or architectures can produce remarkably similar outputs when given the same noise inputs, a rare property in other generative models.
Summary
The paper examines the consistency phenomenon observed in diffusion models (DMs): trained DMs with different initializations, or even different architectures, generate remarkably similar outputs when given the same noise inputs, a property rarely seen in other generative models.
The authors attribute this consistency phenomenon to two key factors:
- For noise-prediction DMs, learning difficulty is lower near the upper limit of the timestep range, where the input becomes pure noise; this is also the regime where the structural information of the output is usually generated.
- The loss landscape of DMs is highly smooth, so models trained from different starting points tend to converge to similar local minima and exhibit similar behavior patterns.
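The first factor can be illustrated numerically. Under a standard DDPM-style linear beta schedule (an assumption here; the paper's exact schedule is not specified), the noise rate sqrt(1 - ᾱ_t) approaches 1 at large timesteps, so the input x_t is almost entirely the noise the model must predict:

```python
import numpy as np

# Hypothetical DDPM linear beta schedule (assumed for illustration).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

# Noise rate sqrt(1 - alpha_bar_t): the fraction of x_t that is noise.
noise_rate = np.sqrt(1.0 - alpha_bar)

# At large t the input is almost pure noise, so predicting the noise
# from x_t is close to an identity mapping: an "easy" timestep.
print(f"noise rate at t=0:   {noise_rate[0]:.3f}")
print(f"noise rate at t=999: {noise_rate[-1]:.3f}")
```

Since the noise rate grows monotonically with t, it serves as a simple, explicit ordering of timesteps from hard (low noise) to easy (high noise).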
This finding not only reveals the stability of DMs, but also inspires the authors to devise two strategies to accelerate the training of DMs:
- A curriculum-learning-based timestep schedule (CLTS), which uses the noise rate as an explicit indicator of learning difficulty and gradually reduces the training frequency of easier timesteps, improving training efficiency.
- A momentum decay with learning-rate compensation (MDLRC) strategy, which reduces the momentum coefficient over the course of optimization, since the smoothness of the loss landscape means a large momentum can slow convergence and cause oscillations.
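The CLTS idea can be sketched as follows. This is a minimal interpretation, not the authors' exact schedule: treat the noise rate as a difficulty proxy and, as training progresses, down-weight the easy high-noise timesteps when sampling t for each batch. The linear down-weighting rule is a hypothetical choice.

```python
import numpy as np

def clts_timestep_weights(noise_rate: np.ndarray, progress: float) -> np.ndarray:
    """Sampling weights over timesteps at training progress in [0, 1].

    Easy timesteps (high noise rate) start at full weight and are gradually
    down-weighted as training proceeds; hard timesteps keep full weight.
    The decay shape is a hypothetical choice, not the paper's formula.
    """
    w = 1.0 - progress * noise_rate
    return w / w.sum()

# Example with a DDPM-style linear beta schedule (assumed).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
noise_rate = np.sqrt(1.0 - np.cumprod(1.0 - betas))

rng = np.random.default_rng(0)
w_early = clts_timestep_weights(noise_rate, progress=0.0)  # uniform at the start
w_late = clts_timestep_weights(noise_rate, progress=0.9)   # easy t sampled less
t_batch = rng.choice(T, size=64, p=w_late)  # timesteps for one training batch
```

Early in training all timesteps are sampled uniformly; late in training the near-pure-noise timesteps, which the model has already mastered, consume a smaller share of the gradient steps.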
The authors demonstrate the effectiveness of their proposed strategies on various models and show that they can significantly reduce the training time and improve the quality of the generated images.
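The MDLRC strategy can be sketched in the same spirit, assuming SGD-style momentum with effective step size lr / (1 - beta). The linear momentum decay and the compensation rule below are assumptions for illustration, not the paper's exact formulas: as the momentum coefficient beta decays, the learning rate is scaled up so the effective step size stays constant.

```python
def mdlrc_schedule(step: int, total_steps: int,
                   beta_start: float = 0.9, beta_end: float = 0.0,
                   base_lr: float = 1e-4) -> tuple[float, float]:
    """Momentum decay with learning-rate compensation (illustrative sketch).

    Linearly decays the momentum coefficient and rescales the learning rate
    so that the effective step size lr / (1 - beta) remains constant. The
    linear decay and the compensation rule are assumptions, not the paper's.
    """
    progress = step / max(total_steps - 1, 1)
    beta = beta_start + (beta_end - beta_start) * progress
    # Keep lr / (1 - beta) equal to base_lr / (1 - beta_start).
    lr = base_lr * (1.0 - beta) / (1.0 - beta_start)
    return beta, lr

# Early in training: large momentum, small lr; late: small momentum, larger lr.
b0, lr0 = mdlrc_schedule(0, 10_000)
b1, lr1 = mdlrc_schedule(9_999, 10_000)
```

The compensation term matters because simply shrinking the momentum would also shrink the effective step size and slow training; rescaling the learning rate removes the oscillation-prone momentum while preserving the overall update magnitude.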
Stats
The paper does not provide any specific numerical data or metrics to support the key claims. The analysis is primarily based on qualitative observations and visualizations.
Citations
"Despite different initializations or structural variations, DMs trained on the same dataset produce remarkably consistent results when exposed to identical noise during sampling."
"The learning difficulty of DMs can be explicitly indicated by the noise ratio, that is, for noise-prediction DMs, the higher the noise, the easier to learn, which aligns well with the principle of curriculum learning that advocates learning from easy to hard."
"Unlike GANs [5], which require a large momentum to ensure gradient stability, DMs can benefit from a smaller momentum. Our experimental results show that a large momentum may hinder the convergence speed and cause oscillations of DMs."