The authors propose a shared attention mechanism called Dense-and-Implicit Attention (DIA) that can consistently enhance the performance of various neural network backbones, including ResNet, Transformer, and UNet, across tasks such as image classification, object detection, and image generation using diffusion models.
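As a rough illustration of the layer-sharing idea, the sketch below reuses a single SE-style channel-attention module across every block of a small ResNet-like stack. The class names and the choice of channel attention are hypothetical stand-ins; the actual DIA module and the way it couples layers follow the paper, not this toy.

```python
import torch
import torch.nn as nn

class SharedChannelAttention(nn.Module):
    """Illustrative SE-style channel attention; one instance is reused by every block."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))        # squeeze spatial dims -> (B, C)
        return x * w[:, :, None, None]         # re-scale feature maps channel-wise

class ResidualBlock(nn.Module):
    def __init__(self, channels, shared_attn):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.attn = shared_attn                # the same module object in every block

    def forward(self, x):
        return torch.relu(x + self.attn(self.conv(x)))

shared = SharedChannelAttention(64)
blocks = nn.Sequential(*[ResidualBlock(64, shared) for _ in range(8)])
out = blocks(torch.randn(2, 64, 32, 32))
```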
The proposed weight initialization is orthogonal, has predominantly positive entries, and is fully deterministic; together these properties enable effective signal propagation and prevent the "dying ReLU" problem in extremely deep and narrow feedforward networks with ReLU activations.
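The snippet below is not the paper's construction; it is a minimal sketch of how the three stated properties could be checked on a candidate initializer, using a sign-flipped Householder reflection as a hypothetical stand-in that happens to be orthogonal, deterministic, and mostly positive.

```python
import torch

def candidate_init(n: int) -> torch.Tensor:
    """Illustrative deterministic orthogonal matrix with mostly positive entries:
    a sign-flipped Householder reflection, (2/n) * ones - I.
    A hypothetical stand-in, not the paper's actual scheme."""
    return (2.0 / n) * torch.ones(n, n) - torch.eye(n)

W = candidate_init(64)

# Orthogonality: W^T W should be (numerically) the identity.
assert torch.allclose(W.T @ W, torch.eye(64), atol=1e-5)

# Positive entry predominance: more than half of the entries are positive.
assert (W > 0).float().mean() > 0.5

# Full determinism: rebuilding the matrix reproduces exactly the same weights.
assert torch.equal(W, candidate_init(64))
```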
The hierarchical Hopfield network gives rise to a novel generalization of the MLP-Mixer model, called iMixer, whose MLP layers propagate forward from the output side to the input side; iMixer is thus an example of an invertible, implicit, and iterative mixing module.
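A minimal sketch of the implicit and iterative idea is shown below, under the assumption that the mixing layer is defined as the inverse of a residual MLP and its output is obtained by fixed-point iteration; the class name, hyperparameters, and the simple contraction scaling are illustrative, not the paper's exact module.

```python
import torch
import torch.nn as nn

class ImplicitTokenMixer(nn.Module):
    """Illustrative implicit mixing layer: the output y satisfies x = y + c * f(y),
    i.e. the layer is the inverse of a residual MLP, computed by fixed-point iteration."""
    def __init__(self, num_tokens, hidden, n_iters=10, contraction=0.5):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(num_tokens, hidden), nn.GELU(), nn.Linear(hidden, num_tokens)
        )
        self.n_iters = n_iters
        # crude scaling to encourage convergence; a real implementation would
        # constrain the Lipschitz constant of f (e.g. via spectral normalization)
        self.contraction = contraction

    def forward(self, x):                # x: (batch, channels, tokens)
        y = x
        for _ in range(self.n_iters):    # iterate y <- x - c * f(y) toward the fixed point
            y = x - self.contraction * self.f(y)
        return y

mixer = ImplicitTokenMixer(num_tokens=16, hidden=64)
out = mixer(torch.randn(2, 32, 16))
```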
Spectral Neural Operators (SNO) map between function spaces by acting directly on the coefficients of spectral expansions, a transparent and lossless representation that overcomes the aliasing limitations of sampling-based neural operators such as the Fourier Neural Operator (FNO) and the Deep Operator Network (DeepONet).
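The toy sketch below illustrates the coefficient-to-coefficient idea, assuming a truncated Fourier basis: input samples are converted to spectral coefficients, a small network maps input coefficients to output coefficients, and the result is synthesized back to samples. The class name, network shape, and mode count are illustrative, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ToySpectralOperator(nn.Module):
    """Sketch of an SNO-style operator: a small network maps truncated Fourier
    coefficients of the input function to coefficients of the output function."""
    def __init__(self, n_modes=16, hidden=128):
        super().__init__()
        self.n_modes = n_modes
        # real and imaginary parts of the first n_modes coefficients
        self.net = nn.Sequential(
            nn.Linear(2 * n_modes, hidden), nn.GELU(), nn.Linear(hidden, 2 * n_modes)
        )

    def forward(self, u):                          # u: (batch, n_grid) samples on a uniform grid
        c = torch.fft.rfft(u)[:, : self.n_modes]   # analysis: samples -> spectral coefficients
        z = self.net(torch.cat([c.real, c.imag], dim=-1))    # learned map between coefficient spaces
        c_out = torch.complex(z[:, : self.n_modes], z[:, self.n_modes:])
        return torch.fft.irfft(c_out, n=u.shape[1])           # synthesis: coefficients -> samples

op = ToySpectralOperator()
v = op(torch.randn(4, 64))                         # maps one discretized function to another
```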