Core Concept
Masked AutoDecoder (MAD) is a multi-task vision generalist that combines parallel decoding with masked sequence modeling to handle a range of vision tasks efficiently and accurately.
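To make the two ideas concrete, here is a minimal toy sketch in plain Python. It is not the MAD implementation: `toy_predictor` is a hypothetical stand-in for a trained decoder, and the mask ratio, the number of refinement iterations, and the "re-mask the least confident half" rule are assumptions chosen only for illustration. The sketch shows how masked inputs are built for masked sequence modeling, and why parallel decoding needs far fewer forward passes than token-by-token autoregressive decoding.

```python
import random

MASK = "<mask>"

def mask_tokens(tokens, mask_ratio, rng):
    """Masked sequence modeling input: hide a random subset of tokens."""
    n_mask = round(len(tokens) * mask_ratio)
    masked_idx = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in masked_idx else t for i, t in enumerate(tokens)]
    return masked, masked_idx

def toy_predictor(target):
    """Hypothetical stand-in for a trained decoder (assumption, not MAD).

    One call scores every position at once and returns a list of
    (token, confidence) pairs; unmasked tokens are passed through.
    """
    def predict(seq, rng):
        return [(target[i], rng.random()) if tok == MASK else (tok, 1.0)
                for i, tok in enumerate(seq)]
    return predict

def parallel_decode(predict, length, num_iters, rng):
    """Fill all positions per pass, re-masking low-confidence ones to refine."""
    seq = [MASK] * length
    passes = 0
    for it in range(num_iters):
        preds = predict(seq, rng)
        passes += 1
        seq = [tok for tok, _ in preds]
        if it < num_iters - 1:
            # Illustrative rule: re-mask the least confident half.
            order = sorted(range(length), key=lambda i: preds[i][1])
            for i in order[: length // 2]:
                seq[i] = MASK
    return seq, passes

def autoregressive_decode(predict, length, rng):
    """Baseline: commit one token per forward pass, left to right."""
    seq = [MASK] * length
    passes = 0
    for i in range(length):
        preds = predict(seq, rng)
        passes += 1
        seq[i] = preds[i][0]
    return seq, passes

if __name__ == "__main__":
    rng = random.Random(0)
    target = [f"tok{i}" for i in range(100)]
    predict = toy_predictor(target)
    _, par_passes = parallel_decode(predict, len(target), num_iters=2, rng=rng)
    _, ar_passes = autoregressive_decode(predict, len(target), rng=rng)
    print(f"parallel passes: {par_passes}, autoregressive passes: {ar_passes}")
```

In this toy setup the parallel decoder uses a fixed number of passes regardless of sequence length, while the autoregressive baseline uses one pass per token. The pass counts are only a proxy: real wall-clock speedups depend on the model, sequence lengths, and hardware.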
Statistics
Autoregressive Transformers may not fit vision tasks well, because vision task sequences usually lack the sequential dependencies typically observed in natural language.
MAD achieves approximately 100× acceleration in inference time compared to autoregressive counterparts.