Unlocking the Potential of Multimodal Unified Discrete Representation through Training-Free Codebook Optimization and Hierarchical Alignment
The authors propose Training-Free Optimization of Codebook (TOC), which improves model performance by selecting important codebook channels without any retraining. They further introduce Hierarchical Dual Cross-modal Information Disentanglement (H-DCID), which extends information separation and alignment to two levels to strengthen cross-modal learning.
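The core idea of training-free channel selection can be illustrated with a minimal sketch. The abstract does not state TOC's actual importance criterion, so the per-channel variance score, the function name, and the keep ratio below are all assumptions for illustration only:

```python
import numpy as np

def select_important_channels(codebook: np.ndarray, keep_ratio: float = 0.25) -> np.ndarray:
    """Rank codebook channels by a score and keep the top fraction.

    Here importance is scored by per-channel variance across codebook
    entries -- an assumed stand-in, not the paper's TOC criterion.
    No gradients or retraining are involved: we only re-index channels.
    """
    scores = codebook.var(axis=0)                     # (D,) one score per channel
    k = max(1, int(codebook.shape[1] * keep_ratio))   # number of channels to keep
    keep = np.argsort(scores)[::-1][:k]               # indices of top-k channels
    return np.sort(keep)                              # preserve original channel order

# Hypothetical codebook: 512 entries, 64 channels.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))
kept = select_important_channels(codebook, keep_ratio=0.25)
pruned = codebook[:, kept]  # training-free: the pruned codebook is a column subset
```

The pruned codebook can then replace the original at inference time, since selection only drops columns and never alters the surviving entries.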