Transformer-based Causal Language Models Perform Clustering: Unveiling Hidden Mechanisms and Inductive Biases
Transformer-based causal language models (CLMs) encode task-specific information as clusters in their hidden space, and this clustering supports their instruction-following capabilities.
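As a rough illustration of what "clustering in hidden space" could look like, the sketch below runs a plain k-means on synthetic vectors and checks how well the recovered clusters line up with task identity. Everything here is an assumption made for illustration: the vectors are random stand-ins for hidden states rather than activations from a real model, and the names `kmeans`, `cluster_purity`, `task_means`, and the chosen dimension and sample counts are hypothetical. It is not the analysis procedure used in the paper.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster is empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

def cluster_purity(labels, task_ids):
    """Fraction of points that belong to the majority task of their cluster."""
    correct = 0
    for j in np.unique(labels):
        members = task_ids[labels == j]
        correct += np.bincount(members).max()
    return correct / len(task_ids)

# Synthetic stand-ins for hidden states (assumption: two tasks, 16 dimensions,
# each task's vectors scattered around its own mean).
rng = np.random.default_rng(0)
dim, n_per_task = 16, 100
task_means = np.stack([np.full(dim, -2.0), np.full(dim, 2.0)])
hidden = np.concatenate([m + rng.normal(size=(n_per_task, dim)) for m in task_means])
task_ids = np.repeat(np.arange(2), n_per_task)

labels, _ = kmeans(hidden, k=2)
purity = cluster_purity(labels, task_ids)
print(f"cluster purity: {purity:.2f}")
```

A purity close to 1 means the unsupervised clusters coincide with the task labels, which is the kind of structure the summary above describes. With real models, `hidden` would be replaced by hidden states extracted for prompts from different tasks.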