CHAI proposes Clustered Head Attention to reduce the memory and compute requirements of Large Language Model inference by identifying redundant attention heads: heads whose attention patterns are highly correlated are grouped into clusters, and attention scores are computed only once per cluster rather than once per head.
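The core idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the cosine-similarity metric, the greedy threshold-based grouping, and the choice of the first head as cluster representative are all assumptions made for the sketch, and CHAI's actual clustering procedure may differ.

```python
import numpy as np

def cluster_heads(attn_scores, threshold=0.9):
    """Greedily group heads whose attention maps are near-duplicates.

    attn_scores: (num_heads, seq, seq) attention maps for one layer.
    Returns (cluster_reps, assigned): representative head per cluster,
    and the cluster index assigned to each head.
    """
    num_heads = attn_scores.shape[0]
    flat = attn_scores.reshape(num_heads, -1)
    # Cosine similarity between the flattened attention maps of all heads.
    unit = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    sim = unit @ unit.T
    cluster_reps, assigned = [], [-1] * num_heads
    for h in range(num_heads):
        for ci, rep in enumerate(cluster_reps):
            if sim[h, rep] >= threshold:
                assigned[h] = ci  # close enough: reuse this cluster
                break
        else:
            cluster_reps.append(h)  # start a new cluster with h as representative
            assigned[h] = len(cluster_reps) - 1
    return cluster_reps, assigned

def clustered_attention(q, k, v, cluster_reps, assigned):
    """Compute attention scores only for cluster representatives,
    then reuse each cluster's weights for every head in that cluster."""
    num_heads, seq, d = q.shape
    shared_weights = {}
    for ci, rep in enumerate(cluster_reps):
        scores = q[rep] @ k[rep].T / np.sqrt(d)
        w = np.exp(scores - scores.max(-1, keepdims=True))
        shared_weights[ci] = w / w.sum(-1, keepdims=True)  # softmax
    out = np.empty_like(v)
    for h in range(num_heads):
        # Shared attention weights, but each head keeps its own values.
        out[h] = shared_weights[assigned[h]] @ v[h]
    return out
```

With a high threshold, only genuinely near-duplicate heads are merged; the savings come from skipping the score/softmax computation (and, in a full system, the corresponding key cache) for every non-representative head in a cluster.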