Late-interaction models compute relevance from per-token embeddings, so they store one vector per token rather than one per document; this inflates index size and calls for efficient pruning methods to reduce storage overhead.
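As a minimal sketch of one such pruning strategy, the snippet below keeps only the document token vectors with the largest L2 norm before indexing. The norm-based heuristic, the `prune_by_norm` helper, and the toy embeddings are illustrative assumptions, not the specific method discussed here.

```python
import numpy as np

def prune_by_norm(token_embeddings: np.ndarray, keep: int) -> np.ndarray:
    """Keep the `keep` token vectors with the largest L2 norm.

    Assumption: low-norm token embeddings carry little signal, so
    dropping them shrinks the index while limiting the impact on
    late-interaction (MaxSim-style) relevance scores.
    """
    norms = np.linalg.norm(token_embeddings, axis=1)
    top = np.argsort(norms)[-keep:]        # indices of the largest norms
    return token_embeddings[np.sort(top)]  # preserve original token order

# Toy document: 6 tokens with 4-dim embeddings; two near-zero "filler" tokens.
doc = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.01, 0.0],  # filler
    [0.1, 0.8, 0.2, 0.0],
    [0.0, 0.02, 0.0, 0.0],  # filler
    [0.0, 0.1, 0.9, 0.1],
    [0.2, 0.0, 0.1, 0.7],
])

pruned = prune_by_norm(doc, keep=4)
print(pruned.shape)  # storage drops from 6 vectors to 4
```

Real systems choose the scoring signal more carefully (e.g. attention weights or learned importance), but the storage arithmetic is the same: pruning a fraction of tokens removes that fraction of the per-document vectors.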