EmoCLIP, a novel vision-language model, significantly improves zero-shot facial expression recognition in video.