Mitigating Bias in Large Language Model Evaluation: A Systematic Approach
Existing LLM evaluators suffer from a bias toward superficial quality, such as fluency and verbosity, while overlooking instruction-following ability. This work proposes systematic methods to mitigate this bias, including online calibration at inference time and offline contrastive training, which together improve the fairness of LLM-based evaluation.