Unbiased Learning to Rank Fails to Improve Real-World Search Performance Despite Capturing Click Biases
Unbiased learning-to-rank (ULTR) techniques do not yield clear performance improvements on the large-scale Baidu-ULTR dataset, especially when compared with the stark differences caused by the choice of ranking loss and query-document features. While ULTR methods robustly improve click prediction, these gains do not translate into better ranking performance on expert relevance annotations.
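To make the distinction between click prediction and ranking quality concrete, the following is a minimal sketch and not the experimental setup behind the result above. It assumes a position-based click model, in which the click probability is the product of a position-dependent examination probability and a document relevance score. All numbers and names (`relevance_scores`, `examination`, `clicks`, `expert_labels`) are hypothetical toy values chosen for illustration. The sketch shows one way the two metrics can decouple in principle: modeling position bias lowers the click log loss, yet the ranking induced by the relevance scores, and therefore nDCG against expert labels, stays the same.

```python
import math


def dcg(labels):
    """Discounted cumulative gain with exponential gain, for labels in ranked order."""
    return sum((2 ** label - 1) / math.log2(rank + 2) for rank, label in enumerate(labels))


def ndcg(scores, labels):
    """nDCG of the ranking induced by sorting documents by `scores` (descending)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ideal = dcg(sorted(labels, reverse=True))
    return dcg([labels[i] for i in order]) / ideal if ideal > 0 else 0.0


def click_log_loss(predicted, clicks):
    """Mean binary cross-entropy between predicted click probabilities and observed clicks."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, c in zip(predicted, clicks):
        total += c * math.log(max(p, eps)) + (1 - c) * math.log(max(1 - p, eps))
    return -total / len(clicks)


# Hypothetical toy query: four documents, listed in the order they were displayed.
relevance_scores = [0.9, 0.5, 0.7, 0.2]  # assumed model relevance estimates
examination = [1.0, 0.6, 0.4, 0.2]       # assumed examination probability per position
clicks = [1, 0, 0, 0]                    # assumed observed clicks
expert_labels = [3, 1, 2, 0]             # assumed expert relevance grades

# Naive click model: ignores position and predicts clicks from relevance alone.
naive_click_pred = relevance_scores
# Position-based model: click probability = examination * relevance.
pbm_click_pred = [e * r for e, r in zip(examination, relevance_scores)]

naive_loss = click_log_loss(naive_click_pred, clicks)
pbm_loss = click_log_loss(pbm_click_pred, clicks)

# Both models rank documents by the same relevance scores, so nDCG is identical.
naive_ndcg = ndcg(relevance_scores, expert_labels)
pbm_ndcg = ndcg(relevance_scores, expert_labels)

print(f"click log loss: naive={naive_loss:.4f}, position-based={pbm_loss:.4f}")
print(f"nDCG vs expert labels: naive={naive_ndcg:.4f}, position-based={pbm_ndcg:.4f}")
```

In this toy setting the entire click-prediction gain comes from the examination term, which plays no role when documents are ordered by relevance. Real ULTR methods also change the learned relevance scores, so whether ranking quality improves in practice is an empirical question.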