Evaluating Ranking Changes in Live Professional Legal Search Systems: Challenges and Limitations of Common Approaches
Common ranking evaluation methods, including test collections, user surveys, and A/B testing, are poorly suited to evaluating changes to ranking algorithms in live professional legal search systems. The reasons lie in characteristics of the legal domain, such as high recall requirements, limited user data, and commercial constraints.