Measuring the Variability in Effectiveness of a Retrieval System

Mehdi Hosseini, Ingemar J. Cox (Department of Computer Science, University College London, UK), Natasa Milic-Frayling, and Vishwa Vinay (Microsoft Research Cambridge, UK)

 

Abstract
 

A typical evaluation of a retrieval system involves computing an effectiveness metric, e.g. average precision, for each topic of a test collection and then using the average of the metric, e.g. mean average precision, to express the overall effectiveness. However, averages do not capture all the important aspects of effectiveness and, used alone, may not be an informative measure of a system's effectiveness. In addition to the average, we also need to consider how effectiveness varies across topics. We refer to this variation as the variability in effectiveness. In this paper we explore how the variance of a metric can be used as a measure of variability. We define a variability metric and illustrate how it can be used in practice.
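
As a concrete illustration of the distinction drawn above, the short sketch below (not part of the original paper; the per-topic average precision values and the use of NumPy are illustrative assumptions) contrasts the usual single-number summary, mean average precision, with the sample variance of average precision across topics as a simple variability measure.

    import numpy as np

    # Hypothetical per-topic average precision (AP) scores for one retrieval
    # system evaluated on a ten-topic test collection (illustrative values only).
    ap_scores = np.array([0.62, 0.18, 0.45, 0.91, 0.33, 0.27, 0.74, 0.55, 0.12, 0.68])

    # Mean average precision (MAP): the usual average-based summary of effectiveness.
    map_score = ap_scores.mean()

    # Sample variance of AP across topics: one simple measure of how much
    # effectiveness varies from topic to topic, i.e. the system's variability.
    ap_variance = ap_scores.var(ddof=1)

    print(f"MAP                     = {map_score:.3f}")
    print(f"Variance across topics  = {ap_variance:.3f}")

Two systems with the same MAP can differ markedly in this variance, which is the motivation for reporting a variability measure alongside the average.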
