Rank By Readability: Document Weighting for Information Retrieval

Neil Newbold, Harry McLaughlin and Lee Gillam (Department of Computing, University of Surrey, UK)


Abstract
 

In this paper, we present a new approach to ranking that considers the reading ability (and motivation) of the user. Web pages are increasingly likely to be badly written, with unfamiliar words, poor syntax, ambiguous phrases and so on. Readability research suggests that experts and motivated readers can overcome confusingly written text but nevertheless find it irritating. We investigate using readability to re-rank web pages. We take an extended view of readability that assesses the reading level of retrieved web pages using techniques that account for both textual and cognitive factors. We examine the readability of a selection of query results and compare a re-ranking by readability with the original ranking. Results to date suggest that considering a view of readability for each reader may increase the probability of relevance to a particular user.
 
