Overview
The first IRF Scientific Conference was held on 31 May 2010 in Vienna with about 70 academics and information professionals taking part, fulfilling the vision of a multidisciplinary scientific forum. The presentations addressed information retrieval from different perspectives, such as natural language processing, evaluation measures, and alternative models. Many of the papers considered applications in the challenging area of patent retrieval, using diverse approaches including logic-based retrieval, conditional random fields and probabilistic retrieval models. All accepted papers and posters were published by Springer in the Proceedings of IRFC 2010, as part of the Lecture Notes in Computer Science series.
Prof. Mark Sanderson (University of Sheffield) opened the conference with a brief outline of the history of search evaluation before tackling the problem of designing test collections. He described some pioneering but relatively overlooked research which argued that the key problem for researchers is not how to measure search systems accurately, but how to measure people accurately.
Neil Newbold from the University of Surrey presented a new approach to ranking that considers the reading ability (and motivation) of the user. His team investigated using readability to re-rank web pages. Results to date suggest that taking each reader's own view of readability into account may increase the probability that a page is relevant to that particular user.
Erik Graf from the University of Glasgow explored the benefits of integrating knowledge representations into prior-art patent retrieval. Key to the approach is the use of human judgement, available in the form of the classifications assigned to patent documents. In general, the proposed knowledge expansion techniques were particularly beneficial to recall and also yielded significant precision gains.
As a low-cost, up-to-date resource, Wikipedia has recently gained attention as a means of providing cross-language bridging for information retrieval. Benjamin Roth from Saarland University showed that standard Latent Dirichlet Allocation (LDA) can extract cross-language information that is valuable for IR simply by normalizing the training data. Furthermore, his team showed that combining LDA with Explicit Semantic Analysis (ESA) yields significant improvements.
Jay Urbain from the Milwaukee School of Engineering explored probabilistic retrieval models that integrate term statistics with entity search, using multiple levels of document context, to improve the performance of chemical patent search. His team reported better results than those achieved in the 2009 TREC Chemistry track.
A typical evaluation of a retrieval system involves computing an effectiveness metric, e.g. average precision, for each topic of a test collection and then using the average of that metric, e.g. mean average precision, to express overall effectiveness. However, averages do not capture all the important aspects of effectiveness. Mehdi Hosseini from University College London explored how the variance of a metric across topics can be used to measure the variability of a system's effectiveness.
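The per-topic averaging described above can be illustrated with a small Python sketch. It assumes binary relevance judgments and the common non-interpolated definition of average precision (precision summed at the rank of each relevant retrieved document, divided by the total number of relevant documents for the topic). The two "systems" and their rankings are hypothetical toy data, and population variance is used as one simple choice of spread. This is not the method presented at the conference, only an illustration of how a mean can hide differences between systems.

```python
from statistics import mean, pvariance


def average_precision(ranking, num_relevant):
    """Non-interpolated AP for one topic (binary relevance assumed).

    ranking: 0/1 relevance flags of the retrieved documents, in rank order.
    num_relevant: total relevant documents for the topic, retrieved or not.
    """
    if num_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranking, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / num_relevant


def summarize(runs):
    """Return per-topic AP, their mean (MAP) and population variance."""
    aps = [average_precision(ranking, n_rel) for ranking, n_rel in runs]
    return aps, mean(aps), pvariance(aps)


# Hypothetical runs on three topics: (ranking, total relevant documents).
system_a = [([0, 1, 0, 0], 1), ([1, 0, 0, 0], 2), ([0, 1, 0, 1], 2)]
system_b = [([1, 0, 0, 0], 1), ([0, 1, 0, 0], 1), ([0, 0, 0, 0], 2)]

if __name__ == "__main__":
    for name, runs in [("A", system_a), ("B", system_b)]:
        aps, map_score, var = summarize(runs)
        # A: AP=[0.5, 0.5, 0.5] MAP=0.500 variance=0.000
        # B: AP=[1.0, 0.5, 0.0] MAP=0.500 variance=0.167
        print(f"System {name}: AP={aps} MAP={map_score:.3f} variance={var:.3f}")
```

Both toy systems reach the same mean average precision of 0.500, but system B's per-topic effectiveness swings from perfect to zero, giving a variance of about 0.167 against 0 for system A. That is exactly the kind of difference a mean alone conceals.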
David Hawking, Chief Scientist at Funnelback Internet and Enterprise Search in Australia, closed the conference with a practical and commercial perspective on half a century of electronic information retrieval. He concluded that tools must be created to help users cope better with the limitations of systems, such as spelling suggestion and query expansion tools.
Hamish Cunningham (University of Sheffield) and Stefan Rueger (The Open University), respectively General Chair and Programme Chair, were extremely satisfied with the outcome of this first IRF Scientific Conference, both in terms of participation and of paper quality: 11 papers were accepted out of 20 submissions.