TREC-CHEM '10

The TREC-CHEM evaluation campaign aims to create a reference collection for chemical information retrieval engines, usable by both academics and industry to evaluate their own systems or third-party systems they intend to invest in.

Building on the substantial progress made in information retrieval (IR) in terms of theoretical models and evaluation, more and more attention has recently been paid to domain-specific IR research, as evidenced by the Genomics and Legal tracks in TREC. Now is the right time to carry out large-scale evaluations on chemistry datasets in order to promote research in chemical IR in general and chemical patent IR in particular. Accordingly, we are organizing a chemical IR track in TREC to address the challenges of chemical and patent IR. We will provide a test collection consisting of full-text chemical patents from the IRF and research papers from several publishers (see below). The aim is to identify how well current IR methods adapt to text containing chemical names and formulas. Without making it a prerequisite, we encourage participants to use entity identification methods to extract and index chemicals. The evaluation process will combine the pooling/sampling/expert evaluation approach frequently used in TREC with an automatic evaluation method based on the references found in patent documents.
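As an illustration of the kind of entity-based indexing participants might try, here is a minimal sketch in Python that pulls candidate molecular formulas out of patent text with a naive regular expression and builds an inverted index over them. The pattern, the in-memory index, and the example document are our own assumptions for illustration only; real chemical named-entity recognition (trivial names, IUPAC nomenclature, etc.) is considerably harder.

    import re
    from collections import defaultdict

    # Naive pattern for molecular formulas such as "C6H12O6" or "H2SO4".
    # This is only an illustrative assumption; it will miss many chemical
    # names and may match spurious tokens (e.g. abbreviations).
    FORMULA_RE = re.compile(r"\b(?:[A-Z][a-z]?\d*){2,}\b")

    def extract_chemicals(text):
        """Return candidate chemical formulas found in a passage of text."""
        return [m.group(0) for m in FORMULA_RE.finditer(text)]

    def build_chemical_index(documents):
        """Map each candidate chemical string to the IDs of documents mentioning it."""
        index = defaultdict(set)
        for doc_id, text in documents.items():
            for chem in extract_chemicals(text):
                index[chem].add(doc_id)
        return index

    if __name__ == "__main__":
        docs = {"EP-0001": "The reaction of H2SO4 with NaOH yields Na2SO4 and water."}
        print(build_chemical_index(docs))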

For the most up-to-date information, please also visit our wiki. If you are a participant, please remember to subscribe to our mailing list trec-chem@ir-facility.org.


Data collection

The 2010 TREC-CHEM data collection is very similar to the 2009 collection, but larger.

Chemical patent documents come from the MAREC collection and include all patent documents classified under International Patent Classification (IPC) class C or subclass A61K. The total number of documents in this set is approximately 2 million.
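As a rough illustration of the selection criterion, the sketch below keeps a patent when any of its IPC codes falls under class C or subclass A61K. The record layout is a simplifying assumption; the actual MAREC XML schema encodes classification codes differently.

    def in_chemistry_subset(ipc_codes):
        """True if any IPC code belongs to class C or to subclass A61K."""
        return any(code.startswith("C") or code.startswith("A61K")
                   for code in ipc_codes)

    # Hypothetical records: (patent ID, list of IPC codes).
    records = [
        ("EP-0001", ["C07D 401/04"]),   # organic chemistry -> kept
        ("EP-0002", ["A61K 31/55"]),    # medicinal preparations -> kept
        ("EP-0003", ["G06F 17/30"]),    # computing -> dropped
    ]
    subset = [pid for pid, codes in records if in_chemistry_subset(codes)]
    print(subset)  # ['EP-0001', 'EP-0002']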

Scientific articles come from several publishers this year:


Tasks

This year's tasks build on the 2009 tasks, namely "Prior Art Search" and "Technology Survey Search". Additional tasks may still be decided upon.


Evaluation

We use two types of evaluation: automatic evaluation based on patent citations (Prior Art Search task) and manual judgements by students and experts (Technology Survey Search task).
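To make the citation-based evaluation of the Prior Art task concrete, the following sketch treats the patents cited by each topic patent as its relevant set and computes mean average precision over ranked runs. The document IDs and data structures are invented for illustration and do not reflect the track's official qrels format or the exact measures that will be reported.

    def average_precision(ranked_ids, relevant_ids):
        """Average precision of one ranked result list against a relevant set."""
        hits, precision_sum = 0, 0.0
        for rank, doc_id in enumerate(ranked_ids, start=1):
            if doc_id in relevant_ids:
                hits += 1
                precision_sum += hits / rank
        return precision_sum / len(relevant_ids) if relevant_ids else 0.0

    def mean_average_precision(runs, citations):
        """MAP over topics, using each topic patent's citations as its relevant set."""
        scores = [average_precision(runs[t], citations[t]) for t in citations]
        return sum(scores) / len(scores) if scores else 0.0

    if __name__ == "__main__":
        # Hypothetical topic patent "EP-100" citing two earlier patents.
        citations = {"EP-100": {"EP-010", "US-020"}}
        runs = {"EP-100": ["US-020", "EP-033", "EP-010"]}
        print(round(mean_average_precision(runs, citations), 3))  # 0.833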

Compared to 2009, this year we aim for a more domain-specific evaluation process in the Technology Survey task by introducing specific relevance judgements (e.g. "has compound", "has disease").


Organizers

As in 2009, TREC-CHEM 2010 is a collaboration between the Information Retrieval Facility in Vienna, Austria, University College London, UK, and York University, Canada, and it is supported by the National Institute of Standards and Technology (NIST), USA.

