Image-based Classification Task
Images are an essential component of patents, as they illustrate key aspects of the invention. There are many different types of image in patents, including technical drawings, photos, flow charts, and graphs.
However, although many applications need to focus an analysis on a specific type of image, the annotation of image types in patents is generally either non-existent or poor, with many errors.
The aim of this task is to automatically classify patent images according to type based on visual content. Manually classified and checked data is provided for training, and the long-term aim is to use these training data to reliably classify the millions of images in patents.
The classification is into 9 classes:
- abstract drawing
- graph
- flow chart
- gene sequence
- program listing
- symbol
- chemical structure
- table
- mathematics
Between 300 and 6,000 training images are provided for each of these classes (see description below). Only these data may be used to train image type classification techniques.
At a later stage, we will publish a test database of 1,000 images. For each of these images, participating groups are required to determine the type of image.
Training data
To obtain the training data, it is necessary to register and fill out the MAREC agreement. See the main CLEF-IP page for information on doing this. Access to the training data is provided once this is done.
The training data are organised into 9 directories, one for each class, and contain the following numbers of training images per class:
Class | Class Number | Abbreviation | # Training Images |
drawing | 1 | ad | 5566 |
chemical structures | 2 | cf | 5958 |
program listing | 3 | cp | 5574 |
gene sequence (dna) | 4 | dn | 5983 |
flow chart | 5 | ff | 311 |
graph | 6 | gr | 1664 |
math | 7 | mf | 5950 |
table | 8 | tb | 5502 |
character (symbol) | 9 | tx | 1579 |
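The counts above are quite unbalanced (flow charts have 311 images, while most other classes have more than 5,000). One possible way to account for this during training is to weight classes by inverse frequency. The sketch below is only an illustration: the counts and abbreviations are taken from the table, but the function name `inverse_frequency_weights` and the weighting scheme itself are assumptions, not part of the task definition.

```python
# Training image counts per class abbreviation, copied from the table above.
TRAINING_COUNTS = {
    "ad": 5566,  # drawing
    "cf": 5958,  # chemical structures
    "cp": 5574,  # program listing
    "dn": 5983,  # gene sequence (dna)
    "ff": 311,   # flow chart
    "gr": 1664,  # graph
    "mf": 5950,  # math
    "tb": 5502,  # table
    "tx": 1579,  # character (symbol)
}


def inverse_frequency_weights(counts):
    """Return a weight per class that is inversely proportional to its size.

    Hypothetical helper: weights are scaled so that a perfectly balanced
    data set would give every class a weight of 1.0.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


if __name__ == "__main__":
    weights = inverse_frequency_weights(TRAINING_COUNTS)
    for label in sorted(weights, key=weights.get, reverse=True):
        print(f"{label}: {TRAINING_COUNTS[label]:5d} images, weight {weights[label]:.2f}")
```

Whether such weighting helps depends on the classifier and on how the evaluation treats each class, so it is best checked on a held-out validation set.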
Evaluation
Please note that it is not permitted to use any additional data for training and setup of the systems. If you need test data for system tuning, split the available training data into a training set and a validation set. We will use the equal error rate to evaluate the performance of the individual runs.
The script that we will use for evaluation will be made available soon.
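A minimal sketch of such a split is shown here, assuming the training images are represented as (path, class abbreviation) pairs. The split is done per class so that small classes such as flow charts are still represented in the validation set. The function name `stratified_split`, the 20% validation fraction, and the example file paths are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(samples, validation_fraction=0.2, seed=0):
    """Split (path, label) pairs into training and validation lists.

    Each class is shuffled and split separately, so the class proportions
    are roughly preserved in both parts.
    """
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append((path, label))

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, validation = [], []
    for label in sorted(by_class):
        items = by_class[label]
        rng.shuffle(items)
        # Keep at least one validation image per class.
        n_val = max(1, round(len(items) * validation_fraction))
        validation.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, validation


if __name__ == "__main__":
    # Hypothetical file names; the real directory layout may differ.
    samples = [(f"ff/image_{i}.tif", "ff") for i in range(311)]
    samples += [(f"gr/image_{i}.tif", "gr") for i in range(1664)]
    train, validation = stratified_split(samples)
    print(len(train), "training images,", len(validation), "validation images")
```

Because only the provided training data is used, a split of this kind stays within the rule that no additional data may be used.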
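For tuning on a validation set in the meantime, the equal error rate can be approximated as sketched below. This is not the official evaluation script. It assumes a binary, one-vs-rest setting for a single class, where each image has a score (higher means more likely to belong to the class) and a label of 1 or 0. How the official evaluation combines the 9 classes is not specified here, and the function name `equal_error_rate` is hypothetical.

```python
def equal_error_rate(scores, labels):
    """Approximate the equal error rate for one class (one-vs-rest).

    scores: classifier scores, higher = more likely positive.
    labels: 1 for images of the class, 0 for all other images.
    Returns the mean of the false acceptance and false rejection rates
    at the threshold where the two are closest.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    # Candidate thresholds: every observed score, plus one above the maximum.
    thresholds = sorted(set(scores))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap, best_eer = None, None
    for t in thresholds:
        # An image is accepted as positive when its score is >= t.
        far = sum(s >= t for s in negatives) / len(negatives)  # false acceptance
        frr = sum(s < t for s in positives) / len(positives)   # false rejection
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 1, 0, 1, 1, 0, 0, 0]
    print(equal_error_rate(scores, labels))
```

With a finite validation set the two error rates rarely cross exactly, which is why the sketch takes the threshold where they are closest and averages them.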
How To Register To CLEF-IP
Follow these steps to register to the Lab.