Dataset
The MAREC 400.000 collection consists of 100.000 randomly picked patents from each sub-collection of the MAREC dataset (EPO, JPO, USPTO, WIPO). It was intended for authors submitting papers to the AsPIRe'10 workshop at ECIR.
Participants were encouraged to apply the techniques they developed to this dataset where possible, so that results obtained with different techniques could be compared more easily. Furthermore, the MAREC 400.000 collection allows initial patent processing experiments to be run on a representative dataset of reasonable size before scaling them up to the 19 million patents of the full MAREC collection.
How to access MAREC
If you are interested in accessing the MAREC data, please contact membership@ir-facility.org.
MAREC at a glance
- 19 million XML documents
- ALL patent applications and granted patents between 1976 and June 2008
- From EPO, WIPO, USPTO, JPO
- Unified fields, numbering scheme and citation format
- Comparable corpus
- Statistics