Sentence-level Attachment Prediction

M-Dyaa Albakour, Udo Kruschwitz, and Simon Lucas


Abstract


Attachment prediction is the task of automatically identifying email messages that should contain an attachment. It addresses the common problem of sending an email but forgetting to include the relevant attachment. A common Information Retrieval (IR) approach to analyzing documents such as emails is to treat the entire document as a bag of words. Here we propose a finer-grained analysis: we aim to identify individual sentences within an email that refer to an attachment. If we detect any such sentence, we predict that the email should have an attachment. Using part of the Enron corpus for evaluation, we find that our finer-grained approach outperforms previously reported document-level attachment prediction in similar evaluation settings.
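
The decision rule described above (flag an email as soon as any one of its sentences refers to an attachment) can be illustrated with a minimal sketch. The sentence splitter, the cue patterns in `CUE_PATTERNS`, and all function names are hypothetical stand-ins chosen for illustration; the approach proposed here relies on a trained sentence-level classifier, not on a fixed keyword list.

```python
import re

# Hypothetical cue patterns, for illustration only. They stand in for a
# trained sentence-level classifier and are not taken from the paper.
CUE_PATTERNS = [
    r"\battach(ed|ment|ments|ing)?\b",
    r"\benclosed\b",
]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def refers_to_attachment(sentence):
    """Stand-in sentence-level decision: does any cue pattern match?"""
    return any(re.search(p, sentence, re.IGNORECASE) for p in CUE_PATTERNS)


def predict_attachment(email_body):
    """Email-level rule: positive if at least one sentence is positive."""
    return any(refers_to_attachment(s) for s in split_sentences(email_body))
```

In this sketch only `refers_to_attachment` would need to change if the keyword test were swapped for a learned model; the email-level rule stays a simple disjunction over sentences.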

A second contribution this paper makes is to give another successful example of the 'wisdom of the crowd' when collecting annotations needed to train the attachment prediction algorithm. The aggregated non-expert judgements collected on Amazon's Mechanical Turk can be used as a substitute for much more costly expert judgements.
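
One simple way to aggregate several non-expert judgements per sentence is a majority vote, sketched below. This is an assumption made for illustration: the abstract does not state which aggregation scheme is used, and the function name, the input layout, and the tie-breaking choice are all hypothetical.

```python
from collections import Counter


def aggregate_by_majority(judgements):
    """Collapse per-worker boolean labels into one label per item.

    `judgements` maps an item id (e.g. a sentence id) to the list of
    True/False labels given by individual workers. Ties are resolved to
    False here, which is an arbitrary choice made for this sketch.
    """
    aggregated = {}
    for item, labels in judgements.items():
        counts = Counter(labels)
        aggregated[item] = counts[True] > counts[False]
    return aggregated
```

For instance, an item labelled `[True, True, False]` would be aggregated to `True`, while an evenly split item would fall back to `False` under the tie-breaking assumption above.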

 

M-Dyaa Albakour (1,2), Udo Kruschwitz (1), and Simon Lucas (1)

1 School of Computer Science and Electronic Engineering, Language and Computation Group, University of Essex

2 Active Web Solutions Ltd.