GATE Teamware
Overview
Teamware is a software suite and a methodology for the implementation and support of annotation factories. It is intended to provide a framework for commercial annotation services, supplied either as in-house units or as outsourced specialist activities.
An effective annotation factory requires several different types of personnel, including:
- Language engineers, who are skilled staff with knowledge of both computational linguistics and computer science
- Information curators, who may be corporate librarians, systems administrators or data curators, and who might be expected to spend several weeks in training
- Annotators, who are largely unskilled and may be geographically distributed, and whose work is quality-controlled through automated voting and metrics-based mechanisms (Amazon's Mechanical Turk web service is one way to marshal annotator labour).
Teamware defines the support tools for these different roles, and the workflow by which they may combine with automatic information extraction systems to provide cost-effective annotation services.
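The automated voting mentioned for annotators above can be illustrated with a minimal sketch. This is not Teamware's own implementation; the function name `majority_vote`, the `min_agreement` threshold and the sample labels are all assumptions made for illustration. Each text span collects the labels proposed by several annotators, and a label is accepted only if more than a given share of annotators agree on it.

```python
from collections import Counter


def majority_vote(labels, min_agreement=0.5):
    """Pick the most common label if its share of votes exceeds min_agreement.

    Hypothetical helper, not part of GATE Teamware.
    Returns (label, agreement); label is None when no label wins clearly.
    """
    if not labels:
        return None, 0.0
    counts = Counter(labels)
    top_label, top_count = counts.most_common(1)[0]
    agreement = top_count / len(labels)
    if agreement > min_agreement:
        return top_label, agreement
    # No clear winner: leave the span for a curator to adjudicate.
    return None, agreement


# Illustrative votes: (document id, start offset, end offset) -> annotator labels
votes = {
    ("doc1", 0, 9): ["Location", "Location", "Organization"],
    ("doc1", 23, 30): ["Location", "Organization"],
}

for span, labels in votes.items():
    label, agreement = majority_vote(labels)
    print(span, label, round(agreement, 2))
```

Spans that come back with no accepted label are natural candidates for review by a more experienced curator, while the agreement share can serve as a simple quality metric per annotator or per document.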
GATE Teamware is a web-based application that uses Java Web Start to deliver functional components to the desktop. A web-based management interface allows for project set-up, including:
- Choosing or uploading a document collection (a "corpus")
- Choosing or uploading a schema to constrain manual annotations
- Choosing or setting up re-usable project templates
- Applying pre-processing (automatic annotations) to a corpus
- Selecting project participants and assigning projects to specific users
- Monitoring progress and various project statistics
The University of Sheffield created a development-level installation of Teamware for the IRF on the IRF's Large Data Collider (LDC), which is used and reviewed by IRF members. If you are interested in using GATE Teamware but are not yet an IRF member, you can apply for membership here.
Project Partners
University of Sheffield, UK, the Natural Language Processing Group
Links
Semantic Annotation
Annotation attaches attributes to a document or to a selected part of a text, providing additional information about an existing piece of data. Semantic annotation goes one level deeper: it enriches unstructured or semi-structured data with a context that is linked to the structured knowledge of a domain, so that a search can return results that are not explicitly related to the original query.
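The difference between plain and semantic annotation can be shown with a small sketch. It does not reflect GATE's actual data model; the `Annotation` class, the `example.org` concept URIs and the `locatedIn` relation are hypothetical names chosen for illustration. A plain annotation only labels a span of text, while a semantic annotation also points to a concept in a domain knowledge base, which lets a query for one concept find text that mentions a related one.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A labelled text span; hypothetical structure, not GATE's own API."""
    start: int
    end: int
    type: str
    features: dict = field(default_factory=dict)


# Assumed concept identifiers in an illustrative ontology.
SHEFFIELD = "http://example.org/ontology#Sheffield"
ENGLAND = "http://example.org/ontology#England"

# Toy knowledge base: concept -> {relation: related concept}
knowledge = {SHEFFIELD: {"locatedIn": ENGLAND}}

text = "Sheffield hosts a natural language processing group."

# Plain annotation: a type attached to a span, nothing more.
plain = Annotation(0, 9, "Location")

# Semantic annotation: the same span, linked to a domain concept.
semantic = Annotation(0, 9, "Location", {"concept": SHEFFIELD})


def covered_text(text, ann):
    """Return the part of the text that the annotation spans."""
    return text[ann.start:ann.end]


def matches(ann, query_concept, knowledge):
    """True if the annotation's concept is, or is directly related to, the query."""
    concept = ann.features.get("concept")
    if concept is None:
        return False
    if concept == query_concept:
        return True
    return query_concept in knowledge.get(concept, {}).values()


print(covered_text(text, semantic))
print(matches(semantic, ENGLAND, knowledge))
print(matches(plain, ENGLAND, knowledge))
```

Here the text never mentions England, yet a query for the England concept matches the semantically annotated span through the knowledge base, whereas the plain annotation cannot support that lookup.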