The USAGE corpus consists of annotations of Amazon reviews for different product categories in German and English. The reviews themselves are not part of this data publication. The annotations are fine-grained and cover aspects and subjective phrases. In addition, they record which aspect is the target of which subjective phrase, as well as the polarity of each subjective phrase. The corpus consists of 622 English and 611 German reviews for coffee machines, cutlery, microwaves, toasters, trash cans, vacuum cleaners, washing machines and dishwashers. The English part is annotated with more than 8000 aspects and 5000 subjective phrases, the German part with more than 6000 aspects and around 5000 subjective phrases (depending on the annotator). Each review is independently annotated by two annotators.
For detailed information, read the LREC 2014 paper.
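To make the annotation layers concrete, the following is a minimal Python sketch of how aspects, subjective phrases, their target relation and polarity could be held in memory. It does not reproduce the corpus file format, which is defined by the readme and the paper. The class names, the character-offset convention, the polarity labels and the example sentence are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Span:
    """A stretch of review text (hypothetical representation)."""
    start: int  # character offset, inclusive (assumed convention)
    end: int    # character offset, exclusive (assumed convention)
    text: str


@dataclass(frozen=True)
class SubjectivePhrase:
    """A subjective phrase together with its polarity."""
    span: Span
    polarity: str  # e.g. "positive" or "negative" (assumed label set)


@dataclass(frozen=True)
class TargetRelation:
    """Marks an aspect as the target of a subjective phrase."""
    aspect: Span
    phrase: SubjectivePhrase


def polarity_by_aspect(relations: List[TargetRelation]) -> Dict[str, List[str]]:
    """Group the polarities of subjective phrases by the aspect they target."""
    result: Dict[str, List[str]] = {}
    for rel in relations:
        result.setdefault(rel.aspect.text, []).append(rel.phrase.polarity)
    return result


# Invented example sentence, not taken from the corpus.
review = "The handle is sturdy."
aspect = Span(4, 10, "handle")
phrase = SubjectivePhrase(Span(14, 20, "sturdy"), "positive")
relations = [TargetRelation(aspect, phrase)]

print(polarity_by_aspect(relations))
```

Because each review is annotated by two annotators, one such list of relations per annotator would be kept side by side rather than merged.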
The data is available under the Open Data Commons Attribution License (ODC-By) v1.0.
- USAGE Corpus v1.0.2: Corrected typo in readme
- USAGE Corpus v1.0.1: Added licence information to readme
- USAGE Corpus v1.0.0: Original distribution
For copyright reasons, we cannot publish the Amazon reviews themselves. The tarball therefore contains tools to retrieve the review texts from the original websites. Please do not redistribute these reviews. A version that includes the text also exists, but it cannot be made publicly available.
Please also note our machine translation quality corpus, which consists of the German sentences in this corpus, automatically translated to English.
If you are not interested in the full annotations, the phrases extracted from the German and English corpora with polarity annotations might be an interesting resource for you:
IGGSA Shared Task Test Corpus
You can also download it here:
- IGGSA Shared Task Test Corpus v1.0.0: Original distribution