…highlights are subsets of the input elements (either words, subwords, or full sentences) that are compact and sufficient to explain a prediction.
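As a rough illustration of the definition above, a highlight can be represented as a set of indices into the input elements. The sketch below is only an assumption about one convenient representation: the `Highlight` class, its field names, and the `compactness` ratio are hypothetical and are not taken from any of the datasets listed here. Sufficiency depends on a model's prediction, so it is not checked.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Highlight:
    """Hypothetical container: a highlight as a subset of input elements."""

    elements: Tuple[str, ...]   # input split into words, subwords, or sentences
    selected: FrozenSet[int]    # indices into `elements` chosen as the explanation

    def __post_init__(self) -> None:
        # A highlight must be a subset of the input, so every index must be valid.
        if not all(0 <= i < len(self.elements) for i in self.selected):
            raise ValueError("selected indices must refer to input elements")

    def text(self) -> List[str]:
        """Return the highlighted elements in input order."""
        return [self.elements[i] for i in sorted(self.selected)]

    def compactness(self) -> float:
        """Fraction of input elements kept (lower means more compact)."""
        if not self.elements:
            return 0.0
        return len(self.selected) / len(self.elements)


# Word-level example for a sentiment classification input.
tokens = tuple("the plot is thin but the acting is superb".split())
h = Highlight(elements=tokens, selected=frozenset({8}))
print(h.text(), round(h.compactness(), 3))
```

The same structure works at sentence granularity by passing sentences instead of words as `elements`.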
English

| Dataset | Task | Granularity Restriction | Collection Method | # Instances | # Explanations per Instance | Total # Annotators |
| --- | --- | --- | --- | --- | --- | --- |
| MovieReviews | sentiment classification | none | author | 1,800 | 1 | 1 |
| MovieReviewsc | sentiment classification | none | crowd | 200 | 2 (1 given) | 2 |
| SST | sentiment classification | none | crowd | 11,855 | 3 (1 given) | n/a |
| WikiQA | open-domain QA | sentence | crowd + authors | 1,473 |  |  |
| WikiAttack | detecting personal attacks | none | students | 1,089 | 4 (1 given) | 40 |
| e-SNLI | natural language inference | none | crowd | ~569K | 1 or 3 | 6,325 |
| MultiRC | reading comprehension QA | sentences | crowd | 5,825 | 1 | n/a |
| FEVER | verifying claims from text | sentences | crowd | ~136K | 1 | 50 |
| HotpotQA | reading comprehension QA | sentences | crowd | 112,779 | n/a | n/a |
| Hanselowski et al. (2019) | verifying claims from text | sentences | crowd | 6,422 | varies | n/a |
| NaturalQuestions | reading comprehension QA | 1 paragraph | crowd | n/a | 1 or 5 | 50 |
| CoQA | conversational QA | none | crowd | ~127K | 1 or 3 | n/a |
| CoS-E v1.0 | commonsense QA | none | crowd | 8,560 | 1 | n/a |
| CoS-E v1.1 | commonsense QA | none | crowd | 10,962 | 1 | n/a |
| BoolQc | reading comprehension QA | none | crowd | 199 | 3 (1 given) | 3 |
| EvidenceInference v1.0 | evidence inference | none | experts | 10,137 | 1 | 3 |
| EvidenceInference v1.0c | evidence inference | none | experts | 125 | 1 | 4 |
| EvidenceInference v2.0 | evidence inference | none | experts | 2,503 | 1 | 6 |
| SciFact | verifying claims from text | 1-3 sentences | experts | 995 | 1-3 | 13 |
| Kutlu et al. (2020) | webpage relevance ranking | 2-3 sentences | crowd | 700 | 15 | n/a |
| ECtHR | alleged legal violation prediction | paragraphs | auto + expert | ~11K | 1 | n/a |
| Hummingbird | style classification | words | crowd | 500 | 1 | 622 (3 annotators/sentence) |
| HateXplain | hate-speech classification | phrases | crowd | 20,148 | 3 | 253 |
| ContractNLI | natural language inference | sentence or a list item within a sentence | expert + crowd | 607 | 1 | 3 |
| Indian Legal Documents Corpus (ILDC) | predicting the outcome of a legal case | sentence | experts | 56 | 1 | 5 |

Chinese

| Dataset | Task | Granularity Restriction | Collection Method | # Instances | # Explanations per Instance | Total # Annotators |
| --- | --- | --- | --- | --- | --- | --- |
| GCRC | reading comprehension | sentence | students | 1,725 questions | 1 (and rarely more) | 12 |

Multiple Languages

| Dataset | Task | Granularity Restriction | Collection Method | # Instances | # Explanations per Instance | Total # Annotators | Languages |
| --- | --- | --- | --- | --- | --- | --- | --- |
| X-QE | quality estimation of machine translation | words | experts | 4,590 | 1 | 14 | Estonian-English (Et-En), Romanian-English (Ro-En), German-Chinese (De-Zh), Russian-German (Ru-De) |
| Supporting Context for Ambiguous Translations (SCAT) | document-level MT | none | experts | ~14K | 1 | 20 | English-French (En-Fr) |