Highlights are subsets of the input elements (words, subwords, or full sentences) that are compact and sufficient to explain a prediction.
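To make the definition concrete, here is a minimal sketch of a highlight represented as a set of token positions, with simple checks for the two properties named above. The function names (`apply_highlight`, `compactness`, `is_sufficient`) and the toy classifier are hypothetical illustrations, not part of any dataset listed here, and the sufficiency check (same prediction from the highlight alone) is only one common way to operationalize the term.

```python
from typing import Callable, List, Set


def apply_highlight(tokens: List[str], highlight: Set[int]) -> List[str]:
    """Keep only the tokens whose positions are in the highlight."""
    return [tok for i, tok in enumerate(tokens) if i in highlight]


def compactness(tokens: List[str], highlight: Set[int]) -> float:
    """Fraction of input tokens selected; lower means more compact."""
    return len(highlight) / len(tokens)


def is_sufficient(
    tokens: List[str],
    highlight: Set[int],
    predict: Callable[[List[str]], str],
) -> bool:
    """True if the model predicts the same label from the highlight alone."""
    return predict(apply_highlight(tokens, highlight)) == predict(tokens)


def toy_sentiment(tokens: List[str]) -> str:
    """Hypothetical keyword classifier, for illustration only."""
    positive = {"great", "wonderful"}
    negative = {"dull", "awful"}
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    return "positive" if score > 0 else "negative"


tokens = "the plot is thin but the acting is great".split()
highlight = {8}  # position of the single token "great"

print(apply_highlight(tokens, highlight))
print(compactness(tokens, highlight))
print(is_sufficient(tokens, highlight, toy_sentiment))
```

In this toy case a single word is selected out of nine, and the classifier reaches the same label from that word alone, so the highlight is both compact and sufficient under these assumed definitions.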

English

| Dataset | Task | Granularity Restriction | Collection Method | # Instances | # Explanations per Instance | Total # Annotators |
| --- | --- | --- | --- | --- | --- | --- |
| MovieReviews | sentiment classification | none | author | 1,800 | 1 | 1 |
| MovieReviews^c | sentiment classification | none | crowd | 200 | 2 (1 given) | 2 |
| SST | sentiment classification | none | crowd | 11,855 | 3 (1 given) | n/a |
| WikiQA | open-domain QA | sentence | crowd + authors | 1,473 | | |
| WikiAttack | detecting personal attacks | none | students | 1,089 | 4 (1 given) | 40 |
| e-SNLI | natural language inference | none | crowd | ~569K | 1 or 3 | 6,325 |
| MultiRC | reading comprehension QA | sentences | crowd | 5,825 | 1 | n/a |
| FEVER | verifying claims from text | sentences | crowd | ~136K | 1 | 50 |
| HotpotQA | reading comprehension QA | sentences | crowd | 112,779 | n/a | n/a |
| Hanselowski et al. (2019) | verifying claims from text | sentences | crowd | 6,422 | varies | n/a |
| NaturalQuestions | reading comprehension QA | 1 paragraph | crowd | n/a | 1 or 5 | 50 |
| CoQA | conversational QA | none | crowd | ~127K | 1 or 3 | n/a |
| CoS-E v1.0 | commonsense QA | none | crowd | 8,560 | 1 | n/a |
| CoS-E v1.1 | commonsense QA | none | crowd | 10,962 | 1 | n/a |
| BoolQ^c | reading comprehension QA | none | crowd | 199 | 3 (1 given) | 3 |
| EvidenceInference v1.0 | evidence inference | none | experts | 10,137 | 1 | 3 |
| EvidenceInference v1.0^c | evidence inference | none | experts | 125 | 1 | 4 |
| EvidenceInference v2.0 | evidence inference | none | experts | 2,503 | 1 | 6 |
| SciFact | verifying claims from text | 1-3 sentences | experts | 995 | 1-3 | 13 |
| Kutlu et al. (2020) | webpage relevance ranking | 2-3 sentences | crowd | 700 | 15 | n/a |
| ECtHR | alleged legal violation prediction | paragraphs | auto + expert | ~11K | 1 | n/a |
| Hummingbird | style classification | words | crowd | 500 | 1 | 622 (3 annotators/sentence) |
| HateXplain | hate-speech classification | phrases | crowd | 20,148 | 3 | 253 |
| ContractNLI | natural language inference | sentence or a list item within a sentence | expert + crowd | 607 | 1 | 3 |
| Indian Legal Documents Corpus (ILDC) | predicting the outcome of a legal case | sentence | experts | 56 | 1 | 5 |

Chinese

| Dataset | Task | Granularity Restriction | Collection Method | # Instances | # Explanations per Instance | Total # Annotators |
| --- | --- | --- | --- | --- | --- | --- |
| GCRC | reading comprehension | sentence | students | 1,725 questions | 1 (and rarely more) | 12 |

Multiple Languages

| Dataset | Task | Granularity Restriction | Collection Method | # Instances | # Explanations per Instance | Total # Annotators | Languages |
| --- | --- | --- | --- | --- | --- | --- | --- |
| X-QE | quality estimation of machine translation | words | experts | 4,590 | 1 | 14 | Estonian-English (Et-En), Romanian-English (Ro-En), German-Chinese (De-Zh), Russian-German (Ru-De) |
| Supporting Context for Ambiguous Translations (SCAT) | document-level MT | none | experts | ~14K | 1 | 20 | English-French (En-Fr) |