Highlights are subsets of the input elements (words, subwords, or full sentences) that are compact and sufficient to explain a prediction.
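As a rough illustration of this definition, a highlight can be represented as a set of positions into the tokenized input. The sketch below is a minimal, hypothetical representation: the class name `Highlight`, its fields, and the `compactness` ratio are assumptions made for illustration and do not come from any of the datasets listed here. It checks only the "subset" property and reports how compact the selection is. Sufficiency cannot be checked without a model, so it is left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Highlight:
    """Hypothetical container: input tokens plus the positions marked as the explanation."""

    tokens: Tuple[str, ...]
    selected: FrozenSet[int]  # indices into `tokens`

    def __post_init__(self) -> None:
        # A highlight must be a subset of the input elements.
        if not all(0 <= i < len(self.tokens) for i in self.selected):
            raise ValueError("highlight indices must refer to input positions")

    def text(self) -> List[str]:
        """Return the highlighted tokens in input order."""
        return [self.tokens[i] for i in sorted(self.selected)]

    def compactness(self) -> float:
        """Fraction of the input that is highlighted (lower means more compact)."""
        if not self.tokens:
            return 0.0
        return len(self.selected) / len(self.tokens)


# Illustrative sentiment example (made-up input, not taken from a dataset).
example = Highlight(
    tokens=("the", "movie", "was", "surprisingly", "good"),
    selected=frozenset({3, 4}),
)
highlighted = example.text()
ratio = example.compactness()
```

Word-level highlights use token indices as above. For sentence-level datasets the same structure applies with sentences in place of tokens.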

English

| Dataset | Task | Granularity Restriction | Collection Method | # Instances | # Explanations per Instance | Total # Annotators |
|---|---|---|---|---|---|---|
| MovieReviews | sentiment classification | none | author | 1,800 | 1 | 1 |
| MovieReviews^c | sentiment classification | none | crowd | 200 | 2 (1 given) | 2 |
| SST | sentiment classification | none | crowd | 11,855 | 3 (1 given) | n/a |
| WikiQA | open-domain QA | sentence | crowd + authors | 1,473 | | |
| WikiAttack | detecting personal attacks | none | students | 1,089 | 4 (1 given) | 40 |
| e-SNLI | natural language inference | none | crowd | ~569K | 1 or 3 | 6,325 |
| MultiRC | reading comprehension QA | sentences | crowd | 5,825 | 1 | n/a |
| FEVER | verifying claims from text | sentences | crowd | ~136K | 1 | 50 |
| HotpotQA | reading comprehension QA | sentences | crowd | 112,779 | n/a | n/a |
| Hanselowski et al. (2019) | verifying claims from text | sentences | crowd | 6,422 | varies | n/a |
| NaturalQuestions | reading comprehension QA | 1 paragraph | crowd | n/a | 1 or 5 | 50 |
| CoQA | conversational QA | none | crowd | ~127K | 1 or 3 | n/a |
| CoS-E v1.0 | commonsense QA | none | crowd | 8,560 | 1 | n/a |
| CoS-E v1.1 | commonsense QA | none | crowd | 10,962 | 1 | n/a |
| BoolQ^c | reading comprehension QA | none | crowd | 199 | 3 (1 given) | 3 |
| EvidenceInference v1.0 | evidence inference | none | experts | 10,137 | 1 | 3 |
| EvidenceInference v1.0^c | evidence inference | none | experts | 125 | 1 | 4 |
| EvidenceInference v2.0 | evidence inference | none | experts | 2,503 | 1 | 6 |
| SciFact | verifying claims from text | 1-3 sentences | experts | 995 | 1-3 | 13 |
| Kutlu et al. (2020) | webpage relevance ranking | 2-3 sentences | crowd | 700 | 15 | n/a |