Highlights

…highlights are subsets of the input elements (either words, subwords, or full sentences) that are compact and sufficient to explain a prediction.
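
As a minimal sketch of this definition (not any dataset's actual format), a highlight can be modelled as a set of indices into the input elements. The names `Highlight`, `is_sufficient`, and `toy_predict` below are hypothetical, and the keyword-based classifier is only a stand-in for a real model. "Compact" is read here as the fraction of elements kept, and "sufficient" as the prediction on the highlight alone matching the prediction on the full input.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Highlight:
    """Hypothetical container: an input plus the subset chosen as explanation."""
    tokens: List[str]          # input elements (words, subwords, or sentences)
    selected: FrozenSet[int]   # indices of the elements kept as the highlight

    def text(self) -> List[str]:
        # Selected elements, in their original order.
        return [t for i, t in enumerate(self.tokens) if i in self.selected]

    def compactness(self) -> float:
        # Fraction of the input that is highlighted (smaller is more compact).
        # Assumes a non-empty input.
        return len(self.selected) / len(self.tokens)


def is_sufficient(h: Highlight, predict: Callable[[List[str]], str]) -> bool:
    # Assumed reading of "sufficient": the highlight alone yields the
    # same prediction as the full input.
    return predict(h.text()) == predict(h.tokens)


def toy_predict(tokens: List[str]) -> str:
    # Stand-in sentiment classifier based on a tiny keyword list.
    positive = {"great", "wonderful"}
    negative = {"dull", "awful"}
    score = sum((t in positive) - (t in negative) for t in tokens)
    return "positive" if score > 0 else "negative"


tokens = "the plot is thin but the acting is great".split()
h = Highlight(tokens, frozenset({8}))  # highlight only the word "great"
print(h.text(), round(h.compactness(), 2), is_sufficient(h, toy_predict))
```

The datasets listed below differ mainly in what counts as an element (the granularity restriction) and in who selected the subset (the collection method).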

English

Dataset	Task	Granularity Restriction	Collection Method	# Instances	# Explanations per Instance	Total # Annotators
MovieReviews	sentiment classification	none	author	1,800	1	1
MovieReviews_c	sentiment classification	none	crowd	200	2 (1 given)	2
SST	sentiment classification	none	crowd	11,855	3 (1 given)	n/a
WikiQA	open-domain QA	sentence	crowd + authors	1,473
WikiAttack	detecting personal attacks	none	students	1,089	4 (1 given)	40
e-SNLI	natural language inference	none	crowd	~569K	1 or 3	6325
MultiRC	reading comprehension QA	sentences	crowd	5,825	1	n/a
FEVER	verifying claims from text	sentences	crowd	~136K	1	50
HotpotQA	reading comprehension QA	sentences	crowd	112,779	n/a	n/a
Hanselowski et al. (2019)	verifying claims from text	sentences	crowd	6,422	varies	n/a
NaturalQuestions	reading comprehension QA	1 paragraph	crowd	n/a	1 or 5	50
CoQA	conversational QA	none	crowd	~127K	1 or 3	n/a
CoS-E v1.0	commonsense QA	none	crowd	8,560	1	n/a
CoS-E v1.1	commonsense QA	none	crowd	10,962	1	n/a
BoolQ_c	reading comprehension QA	none	crowd	199	3 (1 given)	3
EvidenceInference v1.0	evidence inference	none	experts	10,137	1	3
EvidenceInference v1.0_c	evidence inference	none	experts	125	1	4
EvidenceInference v2.0	evidence inference	none	experts	2,503	1	6
SciFact	verifying claims from text	1-3 sentences	experts	995	1-3	13
Kutlu et al. (2020)	webpage relevance ranking	2-3 sentences	crowd	700	15	n/a
ECtHR	alleged legal violation prediction	paragraphs	auto + expert	~11K	1	n/a
Hummingbird	style classification	words	crowd	500	1	622 (3 annotators/sentence)
HateXplain	hate-speech classification	phrases	crowd	20,148	3	253
ContractNLI	natural language inference	sentence or a list item within a sentence	expert + crowd	607	1	3
Indian Legal Documents Corpus (ILDC)	predicting the outcome of a legal case	sentence	experts	56	1	5

Chinese

Dataset	Task	Granularity Restriction	Collection Method	# Instances	# Explanations per Instance	Total # Annotators
GCRC	reading comprehension	sentence	students	1,725 questions	1 (and rarely more)	12

Multiple Languages

Dataset	Task	Granularity Restriction	Collection Method	# Instances	# Explanations per Instance	Total # Annotators	Languages
X-QE	Quality Estimation of Machine Translation	words	experts	4,590	1	14	Estonian-English (Et-En), Romanian-English (Ro-En), German-Chinese (De-Zh), Russian-German (Ru-De)
Supporting Context for Ambiguous Translations (SCAT)	Document-level MT	none	experts	~14K	1	20	English-French (En-Fr)