Free-text explanations are free-form textual justifications that are not constrained to the instance inputs.

English

Textual tasks

| Dataset | Task | Collection Method | # Instances | # Explanations per Instance | Total # Annotators |
| --- | --- | --- | --- | --- | --- |
| Jansen et al. (2016) | science exam QA | authors | 363 | 1 | 4 |
| Ling et al. (2017) | solving algebraic word problems | automatic + crowd | ~101K | 1 | n/a |
| Srivastava et al. (2017) | detecting phishing emails | crowd + authors | 7 | 30-35 | 146 |
| BabbleLabble | relation extraction | students + authors | 200 | 1 | 10 |
| e-SNLI | natural language inference | crowd | ~569K | 1 or 3 | 6,325 |
| LIAR-PLUS | verifying claims from text | automatic | 12,836 | 1 | n/a |
| CoS-E v1.0 | commonsense QA | crowd | 8,560 | 1 | n/a |
| CoS-E v1.1 | commonsense QA | crowd | 10,962 | 1 | n/a |
| ECQA | commonsense QA | crowd | 10,962 | 1 | n/a |
| Sen-Making | commonsense validation | students + authors | 2,021 | 1 | 7 |
| ChangeMyView | argument persuasiveness | crowd | 37,718 | 1 | n/a |
| WinoWhy | pronoun coreference resolution | crowd | 273 | 5 | n/a |
| SBIC | social bias inference | crowd | 48,923 | 1-3 | n/a |
| PubHealth | verifying claims from text | automatic | 11,832 | 1 | n/a |
| Wang et al. (2020) | relation extraction | crowd + authors | 373 | 1 | n/a |
| Wang et al. (2020) | sentiment classification | crowd + authors | 85 | 1 | n/a |
| e-delta-NLI | defeasible natural language inference | automatic | 92,298 | ~8 | n/a |
| COPA-SSE (Semi-Structured Explanations for COPA)* | Balanced COPA (commonsense QA, causal reasoning) | crowd | 1,500 | 4-9 (9,747 total) | n/a |

* Explanations are ConceptNet-like triples with free-form head and tail concepts. The author classes them as structured, but notes that the format is not very rigid and that they can also be used as free text.

Multimodal tasks

| Dataset | Task | Collection Method | # Instances | # Explanations per Instance | Total # Annotators |
| --- | --- | --- | --- | --- | --- |
| BDD-X | vehicle control for self-driving cars | crowd | ~26K | 1 | n/a |
| VQA-E | visual QA | automatic | ~270K | 1 | n/a |
| VQA-X | visual QA | crowd | 28,180 | 1 or 3 | n/a |
| ACT-X | activity recognition | crowd | 18,030 | 3 | n/a |
| Ehsan et al. (2019) | playing arcade games | crowd | 2,000 | 1 | 60 |
| VCR | visual commonsense reasoning | crowd | ~290K | 1 | n/a |
| e-SNLI-VE | visual-textual entailment | crowd | 11,335 | 3 | n/a |
| ESPRIT | reasoning about qualitative physics | crowd | 2,441 | 2 | n/a |
| VLEP | future event prediction | automatic + crowd | 28,726 | 1 | n/a |
| EMU | reasoning about manipulated images | crowd | 48K | n/a | n/a |

Multiple languages

| Dataset | Task | Collection Method | # Instances | # Explanations per Instance | Total # Annotators |
| --- | --- | --- | --- | --- | --- |
| E-KAR | analogical reasoning | crowd | 1,655 (in Chinese); 1,251 (in English) | 5 | n/a |