Free-text explanations are free-form textual justifications that are not constrained to the instance inputs.
English
Textual tasks
| Dataset | Task | Collection Method | # Instances | # Explanations per Instance | Total # Annotators |
|---|---|---|---|---|---|
| Jansen et al. (2016) | science exam QA | authors | 363 | 1 | 4 |
| Ling et al. (2017) | solving algebraic word problems | automatic + crowd | ~101K | 1 | n/a |
| Srivastava et al. (2017) | detecting phishing emails | crowd + authors | 7 | 30-35 | 146 |
| BabbleLabble | relation extraction | students + authors | 200 | 1 | 10 |
| e-SNLI | natural language inference | crowd | ~569K | 1 or 3 | 6,325 |
| LIAR-PLUS | verifying claims from text | automatic | 12,836 | 1 | n/a |
| CoS-E v1.0 | commonsense QA | crowd | 8,560 | 1 | n/a |
| CoS-E v1.1 | commonsense QA | crowd | 10,962 | 1 | n/a |
| ECQA | commonsense QA | crowd | 10,962 | 1 | n/a |
| Sen-Making | commonsense validation | students + authors | 2,021 | 1 | 7 |
| ChangeMyView | argument persuasiveness | crowd | 37,718 | 1 | n/a |
| WinoWhy | pronoun coreference resolution | crowd | 273 | 5 | n/a |
| SBIC | social bias inference | crowd | 48,923 | 1-3 | n/a |
| PubHealth | verifying claims from text | automatic | 11,832 | 1 | n/a |
| Wang et al. (2020) | relation extraction | crowd + authors | 373 | 1 | n/a |
| Wang et al. (2020) | sentiment classification | crowd + authors | 85 | 1 | n/a |
| e-delta-NLI | defeasible natural language inference | automatic | 92,298 | ~8 | n/a |
| COPA-SSE (Semi-Structured Explanations for COPA)* | Balanced COPA (commonsense QA, causal reasoning) | crowd | 1,500 | 4-9 (9,747 total) | n/a |
\* ConceptNet-like triples with free-form head and tail concepts. The authors class these explanations as structured, but note that the format is not very rigid and can also be used as free text.
Multimodal tasks
| Dataset | Task | Collection Method | # Instances | # Explanations per Instance | Total # Annotators |
|---|---|---|---|---|---|
| BDD-X | vehicle control for self-driving cars | crowd | ~26K | 1 | n/a |
| VQA-E | visual QA | automatic | ~270K | 1 | n/a |
| VQA-X | visual QA | crowd | 28,180 | 1 or 3 | n/a |
| ACT-X | activity recognition | crowd | 18,030 | 3 | n/a |
| Ehsan et al. (2019) | playing arcade games | crowd | 2,000 | 1 | 60 |
| VCR | visual commonsense reasoning | crowd | ~290K | 1 | n/a |
| e-SNLI-VE | visual-textual entailment | crowd | 11,335 | 3 | n/a |
| ESPRIT | reasoning about qualitative physics | crowd | 2,441 | 2 | n/a |
| VLEP | future event prediction | automatic + crowd | 28,726 | 1 | n/a |
| EMU | reasoning about manipulated images | crowd | 48K | n/a | n/a |
Multiple Languages
| Dataset | Task | Collection Method | # Instances | # Explanations per Instance | Total # Annotators |
|---|---|---|---|---|---|
| E-KAR | analogical reasoning | crowd | 1,655 (Chinese); 1,251 (English) | 5 | n/a |