Free-text explanations are free-form textual justifications that are not constrained to the instance inputs.
English
Textual tasks
| Dataset | Task | Collection Method | # Instances | # Explanations per Instance | Total # Annotators |
|---|---|---|---|---|---|
| Jansen et al. (2016) | science exam QA | authors | 363 | 1 | 4 |
| Ling et al. (2017) | solving algebraic word problems | automatic + crowd | ~101K | 1 | n/a |
| Srivastava et al. (2017) | detecting phishing emails | crowd + authors | 7 | 30-35 | 146 |
| BabbleLabble | relation extraction | students + authors | 200 | 1 | 10 |
| e-SNLI | natural language inference | crowd | ~569K | 1 or 3 | 6,325 |
| LIAR-PLUS | verifying claims from text | automatic | 12,836 | 1 | n/a |
| CoS-E v1.0 | commonsense QA | crowd | 8,560 | 1 | n/a |
| CoS-E v1.1 | commonsense QA | crowd | 10,962 | 1 | n/a |
| ECQA | commonsense QA | crowd | 10,962 | 1 | n/a |
| Sen-Making | commonsense validation | students + authors | 2,021 | 1 | 7 |
| ChangeMyView | argument persuasiveness | crowd | 37,718 | 1 | n/a |
| WinoWhy | pronoun coreference resolution | crowd | 273 | 5 | n/a |
| SBIC | social bias inference | crowd | 48,923 | 1-3 | n/a |
| PubHealth | verifying claims from text | automatic | 11,832 | 1 | n/a |
| Wang et al. (2020) | relation extraction | crowd + authors | 373 | 1 | n/a |
| Wang et al. (2020) | sentiment classification | crowd + authors | 85 | 1 | n/a |
| e-delta-NLI | defeasible natural language inference | automatic | 92,298 | ~8 | n/a |
| COPA-SSE (Semi-Structured Explanations for COPA)* | Balanced COPA (commonsense QA, causal reasoning) | crowd | 1,500 | 4-9 (9,747 total) | n/a |
* COPA-SSE explanations are ConceptNet-like triples with free-form head and tail concepts. The authors class this format as structured, but note that it is not very rigid and can also be used as free text.
Multimodal tasks
| Dataset | Task | Collection Method | # Instances | # Explanations per Instance | Total # Annotators |
|---|---|---|---|---|---|
| BDD-X | vehicle control for self-driving cars | crowd | ~26K | 1 | n/a |
| VQA-E | visual QA | automatic | ~270K | 1 | n/a |
| VQA-X | visual QA | crowd | 28,180 | 1 or 3 | n/a |
| ACT-X | activity recognition | crowd | 18,030 | 3 | n/a |
| Ehsan et al. (2019) | playing arcade games | crowd | 2,000 | 1 | 60 |
| VCR | visual commonsense reasoning | crowd | ~290K | 1 | n/a |
| e-SNLI-VE | visual-textual entailment | crowd | 11,335 | 3 | n/a |
| ESPRIT | reasoning about qualitative physics | crowd | 2,441 | 2 | n/a |
| VLEP | future event prediction | automatic + crowd | 28,726 | 1 | n/a |
| EMU | reasoning about manipulated images | crowd | 48K | n/a | n/a |
Multiple Languages
| Dataset | Task | Collection Method | # Instances | # Explanations per Instance | Total # Annotators |
|---|---|---|---|---|---|
| E-KAR | analogical reasoning | crowd | 1,655 (in Chinese); 1,251 (in English) | 5 | n/a |