👋 Hi there! This is the project page for the ACL 2022 (Findings) paper “E-KAR: A Benchmark for Rationalizing Natural Language Analogical Reasoning”.
Published in ACL 2022 (Findings) as a long paper.
English version, hosted at 🤗 Dataset
Chinese version, hosted at 🤗 Dataset
The ability to recognize analogies is fundamental to human cognition. Existing benchmarks for testing word analogy do not reveal the underlying process of analogical reasoning in neural models.
Holding the belief that models capable of reasoning should be right for the right reasons, we propose a first-of-its-kind Explainable Knowledge-intensive Analogical Reasoning benchmark (E-KAR). Our benchmark consists of 1,655 (in Chinese) and 1,251 (in English) problems sourced from the Civil Service Exams, which require intensive background knowledge to solve. More importantly, we design a free-text explanation scheme to explain whether an analogy should be drawn, and manually annotate an explanation for every question and candidate answer.
You can find the slides, poster and video about E-KAR here.
Query:
- tea: teapot: teacup
Candidate Answers:
A) passenger: bus: taxi
B) magazine: bookshelf: reading room
C) talents: school: enterprise
D) textbooks: bookstore: printing factory
Answer: C
Explanation for Query:
- $E_Q$: Both “teapot” and “teacup” are containers for holding “tea”. After the “tea” is brewed in the “teapot”, it is transported into the “teacup”.
Explanation for Candidate Answers:
$E_A$: “Passengers” do not need to be transported into “taxi” after taking a “bus”. “Taxi” and “bus” are different ways of transportation.
$E_B$: The “bookshelf” is in the “reading room”.
$E_C$: Both “school” and “enterprise” are organizations. After “talents” are educated in “school”, they are transported into “enterprise”.
$E_D$: After “textbooks” are printed in the “printing factory”, they are sold in a “bookstore”. But the order of the terms is inconsistent with the query.
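The example above can be held in a small data structure for experimentation. The following is a minimal sketch, not the dataset's actual schema: the class `AnalogyProblem`, its field names, and the helper `format_multiple_choice` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class AnalogyProblem:
    """Hypothetical container for one E-KAR-style problem (not the official schema)."""
    query: str                               # the query term tuple
    candidates: Dict[str, str]               # option label -> candidate term tuple
    answer: str                              # label of the correct candidate
    query_explanation: str                   # E_Q
    candidate_explanations: Dict[str, str]   # option label -> E_A ... E_D


def format_multiple_choice(problem: AnalogyProblem) -> str:
    """Render the query and its candidates as plain multiple-choice text."""
    lines = [f"Query: {problem.query}"]
    for label in sorted(problem.candidates):
        lines.append(f"{label}) {problem.candidates[label]}")
    return "\n".join(lines)


# The worked example from this page, copied into the hypothetical structure.
example = AnalogyProblem(
    query="tea: teapot: teacup",
    candidates={
        "A": "passenger: bus: taxi",
        "B": "magazine: bookshelf: reading room",
        "C": "talents: school: enterprise",
        "D": "textbooks: bookstore: printing factory",
    },
    answer="C",
    query_explanation=(
        "Both “teapot” and “teacup” are containers for holding “tea”. After the "
        "“tea” is brewed in the “teapot”, it is transported into the “teacup”."
    ),
    candidate_explanations={
        "A": "“Passengers” do not need to be transported into “taxi” after taking "
             "a “bus”. “Taxi” and “bus” are different ways of transportation.",
        "B": "The “bookshelf” is in the “reading room”.",
        "C": "Both “school” and “enterprise” are organizations. After “talents” are "
             "educated in “school”, they are transported into “enterprise”.",
        "D": "After “textbooks” are printed in the “printing factory”, they are sold "
             "in a “bookstore”. But the order of the terms is inconsistent with the query.",
    },
)

print(format_multiple_choice(example))
```

Keeping the explanations keyed by the same labels as the candidates makes it easy to look up why a given option is accepted or rejected.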
import datasets # 🤗
ekar_zh = datasets.load_dataset('Jiangjie/ekar_chinese')
ekar_en = datasets.load_dataset('Jiangjie/ekar_english')
There are altogether 8 task settings: 2 shared tasks × 2 task modes × 2 languages.
Shared tasks:
- Analogical QA: The dataset can be used to train a model for analogical reasoning in the form of multiple-choice QA.
- Explanation Generation: The dataset can be used to generate free-text explanations to rationalize analogical reasoning.
Task modes:
- EASY mode: the query explanation ($E_Q$) can be used as part of the input.
- HARD mode: no explanation is allowed as part of the input.
Languages:
- Chinese version: 1,655 problems and 8,275 sentences of explanations, sourced from Civil Service Exams of China with manually annotated explanations.
- English version: 1,251 problems and 6,255 sentences of explanations, translated from the Chinese version with culture-specific samples removed or rewritten.
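The eight settings are the Cartesian product of the options listed above, and the two modes differ only in whether $E_Q$ is part of the model input. The sketch below enumerates the settings and builds an input string per mode. The function `build_input` and the text layout it produces are illustrative assumptions, not the input format used in the paper. The final lines check that the reported sentence counts are consistent with five explanations per problem (one for the query plus one per candidate), assuming four candidates per problem as in the example on this page.

```python
from itertools import product
from typing import Dict, Optional

TASKS = ("Analogical QA", "Explanation Generation")
MODES = ("EASY", "HARD")
LANGUAGES = ("Chinese", "English")

# 2 tasks x 2 modes x 2 languages = 8 settings
settings = list(product(TASKS, MODES, LANGUAGES))


def build_input(query: str, candidates: Dict[str, str], mode: str,
                query_explanation: Optional[str] = None) -> str:
    """Assemble a text input; only EASY mode includes the query explanation E_Q."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    parts = [f"Query: {query}"]
    if mode == "EASY":
        if query_explanation is None:
            raise ValueError("EASY mode expects a query explanation")
        parts.append(f"Explanation: {query_explanation}")
    # In HARD mode any explanation passed in is deliberately ignored.
    parts.extend(f"{label}) {text}" for label, text in sorted(candidates.items()))
    return "\n".join(parts)


# Assumption: one E_Q plus four candidate explanations per problem.
EXPLANATIONS_PER_PROBLEM = 1 + 4
assert 1655 * EXPLANATIONS_PER_PROBLEM == 8275  # Chinese version
assert 1251 * EXPLANATIONS_PER_PROBLEM == 6255  # English version

for task, mode, language in settings:
    print(f"{task} | {mode} | {language}")
```

Routing both modes through one function keeps the difference between EASY and HARD to a single branch, which makes it harder to leak $E_Q$ into a HARD-mode input by accident.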
Please submit and evaluate your results on the test sets at the publicly available E-KAR leaderboard hosted at EvalAI.
Note that the leaderboard only evaluates the Analogical QA and Rationalized Analogical QA tasks, in order to avoid relying on unreliable automatic metrics for evaluating generated text (i.e., explanations). See the E-KAR leaderboard for participation details.
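For a local sanity check before submitting, multiple-choice predictions can be scored on any split for which gold answers are available. The sketch below assumes plain accuracy over option labels. The leaderboard's exact metric and submission format are not described on this page, so `accuracy` is a hypothetical helper rather than the official scorer.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of problems whose predicted option label matches the gold label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold labels must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Toy labels for illustration only, not real model output.
print(accuracy(["C", "A", "B", "D"], ["C", "B", "B", "D"]))  # 3 of 4 correct -> 0.75
```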
If you find this work useful for your research, please cite our paper:
@inproceedings{chen-etal-2022-e,
title = "{E}-{KAR}: A Benchmark for Rationalizing Natural Language Analogical Reasoning",
author = "Chen, Jiangjie and
Xu, Rui and
Fu, Ziquan and
Shi, Wei and
Li, Zhongqiao and
Zhang, Xinbo and
Sun, Changzhi and
Li, Lei and
Xiao, Yanghua and
Zhou, Hao",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2022",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.findings-acl.311",
pages = "3941--3955",
}