HotpotQA数据集探索

Dataset笔记

Posted by viewsetting on October 20, 2019

HotpotQA数据集备注

文件

TRAIN: hotpot_train_v1.1.json http://curtis.ml.cmu.edu/datasets/hotpot/hotpot_train_v1.1.json

DEV: hotpot_dev_distractor_v1.json http://curtis.ml.cmu.edu/datasets/hotpot/hotpot_dev_distractor_v1.json

PRED(sample): sample_dev_pred.json http://curtis.ml.cmu.edu/datasets/hotpot/sample_dev_pred.json

Evaluation Script: https://raw.githubusercontent.com/hotpotqa/hotpot/master/hotpot_evaluate_v1.py

json结构

总体结构如下

{
	_id : 5abd94525542992ac4f382d2,
	answer: 'YG Entertainment',
	question: '2014 S/S is the debut album of a South Korean boy group that was formed by who?' ,
	supporting_facts: [['2014 S/S', 0], ['Winner (band)', 0]] ,
	context: [['List of awards and nominations received by Shinee',[' The group was formed by S.M. Entertainment in 2008 and...', 'South Korean boy group Shinee have received several awards...',...],[...,[...]],...]
	type: hard ,
	level: bridge
}
  • _id 每个问题的唯一hash编号,str类型
  • answer 问题答案,str
  • question 问题, str
  • supporting_facts 支持段落, list 。形式:[ [‘para_name_1’ , sentence_index ], [‘para_name_2’ , sentence_index ] , … ] Nested Element: [str,int] 段落名,句子编号
  • context 上下文,list 。 形式: [ [ ‘$ParaName_1$’,[‘$sentence_1$’,’$sentence_2$’,…,’$sentence_t$’]],[‘$ParaName_2$’,[…]],….,[‘$ParaName_n$’,[…]] ]
  • type 难度类型, easy,medium,hard 三选一。str
  • level 问题种类 , bridge或comparison二选一。 str

数据集大小(单位:问题)

  • train: 90447
  • dev: 7405

Context长度分布

注:使用官方评测中的normalize_answer()正则后的文本

  min 10% 20% 30% 40% 50% 60% 70% 80% 90% max
TRAIN 24 524 604 662 714 764 816 874 947 1059 2511
DEV 39 533 613 673 725 772 826 884 954 1069 2337

一些特殊长度的等比分点:

  512 1024 1500 2000
TRAIN 8.874% 87.419% 99.355% 99.910%
DEV 8.062% 86.685% 99.399% 99.919%

特殊问题

标准答案应回答‘yes’或’no’的问题数量。

  • Train: 5483 / 90477 6.060%
  • Dev: 458 / 7405 6.185%

标准答案应回答‘noanswer’的问题数量。

  • Train: 0 / 90477 0%
  • Dev: 0 / 7405 0%

评价函数

EM Score

Exact Matching的个数,如果normalized后的答案与normalized后的gold答案一致,EM值加一。

F1 Score

num same 预测文本与gold文本的相同的单词数

precision

$precision = NumSame / prediction$

recalll

$recall = NumSame / gold$

Span F1

计算span的F1值,公式相对简单

  Ground Truth False
Predicted True TP FP
Not Predicted FP FN

Joint F1

将F1与SpanF1结合,即precision与recall分别两两相乘。然后再计算JointF1,就是将$JointPrecision$与$JointRecall$合并计算。