auc是机器学习二分类问题的常用指标,其反映了分类器对正负样本的排序能力,换句话说,auc反映了模型对正负样本的区分能力。常用的auc计算方式有两种,一种是tensorflow的tf.metrics.auc函数,另一种是sklearn中的roc_auc_score()函数。二者都是通过极限逼近的思想,计算roc曲线下面积的一个个小梯形的面积和得到auc。不过有点奇怪的是,这两个函数计算出来的auc值并不总是相似的,有时会差别比较大,所以想分析下。本文先看下tf.metrics.auc的实现。

首先看下tf.metrics.auc的函数定义:

1
2
3
4
5
6
7
8
9
10
@tf_export('metrics.auc')
def auc(labels,
predictions,
weights=None,
num_thresholds=200,
metrics_collections=None,
updates_collections=None,
curve='ROC',
name=None,
summation_method='trapezoidal')

tf.metrics.auc的参数有三个比较重要。labels和predictions是必须参数,labels就是二分类问题的类别集合,一般就是0/1的集合,predictions就是模型的预测得分集合,是一个0-1的浮点数的集合。num_thresholds参数也很重要,这个参数控制了划分成多少个梯形计算面积。默认是200,可以根据batch大小进行调整。

再看下tf.metrics.auc的具体实现:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
 @tf_export('metrics.auc')
def auc(labels,
predictions,
weights=None,
num_thresholds=200,
metrics_collections=None,
updates_collections=None,
curve='ROC',
name=None,
summation_method='trapezoidal'):
if context.executing_eagerly():
raise RuntimeError('tf.metrics.auc is not supported when eager execution '
'is enabled.')

with variable_scope.variable_scope(name, 'auc',
(labels, predictions, weights)):
if curve != 'ROC' and curve != 'PR':
raise ValueError('curve must be either ROC or PR, %s unknown' % (curve))
kepsilon = 1e-7 # to account for floating point imprecisions
thresholds = [
(i + 1) * 1.0 / (num_thresholds - 1) for i in range(num_thresholds - 2)
] # thresholds是在[0-1]中分num_thresholds个段,而首尾加一个微量值避免0和1.
thresholds = [0.0 - kepsilon] + thresholds + [1.0 + kepsilon]

values, update_ops = _confusion_matrix_at_thresholds(
labels, predictions, thresholds, weights)

# Add epsilons to avoid dividing by 0.
epsilon = 1.0e-6

def interpolate_pr_auc(tp, fp, fn):
dtp = tp[:num_thresholds - 1] - tp[1:]
p = tp + fp
prec_slope = _safe_div(dtp, p[:num_thresholds - 1] - p[1:], 'prec_slope')
intercept = tp[1:] - math_ops.multiply(prec_slope, p[1:])
safe_p_ratio = array_ops.where(
math_ops.logical_and(p[:num_thresholds - 1] > 0, p[1:] > 0),
_safe_div(p[:num_thresholds - 1], p[1:], 'recall_relative_ratio'),
array_ops.ones_like(p[1:]))
return math_ops.reduce_sum(
_safe_div(
prec_slope * (dtp + intercept * math_ops.log(safe_p_ratio)),
tp[1:] + fn[1:],
name='pr_auc_increment'),
name='interpolate_pr_auc')

def compute_auc(tp, fn, tn, fp, name):
"""Computes the roc-auc or pr-auc based on confusion counts."""
if curve == 'PR':
if summation_method == 'trapezoidal':
logging.warning(
'Trapezoidal rule is known to produce incorrect PR-AUCs; '
'please switch to "careful_interpolation" instead.')
elif summation_method == 'careful_interpolation':
# This one is a bit tricky and is handled separately.
return interpolate_pr_auc(tp, fp, fn)
rec = math_ops.div(tp + epsilon, tp + fn + epsilon) # math_ops.div()除法
if curve == 'ROC': #auc是roc曲线的下面积
fp_rate = math_ops.div(fp, fp + tn + epsilon)
x = fp_rate
y = rec
else: # curve == 'PR'.
prec = math_ops.div(tp + epsilon, tp + fp + epsilon)
x = rec
y = prec
if summation_method in ('trapezoidal', 'careful_interpolation'): #默认参数是trapezoidal
# Note that the case ('PR', 'careful_interpolation') has been handled
# above.
return math_ops.reduce_sum(
math_ops.multiply(x[:num_thresholds - 1] - x[1:],
(y[:num_thresholds - 1] + y[1:]) / 2.),
name=name)
elif summation_method == 'minoring':
return math_ops.reduce_sum(
math_ops.multiply(x[:num_thresholds - 1] - x[1:],
math_ops.minimum(y[:num_thresholds - 1], y[1:])),
name=name)
elif summation_method == 'majoring':
return math_ops.reduce_sum(
math_ops.multiply(x[:num_thresholds - 1] - x[1:],
math_ops.maximum(y[:num_thresholds - 1], y[1:])),
name=name)
else:
raise ValueError('Invalid summation_method: %s' % summation_method)

# sum up the areas of all the trapeziums
auc_value = compute_auc(values['tp'], values['fn'], values['tn'],
values['fp'], 'value')
update_op = compute_auc(update_ops['tp'], update_ops['fn'],
update_ops['tn'], update_ops['fp'], 'update_op')

if metrics_collections:
ops.add_to_collections(metrics_collections, auc_value)

if updates_collections:
ops.add_to_collections(updates_collections, update_op)

return auc_value, update_op

tf.metrics.auc函数返回了两个值auc_value和update_op。auc_value就是我们需要的值,但是需要注意的是,只有必须先sess.run(update_op)之后才可以得到auc_value值。怎么理解呢?看下面代码就明白了。原因在后面介绍。

1
2
3
auc_value, update_op = tf.metrics.auc(labels=target, predictions=predictions_score, name="auc_metric", num_thresholds=512)
_ = sess.run(update_op)
auc = sess.run(auc_value)

看第87-90代码可知,auc_value和update_op是通过一个compute_auc()函数计算得到的,compute_auc()的传入参数貌似是混淆矩阵的四个值。

而在compute_auc函数中,57-61行中计算了真正例率rec和假正例率fp_rate,分别对应roc曲线的横坐标x和纵坐标y。66-72行计算了roc曲线下面积的小梯形之和。也就是说compute_auc()函数传入的是混淆矩阵的四个值,通过计算roc曲线下面积的小梯形之和得到auc返回

再回过来看下97-90行,auc_value和update_op不同在于values和update_ops变量,update_op貌似是个op,而auc_value是标量值。values, update_ops的计算在25行的_confusion_matrix_at_thresholds()函数,其传入的参数是分别是labels, predictions, thresholds, weights(默认值None)。labels, predictions不多说,而thresholds是在[0-1]中分num_thresholds个段,而首尾加一个微量值避免0和1。可以猜想下,是根据num_thresholds个段作为分类阈值计算混淆矩阵的。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
def _confusion_matrix_at_thresholds(labels,
predictions,
thresholds,
weights=None,
includes=None):
all_includes = ('tp', 'fn', 'tn', 'fp')
if includes is None:
includes = all_includes
else:
for include in includes:
if include not in all_includes:
raise ValueError('Invalid key: %s.' % include)

with ops.control_dependencies([
check_ops.assert_greater_equal( #断言,检查输入
predictions,
math_ops.cast(0.0, dtype=predictions.dtype),
message='predictions must be in [0, 1]'),
check_ops.assert_less_equal(
predictions,
math_ops.cast(1.0, dtype=predictions.dtype),
message='predictions must be in [0, 1]')
]):
# 对predictions和labels进行类型规范
predictions, labels, weights = _remove_squeezable_dimensions(
predictions=math_ops.to_float(predictions),
labels=math_ops.cast(labels, dtype=dtypes.bool),
weights=weights)

num_thresholds = len(thresholds)

# Reshape predictions and labels.
predictions_2d = array_ops.reshape(predictions, [-1, 1])
labels_2d = array_ops.reshape(
math_ops.cast(labels, dtype=dtypes.bool), [1, -1])

# Use static shape if known.
num_predictions = predictions_2d.get_shape().as_list()[0]

# Otherwise use dynamic shape.
if num_predictions is None:
num_predictions = array_ops.shape(predictions_2d)[0]
thresh_tiled = array_ops.tile( # tile之后产生了num_thresholds*num_predictions的阈值矩阵
array_ops.expand_dims(array_ops.constant(thresholds), [1]), #expand_dims和reshape等价,但是palceloader无法使用reshape,这里产生了num_thresholds*1的阈值矩阵
array_ops.stack([1, num_predictions])) #stack之后产生了[1,num_predictions]

# Tile the predictions after thresholding them across different thresholds.
pred_is_pos = math_ops.greater(
array_ops.tile(array_ops.transpose(predictions_2d), [num_thresholds, 1]),
thresh_tiled)
if ('fn' in includes) or ('tn' in includes):
pred_is_neg = math_ops.logical_not(pred_is_pos)

# Tile labels by number of thresholds
label_is_pos = array_ops.tile(labels_2d, [num_thresholds, 1])
if ('fp' in includes) or ('tn' in includes):
label_is_neg = math_ops.logical_not(label_is_pos)

if weights is not None:
weights = weights_broadcast_ops.broadcast_weights(
math_ops.to_float(weights), predictions)
weights_tiled = array_ops.tile(
array_ops.reshape(weights, [1, -1]), [num_thresholds, 1])
thresh_tiled.get_shape().assert_is_compatible_with(
weights_tiled.get_shape())
else:
weights_tiled = None

values = {}
update_ops = {}

if 'tp' in includes:
true_p = metric_variable(
[num_thresholds], dtypes.float32, name='true_positives')
is_true_positive = math_ops.to_float(
math_ops.logical_and(label_is_pos, pred_is_pos))
if weights_tiled is not None:
is_true_positive *= weights_tiled
update_ops['tp'] = state_ops.assign_add(true_p,
math_ops.reduce_sum(
is_true_positive, 1)) #1是维度
values['tp'] = true_p

if 'fn' in includes:
false_n = metric_variable(
[num_thresholds], dtypes.float32, name='false_negatives')
is_false_negative = math_ops.to_float(
math_ops.logical_and(label_is_pos, pred_is_neg))
if weights_tiled is not None:
is_false_negative *= weights_tiled
update_ops['fn'] = state_ops.assign_add(false_n,
math_ops.reduce_sum(
is_false_negative, 1))
values['fn'] = false_n

if 'tn' in includes:
true_n = metric_variable(
[num_thresholds], dtypes.float32, name='true_negatives')
is_true_negative = math_ops.to_float(
math_ops.logical_and(label_is_neg, pred_is_neg))
if weights_tiled is not None:
is_true_negative *= weights_tiled
update_ops['tn'] = state_ops.assign_add(true_n,
math_ops.reduce_sum(
is_true_negative, 1))
values['tn'] = true_n

if 'fp' in includes:
false_p = metric_variable(
[num_thresholds], dtypes.float32, name='false_positives')
is_false_positive = math_ops.to_float(
math_ops.logical_and(label_is_neg, pred_is_pos))
if weights_tiled is not None:
is_false_positive *= weights_tiled
update_ops['fp'] = state_ops.assign_add(false_p,
math_ops.reduce_sum(
is_false_positive, 1))
values['fp'] = false_p

return values, update_ops

以values[‘tp’]为例看下values是怎么得到的(72-82行)。可以看到values['tp'] = true_p,而在73行true_p还没有赋值,仅仅是通过metric_variable()函数创建本地变量(Create variable in GraphKeys.(LOCAL|METRIC_VARIABLES) collections)。在第75行计算了真正例is_true_positive(真正例简称tp,即真实label是正例且预测也是正例),然后79行将tp和通过函数assign_add()加到了true_p上,然后返回op赋值给update_ops[‘tp’]。最后true_p赋值给了values[‘tp’]。**需要注意的是assign_add()返回的是op,所以如果想得到true_p的值必须先计算op(update_ops[‘tp’])**,这就解释了上文的问题。可以再看下assign_add()定义:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
@tf_export("assign_add")
def assign_add(ref, value, use_locking=None, name=None):
"""Update 'ref' by adding 'value' to it.

This operation outputs "ref" after the update is done.先更新才可以得到ref
This makes it easier to chain operations that need to use the reset value.

Args:
ref: A mutable `Tensor`. Must be one of the following types:
`float32`, `float64`, `int64`, `int32`, `uint8`, `uint16`, `int16`,
`int8`, `complex64`, `complex128`, `qint8`, `quint8`, `qint32`, `half`.
Should be from a `Variable` node.
value: A `Tensor`. Must have the same type as `ref`.
The value to be added to the variable.
use_locking: An optional `bool`. Defaults to `False`.
If True, the addition will be protected by a lock;
otherwise the behavior is undefined, but may exhibit less contention.
name: A name for the operation (optional).

Returns:
Same as "ref". Returned as a convenience for operations that want
to use the new value after the variable has been updated.
"""

然后看下怎么计算真正例is_true_positive的。math_ops.logical_and(label_is_pos, pred_is_pos)是对label_is_pos和pred_is_pos进行与操作。现在问题变成了分析label_is_pos和pred_is_pos是怎么来的。看58-59行。array_ops.tile(array_ops.transpose(predictions_2d), [num_thresholds, 1])得到的是num_thresholds*num_predictions的矩阵A,array_ops.transpose(predictions_2d)是1*n的矩阵,n对应batch大小为n的n个得分。使用tile沿着第0维复制num_thresholds次。thresh_tiled是num_thresholds*num_predictions的阈值矩阵。使用math_ops.greater()函数之后,如果predictions中元素大于阈值则返回True,否则False,最后得到了值为True/False的list,也就是pred_is_pos(预测为正),其它同理。所以在76行通过logical_and得到了is_true_positive真正例。

emm,大概就这样。