TensorBoard is Google's visualization tool for machine learning. Its UI exposes a smoothing parameter that controls how smooth the metric curves look. What is the principle behind it?

TensorBoard can visualize the various metrics of a machine-learning job, such as loss and AUC, as line charts, histograms, and so on. When samples are few, the metric values fluctuate a lot and the plotted curve looks very noisy. That is when the curve needs smoothing, and the smoothing parameter comes into play.

In short, the smoothing parameter implements exponential smoothing: the current value is smoothed using historical data, and the older a data point is, the less influence it has on the current value, with its weight decaying exponentially. The formula is:
$$
\begin{aligned}
& s_0=x_0 \\
& s_t=\alpha x_t + (1-\alpha)s_{t-1}, t>0 \\
& \text{where } \alpha \text{ is the smoothing factor, } 0<\alpha<1
\end{aligned}
$$
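The recurrence itself is easy to code. Here is a minimal sketch in TypeScript (the function name `exponentialSmoothing` is mine for illustration, not from TensorBoard):

```typescript
// s_0 = x_0; s_t = α·x_t + (1-α)·s_{t-1} for t > 0
function exponentialSmoothing(xs: number[], alpha: number): number[] {
  if (xs.length === 0) return [];
  const s: number[] = [xs[0]]; // s_0 = x_0
  for (let t = 1; t < xs.length; t++) {
    s.push(alpha * xs[t] + (1 - alpha) * s[t - 1]);
  }
  return s;
}

// Example: with alpha = 0.5, each step closes half the gap to the new point.
console.log(exponentialSmoothing([0, 1, 1, 1], 0.5)); // [0, 0.5, 0.75, 0.875]
```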
At this point you might ask: where exactly does the "exponential" come in?

The equations above are just a recurrence relating time $t$ to time $t-1$. Unrolling a small example makes the exponent visible:

Suppose $s_0=0$ and $\alpha=0.1$. Then

$s_1=0.1x_1 + 0.9s_0$

$s_2=0.1x_2 + 0.9s_1$

$s_3=0.1x_3 + 0.9s_2$

Substituting $s_1$ and $s_2$ into $s_3$:
$$
\begin{aligned}
s_3&=0.1x_3+0.9(0.1x_2 + 0.9(0.1x_1 + 0.9s_0)) \\
& =0.1(x_3+0.9x_2+0.9^2x_1) \\
& = \alpha (x_3+(1-\alpha)x_2 + (1-\alpha)^2x_1)
\end{aligned}
$$
In general, with $s_0=0$,
$$
s_t=\alpha \left(x_t+(1-\alpha)x_{t-1} + (1-\alpha)^2x_{t-2} + \cdots + (1-\alpha)^{t-1}x_1\right)
$$
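With $s_0=0$, we can check numerically that unrolling the recursion matches this closed form (an illustrative snippet; the sample values are arbitrary):

```typescript
const alpha = 0.1;
const x = [NaN, 3, 1, 4]; // 1-indexed to match the formulas; x[0] is unused

// Recursive form: s_0 = 0, s_t = α·x_t + (1-α)·s_{t-1}
let s = 0;
for (let t = 1; t <= 3; t++) {
  s = alpha * x[t] + (1 - alpha) * s;
}

// Closed form: s_3 = α(x_3 + (1-α)x_2 + (1-α)²x_1)
const closed =
  alpha * (x[3] + (1 - alpha) * x[2] + Math.pow(1 - alpha, 2) * x[1]);

console.log(s, closed); // both ≈ 0.733
```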

Two things follow:

  • The older a data point is (the closer to $t=1$), the smaller its influence on $s$, decaying exponentially.
  • The closer $\alpha$ is to 1, the closer $s_t$ is to the current value $x_t$.

Note that TensorBoard's smoothing implementation reverses the roles: its `smoothingWeight` plays the part of $(1-\alpha)$ above, i.e. the old value is weighted by `smoothingWeight` and the new value by `1 - smoothingWeight`. The source looks like this:

```typescript
private resmoothDataset(dataset: Plottable.Dataset) {
  let data = dataset.data();
  const smoothingWeight = this.smoothingWeight;
  // 1st-order IIR low-pass filter to attenuate the higher-
  // frequency components of the time-series.
  let last = data.length > 0 ? 0 : NaN;
  let numAccum = 0;
  const yValues = data.map((d, i) => this.yValueAccessor(d, i, dataset));
  // See #786.
  const isConstant = yValues.every((v) => v == yValues[0]);
  data.forEach((d, i) => {
    const nextVal = yValues[i];
    if (isConstant || !Number.isFinite(nextVal)) {
      d.smoothed = nextVal;
    } else {
      // Key step: the OLD value is weighted by smoothingWeight here.
      last = last * smoothingWeight + (1 - smoothingWeight) * nextVal;
      numAccum++;
      // The uncorrected moving average is biased towards the initial value.
      // For example, if initialized with `0`, with smoothingWeight `s`, where
      // every data point is `c`, after `t` steps the moving average is
      //
      //   EMA = 0*s^(t) + c*(1 - s)*s^(t-1) + c*(1 - s)*s^(t-2) + ...
      //       = c*(1 - s^t)
      //
      // If initialized with `0`, dividing by (1 - s^t) is enough to debias
      // the moving average. We count the number of finite data points and
      // divide appropriately before storing the data.
      let debiasWeight = 1;
      if (smoothingWeight !== 1) {
        debiasWeight = 1 - Math.pow(smoothingWeight, numAccum);
      }
      // Dividing by (1 - s^t) removes the bias toward the initial 0.
      d.smoothed = last / debiasWeight;
    }
  });
}
```
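To see what the debias step buys, here is a standalone sketch (it deliberately skips the `isConstant` shortcut, and the variable names other than `smoothingWeight` are mine): feeding a constant series `c`, the raw EMA starts biased toward the initial 0, while dividing by $(1-s^t)$ recovers `c` exactly at every step.

```typescript
const smoothingWeight = 0.9;
const c = 2.0; // every data point is the constant c
let last = 0;  // EMA initialized at 0, as in resmoothDataset

for (let t = 1; t <= 5; t++) {
  last = last * smoothingWeight + (1 - smoothingWeight) * c;
  // Raw EMA after t steps is c·(1 - s^t): biased toward 0 early on.
  const debiasWeight = 1 - Math.pow(smoothingWeight, t);
  const debiased = last / debiasWeight;
  console.log(t, last.toFixed(4), debiased.toFixed(4)); // debiased ≈ 2.0000
}
```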
