Notes on a Painful TF Error

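Yesterday afternoon, while writing TF code, I hit this error: tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0] = 24 is not in [0, 24). I suspected a bug somewhere in the matrix transformations. I spent a long time checking the logic and found nothing wrong. I even pulled the relevant logic out into unit tests and still found nothing. Today I decided to check whether the data was the problem. I had checked it before without finding anything, but with no other leads I checked it again. I took one part file of the data (TFRecord format), dumped it back to plain text, and used the values in the failing batch to locate the corresponding source records. They were at the very end of the data, and it suddenly clicked: the final batch contained fewer records than batch_size, and that was what caused the error. I can't believe I didn't think of that...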
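A quick way to see why the last batch is short: tf.data's batch() emits a final partial batch whenever the record count is not a multiple of batch_size. The sketch below assumes 56 records and a batch_size of 32. These are made-up numbers chosen so the final batch has 24 rows, matching the error message.

```python
import tensorflow as tf

# Hypothetical numbers: 56 records, batch_size 32, so the final batch has 56 % 32 = 24 rows.
NUM_RECORDS = 56
BATCH_SIZE = 32

dataset = tf.data.Dataset.range(NUM_RECORDS).batch(BATCH_SIZE)

# Actual number of rows in each batch produced by the pipeline.
batch_sizes = [int(tf.shape(batch)[0]) for batch in dataset]
print(batch_sizes)  # [32, 24]

# With drop_remainder=True the short final batch is discarded instead of emitted.
full_only = [
    int(tf.shape(batch)[0])
    for batch in tf.data.Dataset.range(NUM_RECORDS).batch(BATCH_SIZE, drop_remainder=True)
]
print(full_only)  # [32]
```

drop_remainder=True is only an option if losing the trailing records is acceptable. Otherwise, the downstream code has to handle a batch smaller than batch_size.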

So why exactly did it fail? Here is the original code:

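# Rows listed in mask_line_id get 0, all other rows get 1.
# The mask length is fixed at batch_size, even when the final batch is smaller.
mask = tf.compat.v1.sparse_to_dense(sparse_indices=mask_line_id,
    output_shape=[self.batch_size, ],
    default_value=1,
    sparse_values=0,
)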

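This code builds a mask vector that is used downstream to mask input and label, and the masking itself is done with tf.gather(). The mask always has length batch_size, but in the final batch input and label have fewer than batch_size rows. As a result, tf.gather(input, mask) fails with tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0] = 24 is not in [0, 24). Index 24 does not exist in a tensor that only has 24 rows, whose valid indices are 0 to 23.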
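The failure can be reproduced in eager mode. The post does not show exactly how the gather indices are built from mask, so this sketch assumes they are the positions where mask == 1, found with tf.where. The names inputs and mask_line_id and all sizes are hypothetical.

```python
import tensorflow as tf

BATCH_SIZE = 32                       # static size, playing the role of self.batch_size
inputs = tf.random.normal([24, 8])    # hypothetical final batch: only 24 rows
mask_line_id = tf.constant([1, 3])    # hypothetical rows to mask out

# Static-size mask: always BATCH_SIZE entries, regardless of the real batch size.
mask = tf.compat.v1.sparse_to_dense(
    sparse_indices=mask_line_id,
    output_shape=[BATCH_SIZE],
    sparse_values=0,
    default_value=1,
)

# Assumption: gather indices are the positions where mask == 1 (0, 2, 4, ..., 31).
keep = tf.reshape(tf.where(tf.equal(mask, 1)), [-1])

try:
    with tf.device("/CPU:0"):  # on CPU, tf.gather rejects out-of-range indices
        tf.gather(inputs, keep)
    failed = False
except tf.errors.InvalidArgumentError as e:
    failed = True
    print(e.message)  # reports an index of 24 or more that is not in [0, 24)
```

Per the tf.gather documentation, this error is only raised on CPU. On GPU, out-of-range indices silently produce zeros in the output. This makes the same bug much harder to spot there.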

A simple fix is to obtain the size of input dynamically at run time:

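# tf.shape() returns the actual number of rows in the current batch at run time,
# so the mask length always matches the input, including for a short final batch.
mask = tf.compat.v1.sparse_to_dense(sparse_indices=mask_line_id,
    output_shape=[tf.shape(indices)[0], ],
    default_value=1,
    sparse_values=0,
)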
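With the dynamic shape, a short batch goes through: the mask now has as many entries as the batch has rows, so every index derived from it is valid. In this sketch, inputs stands in for the post's indices tensor. The batch size, mask_line_id, and the tf.where-based index derivation are all hypothetical.

```python
import tensorflow as tf

inputs = tf.random.normal([24, 8])    # hypothetical final batch with 24 rows
mask_line_id = tf.constant([1, 3])    # hypothetical rows to mask out

# Dynamic-size mask: its length follows the actual number of rows in this batch.
mask = tf.compat.v1.sparse_to_dense(
    sparse_indices=mask_line_id,
    output_shape=[tf.shape(inputs)[0]],
    sparse_values=0,
    default_value=1,
)

# Assumption: gather indices are the positions where mask == 1.
keep = tf.reshape(tf.where(tf.equal(mask, 1)), [-1])
kept = tf.gather(inputs, keep)
print(mask.shape, kept.shape)  # (24,) (22, 8)
```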

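Along the way I also hit another error: InvalidArgumentError: indices[0] = [0] is out of bounds: need 0 <= index < [0]. It occurs when the first two arguments of tf.compat.v1.sparse_to_dense disagree in size. In this case, sparse_indices contained a position (0) that lies outside output_shape ([0]).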
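The second error can be triggered on purpose. The sketch lets output_shape evaluate to [0] while sparse_indices still contains 0. The empty batch is a hypothetical scenario for illustration; the post does not say how the two sizes got out of sync.

```python
import tensorflow as tf

empty = tf.zeros([0, 8])             # hypothetical batch that ended up with 0 rows
mask_line_id = tf.constant([0])      # but a sparse index still points at row 0

try:
    with tf.device("/CPU:0"):
        tf.compat.v1.sparse_to_dense(
            sparse_indices=mask_line_id,
            output_shape=[tf.shape(empty)[0]],  # evaluates to [0]
            sparse_values=0,
            default_value=1,
        )
    raised = False
except tf.errors.InvalidArgumentError as e:
    raised = True
    print(e.message)  # index 0 is rejected because it is outside output_shape [0]
```

The check that fires here is sparse_to_dense's own index validation (validate_indices defaults to True). Every entry of sparse_indices must fit inside output_shape.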
