Understanding tf.clip_by_global_norm

Post author: xfxia
Post published: August 24, 2023
Post category: Other

help(tf.clip_by_global_norm)

Help on function clip_by_global_norm in module tensorflow.python.ops.clip_ops:

clip_by_global_norm(t_list, clip_norm, use_norm=None, name=None)
    Clips values of multiple tensors by the ratio of the sum of their norms.

    Given a tuple or list of tensors `t_list`, and a clipping ratio `clip_norm`,
    this operation returns a list of clipped tensors `list_clipped`
    and the global norm (`global_norm`) of all tensors in `t_list`. Optionally,
    if you've already computed the global norm for `t_list`, you can specify
    the global norm with `use_norm`.

    To perform the clipping, the values `t_list[i]` are set to:

        t_list[i] * clip_norm / max(global_norm, clip_norm)

    where:

        global_norm = sqrt(sum([l2norm(t)**2 for t in t_list]))

    If `clip_norm > global_norm` then the entries in `t_list` remain as they are,
    otherwise they're all shrunk by the global ratio.

    Any of the entries of `t_list` that are of type `None` are ignored.

    This is the correct way to perform gradient clipping (for example, see
    [Pascanu et al., 2012](http://arxiv.org/abs/1211.5063)
    ([pdf](http://arxiv.org/pdf/1211.5063.pdf))).

    However, it is slower than `clip_by_norm()` because all the parameters must be
    ready before the clipping operation can be performed.

    Args:
      t_list: A tuple or list of mixed `Tensors`, `IndexedSlices`, or None.
      clip_norm: A 0-D (scalar) `Tensor` > 0. The clipping ratio.
      use_norm: A 0-D (scalar) `Tensor` of type `float` (optional). The global
        norm to use. If not provided, `global_norm()` is used to compute the norm.
      name: A name for the operation (optional).

    Returns:
      list_clipped: A list of `Tensors` of the same type as `list_t`.
      global_norm: A 0-D (scalar) `Tensor` representing the global norm.

    Raises:
      TypeError: If `t_list` is not a sequence.

The core of this function really comes down to the following few sentences:

    To perform the clipping, the values `t_list[i]` are set to:

        t_list[i] * clip_norm / max(global_norm, clip_norm)

    where:

        global_norm = sqrt(sum([l2norm(t)**2 for t in t_list]))

    If `clip_norm > global_norm` then the entries in `t_list` remain as they are,
    otherwise they're all shrunk by the global ratio.
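To make the arithmetic in these sentences concrete, here is a small worked example in plain Python. The two "tensors" and the value of `clip_norm` are made-up numbers chosen so the norms come out round; they are not taken from the TensorFlow documentation.

```python
import math

# Two hypothetical gradient "tensors", flattened to plain lists for illustration.
t_list = [[3.0, 4.0], [0.0, 12.0]]
clip_norm = 6.5

# Per-tensor L2 norms: 5.0 and 12.0.
l2norms = [math.sqrt(sum(v * v for v in t)) for t in t_list]

# Global norm: sqrt(5**2 + 12**2) = 13.0.
global_norm = math.sqrt(sum(n ** 2 for n in l2norms))

# 13.0 > 6.5, so every entry is scaled by 6.5 / 13.0 = 0.5.
scale = clip_norm / max(global_norm, clip_norm)
clipped = [[v * scale for v in t] for t in t_list]

print(global_norm)  # 13.0
print(clipped)      # [[1.5, 2.0], [0.0, 6.0]]
```

Both tensors are multiplied by the same factor, so the overall direction of the combined gradient is preserved and only its length changes.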

In fact, it all boils down to a single formula, where x is any one tensor in t_list:

y = x * clip_norm / max(
        sqrt(sum([l2norm(t)**2 for t in t_list])),
        clip_norm
    )

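In plain words: the global norm is the L2 norm of the per-tensor L2 norms of the gradients. When the global norm is less than or equal to the specified maximum (clip_norm), the gradients are returned unchanged. When it is larger, every gradient is scaled down by the same factor, clip_norm / global_norm, so that the global norm becomes clip_norm. This bounds the size of the gradients and guards against exploding gradients.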
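The behaviour described above can be sketched in NumPy. This is only an illustration of the documented formula, not TensorFlow's actual implementation: the name `clip_by_global_norm_np` is hypothetical, `IndexedSlices` are not handled, and passing `None` entries through unchanged is an assumption about how "ignored" should look in the returned list.

```python
import numpy as np

def clip_by_global_norm_np(t_list, clip_norm, use_norm=None):
    """NumPy sketch of the documented formula (hypothetical helper)."""
    if use_norm is None:
        # Sum of squared entries over all non-None tensors, then the square root.
        global_norm = np.sqrt(
            sum(np.sum(np.square(t)) for t in t_list if t is not None)
        )
    else:
        # The caller has already computed the global norm.
        global_norm = use_norm
    # The scale is 1.0 when global_norm <= clip_norm, otherwise clip_norm / global_norm.
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [
        None if t is None else np.asarray(t, dtype=float) * scale
        for t in t_list
    ]
    return clipped, float(global_norm)

# Made-up gradients: the global norm is sqrt(3**2 + 4**2 + 12**2) = 13.0.
grads = [np.array([[3.0, 4.0]]), None, np.array([0.0, 12.0])]

# The global norm exceeds clip_norm, so everything is scaled by 0.5.
clipped, norm = clip_by_global_norm_np(grads, clip_norm=6.5)

# The global norm is below clip_norm, so the gradients come back unchanged.
same, _ = clip_by_global_norm_np(grads, clip_norm=20.0)
```

In a real training loop the list would be the gradients of the loss with respect to the trainable variables, and the clipped list is what gets handed to the optimizer.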

Original post: https://blog.csdn.net/ningyanggege/article/details/91985910
