Understanding tf.clip_by_global_norm

help(tf.clip_by_global_norm)

Help on function clip_by_global_norm in module tensorflow.python.ops.clip_ops:

clip_by_global_norm(t_list, clip_norm, use_norm=None, name=None)

Clips values of multiple tensors by the ratio of the sum of their norms.

Given a tuple or list of tensors `t_list`, and a clipping ratio `clip_norm`, this operation returns a list of clipped tensors `list_clipped` and the global norm (`global_norm`) of all tensors in `t_list`. Optionally, if you've already computed the global norm for `t_list`, you can specify the global norm with `use_norm`.

To perform the clipping, the values `t_list[i]` are set to:

    t_list[i] * clip_norm / max(global_norm, clip_norm)

where:

    global_norm = sqrt(sum([l2norm(t)**2 for t in t_list]))

If `clip_norm > global_norm` then the entries in `t_list` remain as they are, otherwise they're all shrunk by the global ratio.

Any of the entries of `t_list` that are of type `None` are ignored.

This is the correct way to perform gradient clipping (for example, see [Pascanu et al., 2012](http://arxiv.org/abs/1211.5063) ([pdf](http://arxiv.org/pdf/1211.5063.pdf))).

However, it is slower than `clip_by_norm()` because all the parameters must be ready before the clipping operation can be performed.

Args:

    t_list: A tuple or list of mixed `Tensors`, `IndexedSlices`, or None.
    clip_norm: A 0-D (scalar) `Tensor` > 0. The clipping ratio.
    use_norm: A 0-D (scalar) `Tensor` of type `float` (optional). The global norm to use. If not provided, `global_norm()` is used to compute the norm.
    name: A name for the operation (optional).

Returns:

    list_clipped: A list of `Tensors` of the same type as `list_t`.
    global_norm: A 0-D (scalar) `Tensor` representing the global norm.

Raises:

    TypeError: If `t_list` is not a sequence.
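
In practice this op sits between computing and applying gradients. The sketch below shows that wiring with the TF 1.x graph-mode optimizer API that this help text comes from; the toy linear model, the learning rate, and the clip threshold of 5.0 are illustrative placeholders rather than values taken from the docstring.

```python
import tensorflow as tf

# Toy linear model, only to make the example self-contained (TF 1.x graph mode).
x = tf.placeholder(tf.float32, shape=[None, 10])
y = tf.placeholder(tf.float32, shape=[None, 1])
w = tf.Variable(tf.zeros([10, 1]))
b = tf.Variable(tf.zeros([1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) + b - y))

optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)

# compute_gradients returns a list of (gradient, variable) pairs.
grads_and_vars = optimizer.compute_gradients(loss)
grads, variables = zip(*grads_and_vars)

# Jointly rescale all gradients so that their global L2 norm is at most 5.0.
clipped_grads, global_norm = tf.clip_by_global_norm(grads, clip_norm=5.0)

# Apply the clipped gradients instead of the raw ones.
train_op = optimizer.apply_gradients(list(zip(clipped_grads, variables)))
```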

The key to understanding this function is really just the following few lines:

To perform the clipping, the values `t_list[i]` are set to:

    t_list[i] * clip_norm / max(global_norm, clip_norm)

where:

    global_norm = sqrt(sum([l2norm(t)**2 for t in t_list]))

If `clip_norm > global_norm` then the entries in `t_list` remain as they are, otherwise they're all shrunk by the global ratio.



It really comes down to a single formula:

    y = x * clip_norm / max(sqrt(sum([l2norm(t)**2 for t in t_list])), clip_norm)

In plain words: when the global L2 norm of the gradients is less than or equal to the specified maximum (`clip_norm`), the original gradients are returned unchanged; when it is larger, every gradient is shrunk by the same ratio. This bounds the overall gradient magnitude and guards against exploding gradients.
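
As a quick sanity check of that rule, here is a small NumPy re-implementation (an illustrative sketch of the formula above, not TensorFlow's actual code); the toy gradient values are chosen only so the arithmetic is easy to follow.

```python
import numpy as np

def clip_by_global_norm_np(t_list, clip_norm):
    # global_norm = sqrt of the sum of squared L2 norms of every tensor.
    global_norm = np.sqrt(sum(np.sum(np.square(t)) for t in t_list))
    # Everything is shrunk by the same ratio only when global_norm > clip_norm.
    scale = clip_norm / max(global_norm, clip_norm)
    return [t * scale for t in t_list], global_norm

# Toy gradients: their global norm is sqrt(3**2 + 4**2 + 12**2) = 13.
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, g = clip_by_global_norm_np(grads, clip_norm=5.0)
print(g)        # 13.0 -> exceeds clip_norm, so every entry is scaled by 5/13
print(clipped)  # [array([1.1538..., 1.5384...]), array([4.6153...])]
```

Feeding the same `grads` and `clip_norm` into `tf.clip_by_global_norm` yields the same clipped values and the same global norm.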



Copyright notice: This is an original article by ningyanggege, licensed under the CC 4.0 BY-SA license. Please include a link to the original source and this notice when reposting.