《Graph-Based Visual Saliency》 Translation





Abstract

A new bottom-up visual saliency model, Graph-Based Visual Saliency (GBVS), is proposed. It consists of two steps: first forming activation maps on certain feature channels, and then normalizing them in a way which highlights conspicuity and admits combination with other maps. The model is simple, and biologically plausible insofar as it is naturally parallelized. This model powerfully predicts human fixations on 749 variations of 108 natural images, achieving 98% of the ROC area of a human-based control, whereas the classical algorithms of Itti & Koch ([2], [3], [4]) achieve only 84%.




1 Introduction

Most vertebrates, including humans, can move their eyes. They use this ability to sample in detail the most relevant features of a scene, while spending only limited processing resources elsewhere. The ability to predict, given an image (or video), where a human might fixate in a fixed-time free-viewing scenario has long been of interest in the vision community. Besides the purely scientific goal of understanding this remarkable behavior of humans, and animals in general, to consistently fixate on "important" information, there is tremendous engineering application, e.g. in compression and recognition [13]. The standard approaches (e.g., [2], [9]) are based on biologically motivated feature selection, followed by center-surround operations which highlight local gradients, and finally a combination step leading to a "master map". Recently, Bruce [5] and others [4] have hypothesized that fundamental quantities such as "self-information" and "surprise" are at the heart of saliency/attention. However, ultimately, Bruce computes a function which is additive in feature maps, with the main contribution materializing as a method of operating on a feature map in such a way as to get an activation, or saliency, map. Itti and Baldi define "surprise" in general, but ultimately compute a saliency map in the classical [2] sense for each of a number of feature channels, then operate on these maps using another function aimed at highlighting local variation. By organizing the topology of these varied approaches, we can compare them more rigorously: i.e., not just end-to-end, but also piecewise, removing some uncertainty about the origin of observed performance differences.


Thus, the leading models of visual saliency may be organized into these three stages:

  • (s1) extraction: extract feature vectors at locations over the image plane
  • (s2) activation: form an "activation map" (or maps) using the feature vectors
  • (s3) normalization/combination: normalize the activation map (or maps), followed by a combination of the maps into a single map

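To make the three-stage decomposition concrete, here is a minimal runnable sketch (not from the paper — each stage is a deliberately trivial stand-in, assuming NumPy):

```python
import numpy as np

def extract_features(image):
    # (s1) Stand-in: a single "intensity" feature channel. A real model
    # would produce several biologically motivated channels per scale.
    return {"intensity": image.astype(float)}

def activate(feature_map):
    # (s2) Stand-in: contrast against the global mean plays the role of a
    # real activation scheme (center-surround differences, GBVS, ...).
    return np.abs(feature_map - feature_map.mean())

def normalize(activation_map):
    # (s3) Stand-in: rescale to [0, 1] before additive combination.
    span = activation_map.max() - activation_map.min()
    return (activation_map - activation_map.min()) / (span + 1e-12)

def master_map(image):
    # Combine the normalized per-channel maps into a single "master map".
    maps = [normalize(activate(m)) for m in extract_features(image).values()]
    return sum(maps)

image = np.arange(25).reshape(5, 5)
print(master_map(image).shape)  # (5, 5)
```

The point is only the shape of the pipeline: the contributions discussed next slot into (s2) and (s3) without changing this overall structure.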

In this light, [5] is a contribution to step (s2), whereas [4] is a contribution to step (s3). In the classic algorithms, step (s1) is done using biologically inspired filters, step (s2) is accomplished by subtracting feature maps at different scales (henceforth, "c-s" for "center" – "surround"), and step (s3) is accomplished in one of three ways: 1. a normalization scheme based on local maxima [2] ("max-ave"), 2. an iterative scheme based on convolution with a difference-of-gaussians filter ("DoG"), and 3. a nonlinear-interactions ("NL") approach which divides local feature values by weighted averages of surrounding values in a way that is modelled to fit psychophysics data [11].


We take a different approach, exploiting the computational power, topographical structure, and parallel nature of graph algorithms to achieve natural and efficient saliency computations. We define Markov chains over various image maps, and treat the equilibrium distribution over map locations as activation and saliency values. This idea is not completely new: Brockmann and Geisel [8] suggest that scanpaths might be predicted by properly defined Lévy flights over saliency fields, and more recently Boccignone and Ferraro [7] do the same. Importantly, they assume that a saliency map is already available, and offer an alternative to the winner-takes-all approach of mapping this object to a set of fixation locations. In an unpublished pre-print, L.F. Costa [6] notes similar ideas, but offers only sketchy details on how to apply this to real images, and in fact includes no experiments involving fixations. Here, we take a unified approach to steps (s2) and (s3) of saliency computation, by using dissimilarity and saliency to define edge weights on graphs which are interpreted as Markov chains. Unlike previous authors, we do not attempt to connect features only to those which are somehow similar. We also directly compare our method to others, using their power to predict human fixations as a performance metric.


The contributions of this paper are as follows:

  1. A complete bottom-up saliency model based on graph computations, GBVS, including a framework for “activation” and “normalization/combination”.
  2. A comparison of GBVS against existing benchmarks on a data set of grayscale images of natural environments (viz., foliage), with the eye-movement fixation data of seven human subjects, from a recent study by Einhäuser et al. [1].




2 Graph-Based Saliency (GBVS)

Given an image $I$, we wish to ultimately highlight a handful of "significant" locations where the image is "informative" according to some criterion, e.g. human fixation. As previously explained, this process is conditioned on first computing feature maps (s1), e.g. by linear filtering followed by some elementary nonlinearity [15]. "Activation" (s2) and "normalization and combination" (s3) steps follow as described below.

(Translator's note: step (s1) resembles the convolutional layer followed by an activation layer in a neural network.)



2.1 Forming an Activation Map (s2)

Suppose we are given a feature map $M : [n]^2 \rightarrow \mathbb{R}$.¹ Our goal is to compute an activation map $A : [n]^2 \rightarrow \mathbb{R}$, such that, intuitively, locations $(i,j) \in [n]^2$ where $I$, or, as a proxy, $M(i,j)$, is somehow unusual in its neighborhood will correspond to high values of activation $A$.




2.1.1 Existing Schemes

Of course "unusual" does not constrain us sufficiently, and so one can choose several operating definitions. "Improbable" would lead one to the formulation of Bruce [5], where a histogram of $M(i,j)$ values is computed in some region around $(i,j)$, subsequently normalized and treated as a probability distribution, so that $A(i,j) = -\log\left[p(i,j)\right]$ is clearly defined with $p(i,j) = \Pr\{M(i,j) \mid \text{neighborhood}\}$. Another approach compares local "center" distributions to broader "surround" distributions and calls the Kullback-Leibler tension between the two "surprise" [4].



(Translator's note: I found this "surprise" formulation hard to follow.)



2.1.2 A Markovian Approach

We propose a more organic (see below) approach. Let us define the dissimilarity of $M(i,j)$ and $M(p,q)$ as





$$d\big((i,j)\|(p,q)\big) \triangleq \left|\log\frac{M(i,j)}{M(p,q)}\right| \tag{1}$$







(Translator's note: this "dissimilarity" is a distance: on the feature map $M$ extracted from $I$, it measures how similar the values at points $(i,j)$ and $(p,q)$ are. The symbol "$\triangleq$" means "defined as".)
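As a quick numerical illustration of equation (1) (the feature values below are made up, not from the paper):

```python
import numpy as np

# Hypothetical feature values at lattice locations (i, j) and (p, q).
M_ij, M_pq = 0.75, 0.25

# Equation (1): the absolute log-ratio of the two feature values.
d_log_ratio = abs(np.log(M_ij / M_pq))

# The alternative the authors also tried: a plain absolute difference.
d_abs_diff = abs(M_ij - M_pq)

print(round(float(d_log_ratio), 4))  # 1.0986  (= |log 3|)
print(d_abs_diff)                    # 0.5
```

Both measures are symmetric in the two locations, which is what makes the opposite-direction edge weights below equal.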

This is a natural definition of dissimilarity: simply the distance between one and the ratio of two quantities, measured on a logarithmic scale. For some of our experiments, we use $|M(i,j) - M(p,q)|$ instead, and we have found that both work well. Consider now the fully-connected directed graph $G_A$, obtained by connecting every node of the lattice $M$, labelled with two indices $(i,j) \in [n]^2$, with all other $n-1$ nodes. The directed edge from node $(i,j)$ to node $(p,q)$ will be assigned a weight:






$$w_1\big((i,j),(p,q)\big) \triangleq d\big((i,j)\|(p,q)\big) \cdot F(i-p,\, j-q) \tag{2}$$







where
$$F(a,b) = \exp\left(-\frac{a^2+b^2}{2\sigma^2}\right) \tag{3}$$







(Translator's note: why is node $(i,j)$ connected to the other $n-1$ nodes? That count seems off: the lattice has $n^2$ nodes, so a fully-connected graph should connect each node to the other $n^2-1$.)




$\sigma$ is a free parameter of our algorithm.³ Thus, the weight of the edge from node $(i,j)$ to node $(p,q)$ is proportional to their dissimilarity and to their closeness in the domain of $M$. Note that the edge in the opposite direction has exactly the same weight. We may now define a Markov chain on $G_A$ by normalizing the weights of the outbound edges of each node to 1, and drawing an equivalence between nodes & states, and edge weights & transition probabilities. The equilibrium distribution of this chain, reflecting the fraction of time a random walker would spend at each node/state if he were to walk forever, would naturally accumulate mass at nodes that have high dissimilarity with their surrounding nodes, since transitions into such subgraphs are likely, and unlikely if nodes have similar $M$ values. The result is an activation measure which is derived from pairwise contrast.




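The activation step described above can be sketched end to end. This is a toy implementation under stated assumptions (NumPy, an arbitrary $\sigma$, a dense double loop over the fully-connected lattice graph, and a simple power iteration for the equilibrium), not the authors' code:

```python
import numpy as np

def gbvs_activation(M, sigma=2.0, iters=200):
    """Sketch of the activation step (s2): weight the edges of the
    fully-connected lattice graph G_A by dissimilarity times spatial
    closeness (eqs. 1-3), row-normalize into a Markov chain, and take
    the equilibrium distribution as the activation map."""
    n_rows, n_cols = M.shape
    coords = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    vals = np.array([M[i, j] for i, j in coords])
    N = len(coords)

    W = np.zeros((N, N))
    for a, (i, j) in enumerate(coords):
        for b, (p, q) in enumerate(coords):
            if a == b:
                continue
            d = abs(np.log(vals[a] / vals[b]))                             # eq. (1)
            F = np.exp(-((i - p) ** 2 + (j - q) ** 2) / (2 * sigma ** 2))  # eq. (3)
            W[a, b] = d * F                                                # eq. (2)

    P = W / W.sum(axis=1, keepdims=True)  # outbound edge weights sum to 1
    v = np.full(N, 1.0 / N)               # initially uniform mass
    for _ in range(iters):                # power iteration -> equilibrium
        v = v @ P
    return v.reshape(n_rows, n_cols)

# Toy feature map: a gentle gradient with one strongly dissimilar value.
M = 1.0 + 0.01 * np.arange(25, dtype=float).reshape(5, 5)
M[2, 2] = 5.0
A = gbvs_activation(M)
print(int(A.argmax()) == 12)  # True: mass accumulates at the unusual (2, 2)
```

Because $w_1$ is symmetric here, the random walk is reversible and the mass concentrates exactly where pairwise contrast is highest; the dense loop is only for clarity and could be vectorized or parallelized, as the paper emphasizes.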

We call this approach "organic" because, biologically, individual "nodes" (neurons) exist in a connected, retinotopically organized network (the visual cortex), and communicate with each other (synaptic firing) in a way which gives rise to emergent behavior, including fast decisions about which areas of a scene require additional processing. Similarly, our approach exposes connected (via $F$) regions of dissimilarity (via $w$), in a way which can in principle be computed in a completely parallel fashion. Computations can be carried out independently at each node: in a synchronous environment, at each time step, each node simply sums incoming mass, then passes along measured partitions of this mass to its neighbors according to outbound edge weights. The same simple process happening at all nodes simultaneously gives rise to an equilibrium distribution of mass.



Technical Notes

The equilibrium distribution of this chain exists and is unique because the chain is ergodic, a property which emerges from the fact that our underlying graph $G_A$ is by construction strongly connected. In practice, the equilibrium distribution is computed using repeated multiplication of the Markov matrix with an initially uniform vector. The process yields the principal eigenvector of the matrix. The computational complexity is thus $O(n^4 K)$, where $K \ll n^2$ is some small number of iterations required to meet equilibrium.⁴
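The claim in the technical note (repeated multiplication with an initially uniform vector yields the principal eigenvector) can be checked on any small ergodic chain; the 3-state transition matrix below is made up for illustration:

```python
import numpy as np

# A made-up 3-state transition matrix: rows sum to 1 and all entries are
# positive, so the chain is ergodic and a unique equilibrium exists.
P = np.array([[0.1, 0.6,  0.3],
              [0.4, 0.4,  0.2],
              [0.5, 0.25, 0.25]])

# Repeated multiplication with an initially uniform vector, as in the note.
v = np.full(3, 1.0 / 3)
for _ in range(100):
    v = v @ P

# The limit is the principal (left) eigenvector of P, for eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
principal = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
principal /= principal.sum()

print(np.allclose(v, principal))  # True
```

The number of iterations needed depends on the gap between the leading eigenvalue (always 1 for a stochastic matrix) and the second-largest eigenvalue magnitude, which is why $K$ stays small in practice.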






2.2 "Normalizing" an Activation Map (s3)

The aim of the “normalization” step of the algorithm is much less clear than that of the activation step. It is, however, critical and a rich area of study. Earlier, three separate approaches were mentioned as existing benchmarks, and also the recent work of Itti on surprise [4] comes into the saliency computation at this stage of the process (although it can also be applied to s2 as mentioned above). We shall state the goal of this step as: concentrating mass on activation maps. If mass is not concentrated on individual activation maps prior to additive combination, then the resulting master map may be too nearly uniform and hence uninformative. Although this may seem trivial, it is on some level the very soul of any saliency algorithm: concentrating activation into a few key locations.


Armed with the mass-concentration definition, we propose another Markovian algorithm as follows:

This time, we begin with an activation map $A : [n]^2 \rightarrow \mathbb{R}$,⁵ which we wish to "normalize". We construct a graph $G_N$ with $n^2$ nodes labelled with indices from $[n]^2$. For each node $(i,j)$ and every node $(p,q)$ (including $(i,j)$) to which it is connected, we introduce an edge from $(i,j)$ to $(p,q)$ with weight:






$$w_2\big((i,j),(p,q)\big) \triangleq A(p,q) \cdot F(i-p,\, j-q) \tag{4}$$






Again, normalizing the weights of the outbound edges of each node to unity and treating the resulting graph as a Markov chain gives us the opportunity to compute the equilibrium distribution over the nodes.⁶ Mass will flow preferentially to those nodes with high activation. It is a mass concentration algorithm by construction, and also one which is parallelizable, as before, having the same natural advantages. Experimentally, it seems to behave very favorably compared to the standard approaches such as "DoG" and "NL".

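The normalization step can be sketched the same way as the activation step. This is a toy implementation under stated assumptions (NumPy, an arbitrary $\sigma$, a single pass rather than the iterated scheme of the footnote), not the authors' code:

```python
import numpy as np

def gbvs_normalize(A, sigma=2.0, iters=200):
    """Sketch of the normalization step (s3): on G_N, the edge into (p, q)
    is weighted by A(p, q) * F(i-p, j-q) (eq. 4, self-edges included), so
    the chain's equilibrium distribution concentrates mass at nodes whose
    activation is already high."""
    n_rows, n_cols = A.shape
    coords = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    N = len(coords)

    W = np.zeros((N, N))
    for a, (i, j) in enumerate(coords):
        for b, (p, q) in enumerate(coords):
            F = np.exp(-((i - p) ** 2 + (j - q) ** 2) / (2 * sigma ** 2))
            W[a, b] = A[p, q] * F          # eq. (4)

    P = W / W.sum(axis=1, keepdims=True)   # outbound weights sum to unity
    v = np.full(N, 1.0 / N)
    for _ in range(iters):                 # power iteration -> equilibrium
        v = v @ P
    return v.reshape(n_rows, n_cols)

# Toy activation map with one elevated location.
A = np.full((5, 5), 0.1)
A[1, 3] = 0.4
A_norm = gbvs_normalize(A)
peak = tuple(int(x) for x in np.unravel_index(A_norm.argmax(), A_norm.shape))
print(peak)  # (1, 3): mass flows preferentially toward the high-activation node
```

Note the asymmetry relative to the activation step: here the edge weight depends only on the activation at the destination node, which is what makes mass flow toward already-active locations rather than toward locally dissimilar ones.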


  1. In the context of a mathematical formulation, let $[n] \triangleq \{1,2,\dots,n\}$. Also, the maps $M$, and later $A$, are presented as square ($n \times n$) only for expository simplicity. Nothing in this paper will depend critically on the square assumption, and, in practice, rectangular maps are used instead. ↩︎


  3. In our experiments, $\sigma$ is set to approximately $1/10$ to $1/5$ of the feature-map width. The results are not very sensitive to perturbations of $\sigma$ around these values. ↩︎

  4. Our implementation, which is not optimized for speed, converges on a single map of size $25 \times 37$ in a fraction of a second on a 2.4 GHz Pentium CPU. ↩︎

  5. To be clear: if $A$ is the result of the eigenvector computation described in 2.1, i.e. the graph-based activation step is chained with the graph-based normalization step, we call the resulting algorithm GBVS. However, $A$ may also be computed with other techniques. ↩︎

  6. We note that this normalization step of GBVS can be iterated $k$ times to improve performance. In practice, we use $k \in \{2,3,4\}$; the choice among these values of $k$ has no significant effect on performance. ↩︎