3D-LaneNet: End-to-End 3D Multiple Lane Detection

Post author: xfxia
Post published: October 8, 2023
Post category: Other

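I. Overview

Date: November 2018

Authors: Noa Garnett et al.

Affiliation: General Motors Israel

Summary: From a single front-view image, the network predicts 3D lanes in road coordinates. The output is a set of 3D lane curves, so the shape of the road surface (not only a flat ground plane) is taken into account. The method assumes neither a constant lane width nor a pre-mapped environment.

3D-LaneNet in short: intra-network IPM + anchor-based lane representation.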

Datasets:

1) a synthetic 3D dataset (https://sites.google.com/view/danlevi/3dlanes)

2) a real 3D dataset: using lidar and IMU, the authors generate aggregated lidar top-view images

3) the tuSimple dataset


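II. Method

1. Method overview

The front view is projected to a top view by IPM (applied to feature maps inside the network), and lanes are detected on this top view with an anchor-based scheme: the problem is cast as object detection, where each lane is an object and its 3D curve model is estimated much like an object's bounding box.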


Input: a front-view image

Output: anchor-based lane predictions, from which the 3D lane curves are obtained

2. Network architecture


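1) The network consists of two pathways: an image-view pathway and a top-view pathway.

(1) Image-view pathway

The input is the front-view image, and the outputs are the camera pitch angle θ and the camera height H. Roll and yaw between the camera frame and the road frame are assumed to be zero (the camera axis y_c projects onto the road axis y_r). The camera extrinsics therefore follow from θ and H, and since the intrinsics are known, the IPM transform can be computed.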
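To make the geometry concrete, here is a minimal sketch of how a camera-to-road transform T_c2r can be assembled from the two predicted quantities. It assumes zero roll and yaw, as stated above, plus axis conventions chosen for illustration (road frame: x right, y forward, z up, origin on the road below the camera; camera frame: x right, y along the optical axis). The function name `camera_to_road` is hypothetical.

```python
import numpy as np

def camera_to_road(pitch: float, height: float) -> np.ndarray:
    """Hypothetical helper: 4x4 rigid transform T_c2r (camera -> road coordinates).

    Assumed conventions (for illustration only):
      road frame:   x right, y forward, z up; origin on the road below the camera
      camera frame: x right, y along the optical axis, z completing a right-handed frame
      pitch > 0 tilts the optical axis down; roll = yaw = 0.
    """
    s, c = np.sin(pitch), np.cos(pitch)
    T = np.eye(4)
    # Columns are the camera axes expressed in road coordinates
    # (a pure rotation about the shared x axis).
    T[:3, :3] = np.array([[1.0, 0.0, 0.0],
                          [0.0,   c,   s],
                          [0.0,  -s,   c]])
    # The camera centre sits `height` metres above the road origin.
    T[:3, 3] = [0.0, 0.0, height]
    return T

T = camera_to_road(pitch=0.05, height=1.6)
d = 1.6 / np.sin(0.05)  # distance along the optical axis until it meets the road
print(T @ np.array([0.0, d, 0.0, 1.0]))  # z is ~0 here, y equals H / tan(pitch)
```

As a sanity check, the optical-axis ray meets the road at y = H / tan θ, so a smaller pitch pushes that point farther ahead.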

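(2) Top-view pathway

Its input is an image-view feature layer warped by the projective transformation layer. Subsequent top-view feature layers are concatenated with further image-view feature layers warped in the same way (S_IPM), and the pathway finally outputs the lane detections.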
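The warp performed by the projective transformation layer amounts to resampling an image-view feature map on a regular top-view grid. The NumPy sketch below does this with bilinear interpolation, the kind of sampling a differentiable warping layer typically relies on. It assumes a given 3x3 homography from road-plane coordinates (x, y, 1) to feature-map pixels; `ipm_warp` and its parameters are hypothetical names.

```python
import numpy as np

def ipm_warp(feat, H, x_range, y_range, out_hw):
    """Resample an image-view feature map (C, fh, fw) onto a top-view grid.

    H maps road-plane points (x, y, 1) to feature-map pixels (u, v, w).
    x_range, y_range: (min, max) in metres; out_hw: (rows, cols) of the top view.
    Row 0 of the output is the far end (max y). Samples outside the map are 0.
    """
    C, fh, fw = feat.shape
    rows, cols = out_hw
    # Road coordinates at the centre of each top-view cell.
    xs = x_range[0] + (np.arange(cols) + 0.5) * (x_range[1] - x_range[0]) / cols
    ys = y_range[1] - (np.arange(rows) + 0.5) * (y_range[1] - y_range[0]) / rows
    X, Y = np.meshgrid(xs, ys)                          # each (rows, cols)
    uvw = np.stack([X, Y, np.ones_like(X)], axis=-1) @ H.T
    in_front = uvw[..., 2] > 1e-9                       # skip points behind the camera
    w_safe = np.where(in_front, uvw[..., 2], 1.0)
    u, v = uvw[..., 0] / w_safe, uvw[..., 1] / w_safe
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    du, dv = u - u0, v - v0
    out = np.zeros((C, rows, cols))
    # Accumulate the four bilinear neighbours with zero padding.
    for oy, ox, wgt in ((0, 0, (1 - du) * (1 - dv)), (0, 1, du * (1 - dv)),
                        (1, 0, (1 - du) * dv), (1, 1, du * dv)):
        uu, vv = u0 + ox, v0 + oy
        ok = in_front & (uu >= 0) & (uu < fw) & (vv >= 0) & (vv < fh)
        vals = feat[:, np.clip(vv, 0, fh - 1), np.clip(uu, 0, fw - 1)]
        out += np.where(ok, vals * wgt, 0.0)
    return out
```

In the network this warp would be applied to several image-view layers, each producing a top-view tensor that is concatenated with the top-view pathway's own features.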

2) Coordinate systems


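3) The projective transformation layer

(blue part of Fig. 4)

(1) Road projection prediction branch

Purpose: it predicts the camera height and pitch angle, which give T_c2r, the transform from camera coordinates to road coordinates. (T_c2r determines the homography H_r2i and the sampling S_IPM.)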
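Given the predicted pitch and height plus known intrinsics K, the road-to-image homography H_r2i follows directly, because every road-plane point has z = 0. The sketch below assumes the zero-roll/zero-yaw setup, an OpenCV-style pinhole camera (x right, y down, z along the optical axis), a road frame with x right, y forward, z up and its origin below the camera, and positive pitch meaning the camera tilts down. The helper names are hypothetical.

```python
import numpy as np

def road_to_image_homography(K: np.ndarray, pitch: float, height: float) -> np.ndarray:
    """3x3 homography mapping road-plane points (x, y, 1), z = 0, to pixels."""
    s, c = np.sin(pitch), np.cos(pitch)
    # Maps (x, y, 1) on the road to camera coordinates:
    # X = x, Y = c*height - s*y, Z = c*y + s*height
    M = np.array([[1.0, 0.0, 0.0],
                  [0.0,  -s, c * height],
                  [0.0,   c, s * height]])
    return K @ M

def road_to_pixels(H: np.ndarray, xy: np.ndarray) -> np.ndarray:
    """Project an (n, 2) array of road points in metres to (n, 2) pixel coordinates."""
    pts = np.hstack([xy, np.ones((xy.shape[0], 1))])
    uvw = pts @ H.T
    return uvw[:, :2] / uvw[:, 2:3]
```

For example, with K = [[1000, 0, 640], [0, 1000, 360], [0, 0, 1]], zero pitch and a height of 1.5 m, the road points (0, 10) and (2, 10) land at pixels (640, 510) and (840, 510). With a downward pitch, very distant road points converge toward a horizon row above the principal point.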

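(2) Lane prediction head

(anchor-based lane representation)

Anchors define the lane candidates, and a refined geometric representation describes the 3D lane shape for each anchor. The output road coordinates are determined by the estimated camera height and pitch.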


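The authors propose an anchor-based lane detection method. It differs from anchors in object detection: here each anchor is a line (a longitudinal line in the top view at a fixed lateral position), not a box.

Define N anchor lines at lateral positions $\{X_A^i\}_{i=1}^{N}$ and K predefined longitudinal positions $\{y_j\}_{j=1}^{K}$. For classification, each anchor is referenced to Y_ref: it is responsible for the lanes whose x position at Y_ref is closest to it, and it outputs a confidence for each of three lane types, two lane-centerline types and one lane-delimiter type, $\{c_1, c_2, d\}$. For regression, each type outputs 2K values $\{(x_i^j, z_i^j)\}_{j=1}^{K}$ (K lateral offsets and K heights), so the j-th 3D point of anchor i is $(X_A^i + x_i^j,\ y_j,\ z_i^j)$. In total the network outputs an $N \times 3(2K+1)$ tensor, i.e. $N \times (3 \cdot 2K + 3 \cdot 1)$: in $3 \cdot 2K$ the 3 counts the three lane types, each with its own 2K geometry values, and in $3 \cdot 1$ the 3 is one confidence per type. Each type has its own confidence; two centerline types exist so that two centerlines (e.g. at a lane split or merge) can be assigned to the same anchor. After 1D NMS, the 3D points of each surviving anchor are interpolated with a spline into a 3D lane curve.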
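A hypothetical decoder for the N × 3(2K+1) output makes the indexing explicit. It assumes a per-type channel layout of [K lateral offsets, K heights, 1 confidence] and that confidences are already probabilities; the actual channel order in the authors' implementation may differ, and `decode_anchors` is an illustrative name.

```python
import numpy as np

def decode_anchors(out: np.ndarray, anchor_x: np.ndarray, y_points: np.ndarray):
    """Split a raw (N, 3 * (2K + 1)) head output into confidences and 3D points.

    Hypothetical layout per lane type (c1, c2, d): [K x-offsets, K heights z, 1 confidence].
    anchor_x: (N,) anchor positions X_A^i; y_points: (K,) fixed longitudinal positions y_j.
    Returns conf with shape (N, 3) and points with shape (N, 3, K, 3) holding (x, y, z).
    """
    n, k = out.shape[0], len(y_points)
    per_type = out.reshape(n, 3, 2 * k + 1)
    dx = per_type[..., :k]
    z = per_type[..., k:2 * k]
    conf = per_type[..., 2 * k]
    x = anchor_x[:, None, None] + dx            # X_A^i + x_i^j
    y = np.broadcast_to(y_points, x.shape)      # y_j is fixed, not regressed
    return conf, np.stack([x, y, z], axis=-1)
```

Only y is shared across anchors; x and z come from the network, which is what lets each anchor describe a curved, non-flat lane.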

Note: entire lanes are ignored if they do not cross Y_ref inside the valid top-view image boundaries, and lane points are ignored if occluded by the terrain (i.e., beyond a hilltop).
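The two ignore rules can be sketched for a single ground-truth lane stored as a polyline of (x, y, z) points sorted by increasing y. Linear interpolation, the helper names, and marking points outside the lane's y-extent as invalid are assumptions for illustration; terrain occlusion (points beyond a hilltop) would need a visibility test that this sketch does not attempt.

```python
import numpy as np

def lane_x_at_yref(lane, y_ref, x_min, x_max):
    """x of a GT lane where it crosses y = y_ref, or None if the lane is ignored.

    lane: (n, 3) array of (x, y, z) points sorted by increasing y.
    A lane that does not cross Y_ref inside [x_min, x_max] is dropped.
    """
    ys = lane[:, 1]
    if not (ys[0] <= y_ref <= ys[-1]):
        return None                                  # lane never reaches Y_ref
    x = float(np.interp(y_ref, ys, lane[:, 0]))
    return x if x_min <= x <= x_max else None        # crosses outside the top view

def sample_lane(lane, y_points):
    """Linearly resample x and z at the fixed y_j and flag points the lane covers.

    Points beyond the lane's y-extent get valid=False (terrain occlusion not modelled).
    """
    ys = lane[:, 1]
    x = np.interp(y_points, ys, lane[:, 0])
    z = np.interp(y_points, ys, lane[:, 2])
    valid = (y_points >= ys[0]) & (y_points <= ys[-1])
    return x, z, valid
```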

3. Loss

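As we read the paper, the loss has three parts: a binary cross-entropy on the per-type anchor confidences, an L1 penalty on the (x, z) geometry counted only for anchor/type slots that hold a ground-truth lane, and L1 penalties on the estimated camera pitch and height. The sketch below encodes that reading with unit weights; the function name, the argument layout, and the `valid` mask for ignored points are assumptions, not the authors' code.

```python
import numpy as np

def lane_loss(conf, conf_gt, x, x_gt, z, z_gt, valid,
              pitch, pitch_gt, height, height_gt, eps=1e-7):
    """Sketch of a 3D-LaneNet-style loss (unit weights, not reference code).

    conf, conf_gt: (N, 3) predicted / target confidences for types c1, c2, d.
    x, z and targets: (N, 3, K) lateral offsets and heights at the fixed y_j.
    valid: (N, 3, K) mask that is 0 for ignored (e.g. occluded) GT points.
    """
    p = np.clip(conf, eps, 1.0 - eps)
    # Binary cross-entropy over every anchor and lane type.
    cls = -np.sum(conf_gt * np.log(p) + (1.0 - conf_gt) * np.log(1.0 - p))
    # L1 geometry error, counted only where a GT lane is assigned (conf_gt = 1).
    l1 = np.abs(x - x_gt) + np.abs(z - z_gt)
    reg = np.sum(conf_gt[..., None] * valid * l1)
    # L1 on the camera pose predicted by the road projection branch.
    cam = abs(pitch - pitch_gt) + abs(height - height_gt)
    return cls + reg + cam
```

Weighting the geometry term by the target confidence means anchors without a lane contribute only through the classification term.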

4. Training and ground-truth association

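One plausible reading of the association step: each kept ground-truth lane is matched to the anchor whose lateral position is closest to the lane's x at Y_ref, and that anchor/type slot becomes a positive target. The slot-filling rule below (delimiters go to d; centerlines fill c1 first, then c2) and the function name are hypothetical illustrations, not the paper's exact procedure.

```python
import numpy as np

def assign_to_anchors(lane_x_ref, lane_is_delimiter, anchor_x):
    """Map each GT lane to (anchor index, type slot) using its x at Y_ref.

    Slots: 0 = c1, 1 = c2, 2 = d. A lane whose slot(s) are already taken
    on its closest anchor is left unassigned (None).
    """
    taken = set()
    assignment = []
    for x, is_delim in zip(lane_x_ref, lane_is_delimiter):
        i = int(np.argmin(np.abs(anchor_x - x)))   # closest anchor at Y_ref
        slots = (2,) if is_delim else (0, 1)
        for t in slots:
            if (i, t) not in taken:
                taken.add((i, t))
                assignment.append((i, t))
                break
        else:
            assignment.append(None)
    return assignment
```

For every assigned (anchor, slot) pair, the target confidence is 1 and the regression targets are the lane's (x, z) sampled at the fixed y_j; all other slots get target confidence 0.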

5. Inference

The y-range of the top-view representation is 80 meters and the x-range is 20 meters.

The IPM scale differs in x and y: in the first top-view feature map, each pixel corresponds to 16 cm laterally (x) and 38.4 cm longitudinally (y).

The last top-view feature map is 8× smaller, and since there is one anchor per column, the distance between anchors is 16 cm × 8 = 128 cm. The K (= 6) vertical reference points are set to y = {5, 20, 40, 60, 80, 100} m, and Y_ref = 20 m.
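With anchors 1.28 m apart, suppressing only immediate neighbors means suppressing weaker detections within 1.28 m of a stronger one. The 1D NMS from the anchor section can be sketched per lane type as a local-maximum test over neighboring anchors; the threshold and window radius are illustrative values, not the paper's settings, and `nms_1d` is a hypothetical name.

```python
import numpy as np

def nms_1d(conf, threshold=0.5, radius=1):
    """Indices of anchors kept by a simple 1D NMS for one lane type.

    conf: (N,) confidences ordered by anchor x. An anchor survives if its
    confidence exceeds `threshold` and is the maximum within +/- `radius`
    neighboring anchors (ties are all kept).
    """
    keep = []
    for i, c in enumerate(conf):
        lo, hi = max(0, i - radius), min(len(conf), i + radius + 1)
        if c > threshold and c >= conf[lo:hi].max():
            keep.append(i)
    return keep
```

For example, confidences [0.1, 0.6, 0.9, 0.7, 0.2, 0.8, 0.3] keep anchors 2 and 5; the surviving anchors' points are then interpolated into lane curves.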

6. Evaluation


Reference: https://leijiezhang001.github.io/paper-reading-3D-LaneNet-End-to-End-3D-Multiple-Lane-Detection/

Original post: https://blog.csdn.net/xumi13/article/details/105443798