The most detailed analysis of veth-pair packet forwarding on the web

veth-pair packet forwarding



Test environment

# OS: Debian 11

+----------------------------------------------------------------+
|                                                                |
|       +------------------------------------------------+       |
|       |             Network Protocol Stack             |       |
|       +------------------------------------------------+       |
|              ↑               ↑               ↑               |
|..............|............... |............... |...............|
|              ↓               ↓               ↓               |
|        +----------+    +-----------+   +-----------+           |
|        |   eth0   |    |   veth0   |   |   veth1   |           |
|        +----------+    +-----------+   +-----------+           |
|192.168.0.10  ↑               ↑               ↑               |
|              |                +---------------+                |
|              |         10.70.2.10         10.70.2.11           |
+--------------|-------------------------------------------------+
               ↓
         Physical Network



Create the veth-pair devices

root@debian:~# ip link add veth0 type veth peer name veth1
root@debian:~# ip link set veth0 up
root@debian:~# ip link set veth1 up
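
To confirm that the two interfaces really are peers, the `name@peer` label that `ip -o link show` prints for veth devices can be parsed. The sketch below is only an illustration: `veth_peer_of` is a hypothetical helper name, and it assumes both ends live in the same network namespace (otherwise the label takes the form `veth0@ifN`).

```shell
# Hypothetical helper: print the peer of a veth device.
# $1 = device name; stdin = output of `ip -o link show`
veth_peer_of() {
    awk -F': ' -v dev="$1" '
        {
            # For a veth end, field 2 looks like "veth0@veth1"
            n = split($2, part, "@")
            if (n == 2 && part[1] == dev) print part[2]
        }'
}
```

With the pair created above, `ip -o link show | veth_peer_of veth0` would be expected to print `veth1`.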



Add an IP to veth0

root@debian:~# ip addr add 10.70.2.10/24 dev veth0


Routing table

root@debian:~# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.0.2    0.0.0.0         UG    100    0         0 ens33
10.70.2.0       0.0.0.0         255.255.255.0   U     0      0        0 veth0
192.168.0.0     0.0.0.0         255.255.255.0   U     100    0        0 ens33

root@debian:~# arp -n
Address                  HWtype  HWaddress           Flags Mask            Iface
114.114.114.114                  (incomplete)                              ens33
192.168.0.1              ether   00:50:56:c0:00:08   C                     ens33


Send ping packets

root@debian:~# ping 10.70.2.11

root@debian:~# tcpdump -n -i veth0
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on veth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
22:06:45.467154 ARP, Request who-has 10.70.2.11 tell 10.70.2.10, length 28
22:06:46.487460 ARP, Request who-has 10.70.2.11 tell 10.70.2.10, length 28
22:06:47.512417 ARP, Request who-has 10.70.2.11 tell 10.70.2.10, length 28

root@debian:~# tcpdump -n -i veth1
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on veth1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
22:07:01.861339 ARP, Request who-has 10.70.2.11 tell 10.70.2.10, length 28
22:07:02.870627 ARP, Request who-has 10.70.2.11 tell 10.70.2.10, length 28
22:07:03.898166 ARP, Request who-has 10.70.2.11 tell 10.70.2.10, length 28
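  1. This is the first ping to 10.70.2.11 and the ARP table has no entry for it, so an ARP request is sent first.

  2. When the outgoing packet reaches the protocol stack through the socket, the stack looks up the destination in the routing table and learns that traffic for 10.70.2.11 must leave through veth0 (10.70.2.10).

  3. When the ARP request reaches veth0, it is handed directly to veth1, which passes it up to the protocol stack. The stack compares the requested address with the local IPs, finds that no interface owns 10.70.2.11, and drops the ARP request. That is why both captures show only unanswered requests.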
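
Step 3 hinges on whether any interface owns the requested address. One way to check this by hand is to scan the output of `ip -4 -o addr show`. The following is a minimal sketch: `has_local_ip` is a hypothetical helper name, and it assumes the usual one-line format in which the fourth field is `ADDRESS/PREFIX`.

```shell
# Hypothetical helper: succeed if the given IPv4 address is configured locally.
# $1 = IPv4 address; stdin = output of `ip -4 -o addr show`
has_local_ip() {
    awk -v ip="$1" '
        {
            split($4, addr, "/")      # "10.70.2.10/24" -> "10.70.2.10"
            if (addr[1] == ip) found = 1
        }
        END { exit(found ? 0 : 1) }'
}
```

For example, `ip -4 -o addr show | has_local_ip 10.70.2.11 || echo "10.70.2.11 is not local"` reports the situation at this point in the experiment, before veth1 has an address.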



Add an IP to veth1

root@debian:~# ip addr add 10.70.2.11/24 dev veth1


Routing table

root@debian:~# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.0.2    0.0.0.0         UG    100    0        0 ens33
10.70.2.0       0.0.0.0         255.255.255.0   U     0      0        0 veth0
10.70.2.0       0.0.0.0         255.255.255.0   U     0      0        0 veth1
192.168.0.0     0.0.0.0         255.255.255.0   U     100    0        0 ens33

# ip neigh flush dev ens33  # clear the ARP table of ens33
# ip neigh flush dev veth0  
# ip neigh flush dev veth1

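By default a veth pair set up like this does not answer ARP requests, so the following settings are needed. (In my own tests, applying them on CentOS 7 and Ubuntu 20.04 had no effect.)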

echo 1 > /proc/sys/net/ipv4/conf/all/accept_local 
echo 1 > /proc/sys/net/ipv4/conf/default/accept_local 
echo 1 > /proc/sys/net/ipv4/conf/veth1/accept_local
echo 1 > /proc/sys/net/ipv4/conf/veth0/accept_local
echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter
echo 0 > /proc/sys/net/ipv4/conf/veth0/rp_filter
echo 0 > /proc/sys/net/ipv4/conf/veth1/rp_filter
echo 0 > /proc/sys/net/ipv4/conf/default/rp_filter
echo 0 > /proc/sys/net/ipv4/conf/lo/rp_filter
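
The same settings can also be written as a loop. The sketch below is a dry-run generator rather than a definitive script: `veth_sysctl_cmds` is a hypothetical name, and the function only prints `sysctl -w` commands so they can be reviewed before being run as root. The dotted key names are the sysctl form of the `/proc/sys/net/ipv4/conf/...` paths used above.

```shell
# Hypothetical helper: print (not apply) the sysctl settings used above.
# Arguments: the veth interface names, e.g. veth0 veth1
veth_sysctl_cmds() {
    # accept_local=1: accept packets whose source address is a local address
    for scope in all default "$@"; do
        echo "sysctl -w net.ipv4.conf.${scope}.accept_local=1"
    done
    # rp_filter=0: turn off reverse-path filtering
    for scope in all default lo "$@"; do
        echo "sysctl -w net.ipv4.conf.${scope}.rp_filter=0"
    done
}
```

Running `veth_sysctl_cmds veth0 veth1` prints nine commands; piping that output to `sh` as root would apply them.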



First ping to veth1


Ping veth1 from veth0 while the ARP table is empty

root@debian:~# ping -c 1 -I veth0 10.70.2.11
PING 10.70.2.11 (10.70.2.11) from 10.70.2.10 veth0: 56(84) bytes of data.
64 bytes from 10.70.2.11: icmp_seq=1 ttl=64 time=0.026 ms

--- 10.70.2.11 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.026/0.026/0.026/0.000 ms

root@debian:~# tcpdump -i veth0 -n
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on veth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
22:22:58.728550 ARP, Request who-has 10.70.2.11 tell 10.70.2.10, length 28
22:22:58.728560 ARP, Reply 10.70.2.11 is-at ee:38:2a:54:4f:dc, length 28
22:22:58.728561 IP 10.70.2.10 > 10.70.2.11: ICMP echo request, id 55778, seq 1, length 64


root@debian:~# tcpdump -i veth1 -n
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on veth1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
22:22:58.728552 ARP, Request who-has 10.70.2.11 tell 10.70.2.10, length 28
22:22:58.728560 ARP, Reply 10.70.2.11 is-at ee:38:2a:54:4f:dc, length 28
22:22:58.728561 IP 10.70.2.10 > 10.70.2.11: ICMP echo request, id 55778, seq 1, length 64


root@debian:~# tcpdump -i lo -n
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
22:22:58.728567 IP 10.70.2.11 > 10.70.2.10: ICMP echo reply, id 55778, seq 1, length 64


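Both veth0 and veth1 captured the ARP request and reply, while lo saw no ARP traffic at all. From this we can conclude that an ARP reply is never sent to lo, even when it is addressed to a local IP, whereas an ICMP echo reply whose destination is a local IP is delivered through lo.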
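
Comparing captures by eye gets tedious once there are more than a few packets. As a rough aid, the `tcpdump -n` text output can be tallied by packet type. This is only a sketch: `classify_capture` is a hypothetical helper name, and it matches the exact wording of the tcpdump lines shown above, which may differ between tcpdump versions.

```shell
# Hypothetical helper: count ARP and ICMP echo packets in `tcpdump -n` output.
# stdin = saved or piped tcpdump text output
classify_capture() {
    awk '
        / ARP, Request /      { req++ }
        / ARP, Reply /        { rep++ }
        /ICMP echo request/   { ereq++ }
        /ICMP echo reply/     { erep++ }
        END {
            printf "arp_request=%d arp_reply=%d echo_request=%d echo_reply=%d\n",
                   req, rep, ereq, erep
        }'
}
```

Fed the veth0 capture above, it reports one ARP request, one ARP reply, one echo request and no echo reply. Fed the lo capture, it reports a single echo reply. The counts are printed only when the input ends, so a live capture needs a packet limit such as `tcpdump -c`.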


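Analysis

  1. ping -I veth0 10.70.2.11: the MAC address of 10.70.2.11 is not yet known, so an ARP broadcast asking for 10.70.2.11 is sent first. The stack consults the routing table and finds that traffic for 10.70.2.11 should be handed to veth0.
  2. veth0 receives the request and passes it to veth1, which hands it up to the protocol stack.
  3. The stack sees that 10.70.2.11 is a local address and builds an ARP reply (sender 10.70.2.11, target 10.70.2.10). The reply travels back across the pair and is delivered to the stack again, which now learns the MAC address of 10.70.2.11. The ARP table in the next section lists 10.70.2.11 on veth0, which suggests the reply leaves through veth1, where the request arrived, and comes back in on veth0.

    Either way, every frame crosses both ends of the pair, which is why veth0 and veth1 each show one request and one reply.
  4. Once ping has the MAC of 10.70.2.11, the stack builds an ICMP echo request (source 10.70.2.10, destination 10.70.2.11). Because veth0 was specified, the packet is sent to veth0, which passes it to veth1, and veth1 hands it up to the stack.
  5. The stack sees that the destination 10.70.2.11 is a local address, so it builds an ICMP echo reply (source 10.70.2.11, destination 10.70.2.10) and sends it to lo. lo hands it straight back to the stack, the stack delivers it to the socket, the socket to the ping program, and ping prints a successful reply.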


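Because 10.70.2.11 is a local address, the packets would otherwise go straight through the loopback interface, which is why ping has to be told which device to use. For a detailed analysis, see the previous post on the difference between pinging localhost and pinging the machine's own IP.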



Second ping to veth1

root@debian:~# arp -n
Address                  HWtype  HWaddress           Flags Mask            Iface
114.114.114.114                  (incomplete)                              ens33
10.70.2.10               ether   22:b0:8c:94:2b:4f   C                     veth1
192.168.0.1              ether   00:50:56:c0:00:08   C                     ens33
10.70.2.11               ether   ee:38:2a:54:4f:dc   C                     veth0

root@debian:~# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.0.2    0.0.0.0         UG    100    0        0 ens33
10.70.2.0       0.0.0.0         255.255.255.0   U     0      0        0 veth0
10.70.2.0       0.0.0.0         255.255.255.0   U     0      0        0 veth1
192.168.0.0     0.0.0.0         255.255.255.0   U     100    0        0 ens33
root@debian:~# ping -I veth0 10.70.2.11
PING 10.70.2.11 (10.70.2.11) from 10.70.2.10 veth0: 56(84) bytes of data.
64 bytes from 10.70.2.11: icmp_seq=1 ttl=64 time=0.018 ms
64 bytes from 10.70.2.11: icmp_seq=2 ttl=64 time=0.060 ms
64 bytes from 10.70.2.11: icmp_seq=3 ttl=64 time=0.037 ms
^C
--- 10.70.2.11 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2035ms
rtt min/avg/max/mdev = 0.018/0.038/0.060/0.017 ms

root@debian:~# tcpdump -i veth0 -n
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on veth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
23:25:30.374267 IP 10.70.2.10 > 10.70.2.11: ICMP echo request, id 41019, seq 1, length 64
23:25:31.383364 IP 10.70.2.10 > 10.70.2.11: ICMP echo request, id 41019, seq 2, length 64
23:25:32.409151 IP 10.70.2.10 > 10.70.2.11: ICMP echo request, id 41019, seq 3, length 64

root@debian:~# tcpdump -i veth1 -n
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on veth1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
23:25:30.374269 IP 10.70.2.10 > 10.70.2.11: ICMP echo request, id 41019, seq 1, length 64
23:25:31.383369 IP 10.70.2.10 > 10.70.2.11: ICMP echo request, id 41019, seq 2, length 64
23:25:32.409155 IP 10.70.2.10 > 10.70.2.11: ICMP echo request, id 41019, seq 3, length 64

root@debian:~# tcpdump -i lo -n 
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
23:25:30.374278 IP 10.70.2.11 > 10.70.2.10: ICMP echo reply, id 41019, seq 1, length 64
23:25:31.383381 IP 10.70.2.11 > 10.70.2.10: ICMP echo reply, id 41019, seq 2, length 64
23:25:32.409167 IP 10.70.2.11 > 10.70.2.10: ICMP echo reply, id 41019, seq 3, length 64


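Analysis:

  1. ping -I veth0 10.70.2.11: the MAC address for this IP is now known, so the ICMP echo request is sent straight to veth0. veth0 passes it to veth1, and veth1 hands it up to the protocol stack.
  2. The stack receives the packet, finds that a local device owns this IP, and immediately builds an ICMP echo reply (source 10.70.2.11, destination 10.70.2.10). The stack sends it to lo, lo hands it back to the stack, the stack delivers it to the socket, the socket to ping, and ping prints the reply: the ping succeeds.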



Pinging a public address or another IP in the same subnet from veth

root@debian:~# ping -I veth0 1.1.1.1 
PING 1.1.1.1 (1.1.1.1) from 10.70.2.10 veth0: 56(84) bytes of data.
^C
--- 1.1.1.1 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 15356m

root@debian:~# tcpdump -i veth0 -n
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on veth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
23:36:13.342372 ARP, Request who-has 1.1.1.1 tell 10.70.2.10, length 28
23:36:14.361316 ARP, Request who-has 1.1.1.1 tell 10.70.2.10, length 28
23:36:15.383449 ARP, Request who-has 1.1.1.1 tell 10.70.2.10, length 28

root@debian:~# tcpdump -i veth1 -n
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on veth1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
23:36:13.342375 ARP, Request who-has 1.1.1.1 tell 10.70.2.10, length 28
23:36:14.361322 ARP, Request who-has 1.1.1.1 tell 10.70.2.10, length 28
23:36:15.383455 ARP, Request who-has 1.1.1.1 tell 10.70.2.10, length 28

root@debian:~# tcpdump -i lo -n
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
23:36:16.410130 IP 10.70.2.10 > 10.70.2.10: ICMP host 1.1.1.1 unreachable, length 92
23:36:16.410133 IP 10.70.2.10 > 10.70.2.10: ICMP host 1.1.1.1 unreachable, length 92
23:36:16.410134 IP 10.70.2.10 > 10.70.2.10: ICMP host 1.1.1.1 unreachable, length 92


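Analysis

  1. The ARP requests travel veth0 -> veth1 -> protocol stack. The stack finds that 1.1.1.1 is not a local IP and drops them, so no ARP reply ever comes back. The stack then builds an ICMP host-unreachable message for the sender, 10.70.2.10. The route lookup points at veth0, so the message also gets 10.70.2.10 as its source address, and because the destination is a local address it is sent to lo, which hands it back to the stack for the ping program.

  2. ping -I appears to have a bug: replies are printed only when the ping succeeds, and nothing is printed when it fails, even though the host-unreachable messages show up on lo.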



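Copyright notice: this is an original article by qq_41586875, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.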