veth-pair packet forwarding
Lab environment
# OS: Debian 11
+----------------------------------------------------------------+
|                                                                |
|       +------------------------------------------------+       |
|       |             Network Protocol Stack             |       |
|       +------------------------------------------------+       |
|             ↑                ↑                ↑                |
|.............|................|................|................|
|             ↓                ↓                ↓                |
|        +----------+    +-----------+    +-----------+          |
|        |   ens33  |    |   veth0   |    |   veth1   |          |
|        +----------+    +-----------+    +-----------+          |
|192.168.0.10 ↑                ↑                ↑                |
|             |                +----------------+                |
|             |           10.70.2.10       10.70.2.11            |
+-------------|--------------------------------------------------+
              ↓
      Physical Network
Create the veth pair
root@debian:~# ip link add veth0 type veth peer name veth1
root@debian:~# ip link set veth0 up
root@debian:~# ip link set veth1 up
Assign an IP address to veth0
root@debian:~# ip addr add 10.70.2.10/24 dev veth0
Routing and ARP tables
root@debian:~# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.0.2 0.0.0.0 UG 100 0 0 ens33
10.70.2.0 0.0.0.0 255.255.255.0 U 0 0 0 veth0
192.168.0.0 0.0.0.0 255.255.255.0 U 100 0 0 ens33
root@debian:~# arp -n
Address HWtype HWaddress Flags Mask Iface
114.114.114.114 (incomplete) ens33
192.168.0.1 ether 00:50:56:c0:00:08 C ens33
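The routing table above decides which interface a packet leaves through, and that decision can be approximated as a longest-prefix match. A minimal Python sketch, assuming only the three routes printed by `route -n` (it ignores the kernel's separate local table, policy routing and source-address selection); `Route`, `lookup` and `next_hop` are illustrative names, not a real API:

```python
import ipaddress
from typing import NamedTuple, Optional, Tuple


class Route(NamedTuple):
    dest: ipaddress.IPv4Network
    gateway: Optional[ipaddress.IPv4Address]
    metric: int
    iface: str


# The main routing table as printed by `route -n` above
MAIN_TABLE = [
    Route(ipaddress.ip_network("0.0.0.0/0"), ipaddress.ip_address("192.168.0.2"), 100, "ens33"),
    Route(ipaddress.ip_network("10.70.2.0/24"), None, 0, "veth0"),
    Route(ipaddress.ip_network("192.168.0.0/24"), None, 100, "ens33"),
]


def lookup(dst: str, table=MAIN_TABLE) -> Route:
    """Longest-prefix match; among equally specific routes the lower metric wins."""
    addr = ipaddress.ip_address(dst)
    candidates = [r for r in table if addr in r.dest]
    if not candidates:
        raise LookupError(f"no route to {dst}")
    return max(candidates, key=lambda r: (r.dest.prefixlen, -r.metric))


def next_hop(dst: str) -> Tuple[str, str]:
    """Return (egress interface, IP address that ARP has to resolve)."""
    r = lookup(dst)
    return r.iface, (str(r.gateway) if r.gateway is not None else dst)


print(next_hop("10.70.2.11"))  # ('veth0', '10.70.2.11')
print(next_hop("1.1.1.1"))     # ('ens33', '192.168.0.2')
```

For an on-link destination such as 10.70.2.11 the ARP target is the destination itself; for anything that falls through to the default route it is the gateway.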
Send a ping
root@debian:~# ping 10.70.2.11
root@debian:~# tcpdump -n -i veth0
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on veth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
22:06:45.467154 ARP, Request who-has 10.70.2.11 tell 10.70.2.10, length 28
22:06:46.487460 ARP, Request who-has 10.70.2.11 tell 10.70.2.10, length 28
22:06:47.512417 ARP, Request who-has 10.70.2.11 tell 10.70.2.10, length 28
root@debian:~# tcpdump -n -i veth1
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on veth1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
22:07:01.861339 ARP, Request who-has 10.70.2.11 tell 10.70.2.10, length 28
22:07:02.870627 ARP, Request who-has 10.70.2.11 tell 10.70.2.10, length 28
22:07:03.898166 ARP, Request who-has 10.70.2.11 tell 10.70.2.10, length 28
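- Since this is the first ping to 10.70.2.11, the ARP table has no entry for it, so an ARP request has to be sent first.
- When ping's ICMP packet enters the protocol stack through the socket, the stack looks up the destination in the routing table and finds that packets for 10.70.2.11 must leave through veth0 (source address 10.70.2.10). Because there is no ARP entry for 10.70.2.11 yet, the stack sends an ARP request out of veth0.
- When the ARP request leaves veth0, it is handed directly to veth1. veth1 passes it up to the protocol stack, which compares the target with the local IPs, finds that no interface owns 10.70.2.11, and drops the ARP request.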
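The first experiment can be modeled as a toy simulation: a veth pair hands each frame to its peer, and the receiving stack answers an ARP request only when the target IP is one of its own. A minimal Python sketch under that assumption; `Host`, `PEER` and `send_arp_request` are hypothetical names, and the kernel's source-address checks (accept_local, rp_filter) are deliberately left out:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# A veth pair: every frame transmitted on one end is received on the other end
PEER = {"veth0": "veth1", "veth1": "veth0"}


@dataclass
class Host:
    """Toy model of a single host: which IPs are configured on which interface."""
    addrs: Dict[str, Set[str]]
    capture: List[Tuple[str, str]] = field(default_factory=list)  # (iface, tcpdump-like line)

    def is_local(self, ip: str) -> bool:
        return any(ip in ips for ips in self.addrs.values())

    def send_arp_request(self, out_iface: str, target_ip: str, sender_ip: str) -> Optional[str]:
        """Send an ARP request on out_iface; return the reply line, or None if it is dropped."""
        line = f"ARP, Request who-has {target_ip} tell {sender_ip}"
        in_iface = PEER[out_iface]
        self.capture.append((out_iface, line))  # captured on the sending end
        self.capture.append((in_iface, line))   # captured again on the peer
        # The peer hands the frame up to the same stack, which answers only if it owns
        # target_ip. Source validation (accept_local / rp_filter) is not modeled here.
        if not self.is_local(target_ip):
            return None
        return f"ARP, Reply {target_ip} is-at <MAC of {in_iface}>"


host = Host(addrs={"veth0": {"10.70.2.10"}, "veth1": set()})
print(host.send_arp_request("veth0", "10.70.2.11", "10.70.2.10"))  # None: dropped
print(host.capture)  # the same request seen on veth0 and on veth1
```

This reproduces the captures above: the request appears on both ends of the pair, and nothing answers it while veth1 has no address.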
Assign an IP address to veth1
root@debian:~# ip addr add 10.70.2.11/24 dev veth1
Routing table
root@debian:~# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.0.2 0.0.0.0 UG 100 0 0 ens33
10.70.2.0 0.0.0.0 255.255.255.0 U 0 0 0 veth0
10.70.2.0 0.0.0.0 255.255.255.0 U 0 0 0 veth1
192.168.0.0 0.0.0.0 255.255.255.0 U 100 0 0 ens33
# ip neigh flush dev ens33    # flush the ARP table of ens33
# ip neigh flush dev veth0
# ip neigh flush dev veth1
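By default the host does not answer these ARP requests: the request that arrives on veth1 carries a sender address (10.70.2.10) that is itself a local address, and the kernel's source validation rejects such packets unless accept_local is enabled and reverse-path filtering (rp_filter) is relaxed. The following configuration is therefore required (in my own tests, this configuration did not take effect on CentOS 7 or Ubuntu 20.04):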
echo 1 > /proc/sys/net/ipv4/conf/all/accept_local
echo 1 > /proc/sys/net/ipv4/conf/default/accept_local
echo 1 > /proc/sys/net/ipv4/conf/veth1/accept_local
echo 1 > /proc/sys/net/ipv4/conf/veth0/accept_local
echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter
echo 0 > /proc/sys/net/ipv4/conf/veth0/rp_filter
echo 0 > /proc/sys/net/ipv4/conf/veth1/rp_filter
echo 0 > /proc/sys/net/ipv4/conf/default/rp_filter
echo 0 > /proc/sys/net/ipv4/conf/lo/rp_filter
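The commands above set each knob both under `all` and per interface. A small Python sketch of why that matters, assuming the rule documented in the kernel's ip-sysctl text for rp_filter (the larger of conf/all and conf/<iface> is used) and, as an assumption, a logical OR for accept_local; the function names are hypothetical, and `read_conf` only reads the same /proc files the echo commands write:

```python
from pathlib import Path
from typing import Dict, Iterable

Conf = Dict[str, Dict[str, int]]  # {"all" or iface: {"rp_filter": n, "accept_local": n}}


def effective_rp_filter(conf: Conf, iface: str) -> int:
    """rp_filter used on iface: the max of conf/all and conf/<iface>."""
    return max(conf["all"]["rp_filter"], conf[iface]["rp_filter"])


def effective_accept_local(conf: Conf, iface: str) -> bool:
    """Assumption: enabled if either conf/all or conf/<iface> is set (logical OR)."""
    return bool(conf["all"]["accept_local"] or conf[iface]["accept_local"])


def read_conf(ifaces: Iterable[str], root: str = "/proc/sys/net/ipv4/conf") -> Conf:
    """Read both knobs for 'all' and each given interface from /proc (Linux only)."""
    conf: Conf = {}
    for name in ("all", *ifaces):
        base = Path(root) / name
        conf[name] = {k: int((base / k).read_text()) for k in ("rp_filter", "accept_local")}
    return conf


# Hypothetical example: all/rp_filter left at 2 while only veth1 is set to 0
conf = {"all": {"rp_filter": 2, "accept_local": 0}, "veth1": {"rp_filter": 0, "accept_local": 1}}
print(effective_rp_filter(conf, "veth1"))     # 2: the per-interface 0 is overridden
print(effective_accept_local(conf, "veth1"))  # True
```

On a real host, `read_conf(["veth0", "veth1"])` returns the values currently in effect, which is a quick way to check whether the echo commands above took effect.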
First ping to veth1
Ping veth1 from veth0 while the ARP table has no entry for it
root@debian:~# ping -c 1 -I veth0 10.70.2.11
PING 10.70.2.11 (10.70.2.11) from 10.70.2.10 veth0: 56(84) bytes of data.
64 bytes from 10.70.2.11: icmp_seq=1 ttl=64 time=0.026 ms
--- 10.70.2.11 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.026/0.026/0.026/0.000 ms
root@debian:~# tcpdump -i veth0 -n
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on veth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
22:22:58.728550 ARP, Request who-has 10.70.2.11 tell 10.70.2.10, length 28
22:22:58.728560 ARP, Reply 10.70.2.11 is-at ee:38:2a:54:4f:dc, length 28
22:22:58.728561 IP 10.70.2.10 > 10.70.2.11: ICMP echo request, id 55778, seq 1, length 64
root@debian:~# tcpdump -i veth1 -n
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on veth1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
22:22:58.728552 ARP, Request who-has 10.70.2.11 tell 10.70.2.10, length 28
22:22:58.728560 ARP, Reply 10.70.2.11 is-at ee:38:2a:54:4f:dc, length 28
22:22:58.728561 IP 10.70.2.10 > 10.70.2.11: ICMP echo request, id 55778, seq 1, length 64
root@debian:~# tcpdump -i lo -n
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
22:22:58.728567 IP 10.70.2.11 > 10.70.2.10: ICMP echo reply, id 55778, seq 1, length 64
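Both veth0 and veth1 captured the ARP request and the ARP reply, while lo captured no ARP packets at all. This shows that an ARP reply is never sent through lo, even when its destination is a local IP: ARP is answered at the link layer on the interface where the request arrived. By contrast, when an ICMP request is addressed to a local IP, the ICMP reply does go through lo, because its destination (10.70.2.10) is also a local address and the route lookup delivers it via the loopback device.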
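Analysis
- ping -I veth0 10.70.2.11: the MAC address of 10.70.2.11 is unknown at first, so an ARP broadcast is sent asking for 10.70.2.11 (that is its target IP; the Ethernet destination is the broadcast address). The protocol stack looks up the route and finds that packets for 10.70.2.11 must go out through veth0.
- veth0 transmits the request, which arrives on veth1; veth1 hands it up to the protocol stack.
- The stack sees that 10.70.2.11 is a local address, records 10.70.2.10's MAC in veth1's ARP table, and builds an ARP reply (source 10.70.2.11 with veth1's MAC ee:38:2a:54:4f:dc, destination 10.70.2.10). An ARP reply is sent back out of the interface the request arrived on, so it leaves veth1, arrives on veth0, and veth0 hands it to the stack, which now knows the MAC address of 10.70.2.11. This is why both veth0 and veth1 show exactly one request and one reply.
- Once ping has the MAC of 10.70.2.11, the stack builds an ICMP echo request (source 10.70.2.10, destination 10.70.2.11). Because veth0 was specified with -I, the packet is sent out of veth0, arrives on veth1, and veth1 hands it to the protocol stack.
- The stack sees that the destination 10.70.2.11 is a local address, so it builds an ICMP echo reply (source 10.70.2.11, destination 10.70.2.10) and sends it to lo. lo hands it straight back to the stack, the stack delivers it to the socket, and ping prints the reply.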
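Because 10.70.2.11 is now a local address, a ping without -I would send the packet straight through the loopback interface and never touch the veth pair, which is why ping has to be told which device to use. For a detailed analysis, see the previous article on the difference between pinging localhost and pinging a local IP.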
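Why `-I veth0` changes the outcome can be sketched as a simplified output-route lookup: a destination owned by the host is delivered through lo, but a socket bound to veth0 only considers routes whose device is veth0, so the local route for 10.70.2.11 (which belongs to veth1) is skipped and the connected route on veth0 is used instead. A minimal Python model under those assumptions; the tables are hand-copied from this setup and `output_device` is a hypothetical name. The final fallback mirrors the kernel treating the destination as on-link when a device is given but no route matches, which would explain the ARP requests for 1.1.1.1 on veth0 later in this article:

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# Local table: each address owned by the host and the device it is configured on
LOCAL: Dict[str, str] = {"10.70.2.10": "veth0", "10.70.2.11": "veth1", "192.168.0.10": "ens33"}
# Main table, most specific first (simplification: the first match wins)
MAIN: List[Tuple[str, str]] = [
    ("10.70.2.0/24", "veth0"),
    ("10.70.2.0/24", "veth1"),
    ("192.168.0.0/24", "ens33"),
    ("0.0.0.0/0", "ens33"),
]


def output_device(dst: str, oif: Optional[str] = None) -> str:
    """Egress device for a locally generated packet; oif models `ping -I <dev>`."""
    local_dev = LOCAL.get(dst)
    # A local route only matches if the socket is unbound or bound to that same device;
    # local destinations are delivered through the loopback device.
    if local_dev is not None and oif in (None, local_dev):
        return "lo"
    addr = ipaddress.ip_address(dst)
    for prefix, dev in MAIN:
        if addr in ipaddress.ip_network(prefix) and oif in (None, dev):
            return dev
    if oif is not None:
        # Nothing matched on the bound device: treat the destination as on-link there
        return oif
    raise LookupError(f"no route to {dst}")


print(output_device("10.70.2.11"))              # lo: plain ping never reaches the veth pair
print(output_device("10.70.2.11", oif="veth0"))  # veth0: what ping -I veth0 does
print(output_device("10.70.2.10"))              # lo: the echo reply goes through loopback
```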
Second ping to veth1
root@debian:~# arp -n
Address HWtype HWaddress Flags Mask Iface
114.114.114.114 (incomplete) ens33
10.70.2.10 ether 22:b0:8c:94:2b:4f C veth1
192.168.0.1 ether 00:50:56:c0:00:08 C ens33
10.70.2.11 ether ee:38:2a:54:4f:dc C veth0
root@debian:~# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.0.2 0.0.0.0 UG 100 0 0 ens33
10.70.2.0 0.0.0.0 255.255.255.0 U 0 0 0 veth0
10.70.2.0 0.0.0.0 255.255.255.0 U 0 0 0 veth1
192.168.0.0 0.0.0.0 255.255.255.0 U 100 0 0 ens33
root@debian:~# ping -I veth0 10.70.2.11
PING 10.70.2.11 (10.70.2.11) from 10.70.2.10 veth0: 56(84) bytes of data.
64 bytes from 10.70.2.11: icmp_seq=1 ttl=64 time=0.018 ms
64 bytes from 10.70.2.11: icmp_seq=2 ttl=64 time=0.060 ms
64 bytes from 10.70.2.11: icmp_seq=3 ttl=64 time=0.037 ms
^C
--- 10.70.2.11 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2035ms
rtt min/avg/max/mdev = 0.018/0.038/0.060/0.017 ms
root@debian:~# tcpdump -i veth0 -n
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on veth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
23:25:30.374267 IP 10.70.2.10 > 10.70.2.11: ICMP echo request, id 41019, seq 1, length 64
23:25:31.383364 IP 10.70.2.10 > 10.70.2.11: ICMP echo request, id 41019, seq 2, length 64
23:25:32.409151 IP 10.70.2.10 > 10.70.2.11: ICMP echo request, id 41019, seq 3, length 64
root@debian:~# tcpdump -i veth1 -n
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on veth1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
23:25:30.374269 IP 10.70.2.10 > 10.70.2.11: ICMP echo request, id 41019, seq 1, length 64
23:25:31.383369 IP 10.70.2.10 > 10.70.2.11: ICMP echo request, id 41019, seq 2, length 64
23:25:32.409155 IP 10.70.2.10 > 10.70.2.11: ICMP echo request, id 41019, seq 3, length 64
root@debian:~# tcpdump -i lo -n
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
23:25:30.374278 IP 10.70.2.11 > 10.70.2.10: ICMP echo reply, id 41019, seq 1, length 64
23:25:31.383381 IP 10.70.2.11 > 10.70.2.10: ICMP echo reply, id 41019, seq 2, length 64
23:25:32.409167 IP 10.70.2.11 > 10.70.2.10: ICMP echo reply, id 41019, seq 3, length 64
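Analysis:
- ping -I veth0 10.70.2.11: the MAC address of this IP is now in the ARP table, so the ICMP echo request is sent to veth0 directly, without any ARP exchange. veth0 passes it to veth1, and veth1 hands it up to the protocol stack.
- The stack finds that a local device owns this IP and immediately builds an ICMP echo reply (source 10.70.2.11, destination 10.70.2.10). The stack sends it to lo, lo hands it back to the stack, the stack delivers it to the socket and thus to ping, and ping prints the reply: the ping succeeds.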
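The `arp -n` output above (10.70.2.10 learned on veth1, 10.70.2.11 learned on veth0) follows from which end of the pair received which ARP packet during the first ping. A toy Python model of that learning, assuming the MACs shown in the captures and ignoring entry aging (REACHABLE/STALE states); `arp_exchange` and `needs_arp` are hypothetical names:

```python
from typing import Dict, Tuple

# (interface, IP) -> MAC, like the Iface/Address/HWaddress columns of `arp -n`
Neigh = Dict[Tuple[str, str], str]

# MACs as they appear in the captures and `arp -n` output above
MAC = {"veth0": "22:b0:8c:94:2b:4f", "veth1": "ee:38:2a:54:4f:dc"}
PEER = {"veth0": "veth1", "veth1": "veth0"}


def arp_exchange(neigh: Neigh, out_iface: str, sender_ip: str, target_ip: str) -> None:
    """Request: out_iface -> peer. Reply: peer -> out_iface. Each receiver learns the sender."""
    in_iface = PEER[out_iface]
    neigh[(in_iface, sender_ip)] = MAC[out_iface]  # the end receiving the request learns the requester
    neigh[(out_iface, target_ip)] = MAC[in_iface]  # the end receiving the reply learns the target


def needs_arp(neigh: Neigh, out_iface: str, dst_ip: str) -> bool:
    """An ARP request is only sent when the neighbor entry is missing."""
    return (out_iface, dst_ip) not in neigh


neigh: Neigh = {}
print(needs_arp(neigh, "veth0", "10.70.2.11"))  # True: first ping
arp_exchange(neigh, "veth0", "10.70.2.10", "10.70.2.11")
print(needs_arp(neigh, "veth0", "10.70.2.11"))  # False: second ping goes straight to ICMP
```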
Pinging a public IP & another IP in the same subnet from veth
root@debian:~# ping -I veth0 1.1.1.1
PING 1.1.1.1 (1.1.1.1) from 10.70.2.10 veth0: 56(84) bytes of data.
^C
--- 1.1.1.1 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 15356m
root@debian:~# tcpdump -i veth0 -n
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on veth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
23:36:13.342372 ARP, Request who-has 1.1.1.1 tell 10.70.2.10, length 28
23:36:14.361316 ARP, Request who-has 1.1.1.1 tell 10.70.2.10, length 28
23:36:15.383449 ARP, Request who-has 1.1.1.1 tell 10.70.2.10, length 28
root@debian:~# tcpdump -i veth1 -n
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on veth1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
23:36:13.342375 ARP, Request who-has 1.1.1.1 tell 10.70.2.10, length 28
23:36:14.361322 ARP, Request who-has 1.1.1.1 tell 10.70.2.10, length 28
23:36:15.383455 ARP, Request who-has 1.1.1.1 tell 10.70.2.10, length 28
root@debian:~# tcpdump -i lo -n
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
23:36:16.410130 IP 10.70.2.10 > 10.70.2.10: ICMP host 1.1.1.1 unreachable, length 92
23:36:16.410133 IP 10.70.2.10 > 10.70.2.10: ICMP host 1.1.1.1 unreachable, length 92
23:36:16.410134 IP 10.70.2.10 > 10.70.2.10: ICMP host 1.1.1.1 unreachable, length 92
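Analysis
- The ARP request for 1.1.1.1 travels veth0 -> veth1 -> protocol stack. The stack finds that 1.1.1.1 is not a local IP and drops the request without answering.
- After the ARP requests go unanswered, the stack gives up resolving 1.1.1.1 and, for each echo request still waiting for that MAC, builds an ICMP host-unreachable message addressed to the sender 10.70.2.10 (three messages, matching the three captured on lo). The messages use 10.70.2.10, veth0's address, as their source; since the destination is a local address, they are sent to lo, which hands them back to the stack for delivery to ping.
- ping -I appears to have a bug here: it only prints output for successful replies, and when the ping fails these errors are not displayed.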
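The failure path seen in the captures above (three ARP requests one second apart, then three ICMP host-unreachable messages at the same instant) can be sketched as a small state machine: echo requests are queued while 1.1.1.1 is unresolved, and once the ARP requests go unanswered, each queued packet becomes one ICMP error. A minimal Python sketch; `PendingNeighbor` is a hypothetical name, and `max_probes=3` is an assumption chosen to match the three ARP requests captured on veth0 (on Linux the real count comes from the neighbor-table solicit settings):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PendingNeighbor:
    """Toy model of a neighbor entry that is still being resolved via ARP."""
    ip: str
    max_probes: int = 3  # assumption: matches the three ARP requests in the capture
    probes_sent: int = 0
    queue: List[int] = field(default_factory=list)  # ICMP echo seq numbers waiting for a MAC

    def enqueue(self, seq: int) -> None:
        """ping keeps sending; packets wait in the queue while the MAC is unknown."""
        self.queue.append(seq)

    def probe_timed_out(self) -> List[str]:
        """Record one unanswered ARP request; after the last one, fail every queued packet."""
        self.probes_sent += 1
        if self.probes_sent < self.max_probes:
            return []
        errors = [f"ICMP host {self.ip} unreachable (seq {seq})" for seq in self.queue]
        self.queue.clear()
        return errors


entry = PendingNeighbor("1.1.1.1")
for seq in (1, 2, 3):  # one echo request per second, as in the capture
    entry.enqueue(seq)
    for err in entry.probe_timed_out():
        print(err)  # all three errors appear only after the third probe
```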