- Use gem5 to generate the SimPoint profile
Command line:
build/X86/gem5.fast configs/spec2017/se_spec17.py --spec-2017-bench \
    --spec-size=ref --warmup-insts=1000000 --maxinsts=100000000000 \
    --arch=X86 --cpu-type=AtomicSimpleCPU --sys-clock=1.6GHz \
    --cpu-clock=3.5GHz --mem-size=16GB --mem-type=DDR4_2400_4x16 \
    --simpoint-profile \
    --simpoint-interval 100000000 --l4-size=512MB -b mcf
The run produces the output directory below. simpoint.bb.gz is the compressed basic-block profile that the SimPoint tool takes as input.
./mcf/
├── benchmark_err
├── benchmark_out
├── config.ini
├── config.json
├── dumpDebugState.txt
├── simerr
├── simout
├── simpoint.bb.gz
└── stats.txt
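Before handing simpoint.bb.gz to SimPoint, it can be worth checking that the profile covers the number of intervals you expect. Below is a minimal Python sketch, assuming the usual SimPoint frequency-vector layout (one line per interval, starting with `T`, followed by `:<basic block id>:<count>` entries). The function name `summarize_bbv` is my own and is not part of gem5 or SimPoint.

```python
import gzip


def summarize_bbv(path):
    """Return (number of intervals, number of distinct basic-block ids).

    Assumes each interval is one line of the form
    "T:<bb_id>:<count> :<bb_id>:<count> ...".
    """
    intervals = 0
    block_ids = set()
    with gzip.open(path, "rt") as f:
        for line in f:
            line = line.strip()
            if not line.startswith("T"):
                continue  # skip blank lines or anything that is not an interval
            intervals += 1
            # Drop the leading "T", then split the ":<id>:<count>" entries.
            for entry in line[1:].split():
                _, bb_id, _count = entry.split(":")
                block_ids.add(int(bb_id))
    return intervals, len(block_ids)
```

With --simpoint-interval 100000000, the interval count should be roughly the number of profiled instructions divided by 100000000.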
- Use SimPoint to process simpoint.bb.gz and generate the weights and simpoints files
See the SimPoint weights file format.
At this point the directory contains:
./mcf/
├── benchmark_err
├── benchmark_out
├── config.ini
├── config.json
├── dumpDebugState.txt
├── simerr
├── simout
├── simpoint.bb.gz
├── simpoints
├── stats.txt
└── weights
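To see how the two files relate, here is a minimal Python sketch that pairs them up. It assumes the usual SimPoint output layout: each line of simpoints is `<interval index> <cluster id>` and each line of weights is `<weight> <cluster id>`, in the same order. The function name `load_simpoints` and the dictionary keys are my own. The arithmetic (start instruction = interval index × interval length, with the warm-up capped at the start instruction) reflects my understanding of how gem5 uses these files, not a copy of its code.

```python
def load_simpoints(simpoints_path, weights_path, interval_length, warmup_length):
    """Pair each simpoint with its weight and compute instruction offsets."""
    with open(simpoints_path) as f:
        sp_lines = [ln.split() for ln in f if ln.strip()]
    with open(weights_path) as f:
        wt_lines = [ln.split() for ln in f if ln.strip()]
    if len(sp_lines) != len(wt_lines):
        raise ValueError("simpoints and weights have different lengths")

    points = []
    for (interval, sp_cluster), (weight, wt_cluster) in zip(sp_lines, wt_lines):
        if sp_cluster != wt_cluster:
            raise ValueError("cluster ids do not line up between the two files")
        start_inst = int(interval) * interval_length
        # The warm-up cannot reach back before instruction 0.
        warmup = min(warmup_length, start_inst)
        points.append({
            "start_inst": start_inst,
            "weight": float(weight),
            "warmup": warmup,
            "checkpoint_inst": start_inst - warmup,
        })
    # Order by position in the program.
    points.sort(key=lambda p: p["start_inst"])
    return points
```

For example, an interval index of 66 with a 100000000-instruction interval starts at instruction 6600000000.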
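- Feed simpoints and weights to gem5 so that it takes a checkpoint at each corresponding point
My experiments use a fairly long warm-up of 4 billion instructions. Adjust it to your own needs, for example 10K.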
build/X86/gem5.fast \
    configs/spec2017/se_spec17.py --spec-2017-bench --spec-size=ref \
    --warmup-insts=1000000 --maxinsts=100000000000 \
    --arch=X86 --cpu-type=X86TimingSimpleCPU \
    --sys-clock=1.6GHz --cpu-clock=3.5GHz --mem-size=16GB --mem-type=DDR4_2400_4x16 \
    --caches --cache-level=4 --l4-size=512MB \
    -b mcf \
    --take-simpoint-checkpoint=mcf/simpoints,mcf/weights,100000000,4000000000
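The value of --take-simpoint-checkpoint is a single comma-separated argument (simpoints file, weights file, interval length, warm-up length), so a stray space inside it makes the shell split it into two arguments. If you generate these command lines from a script, a small helper along these lines can guard against that. This is only a sketch; the name `take_simpoint_arg` is hypothetical.

```python
def take_simpoint_arg(simpoints_path, weights_path, interval_length, warmup_length):
    """Build the --take-simpoint-checkpoint argument as one shell-safe token."""
    fields = [
        str(simpoints_path),
        str(weights_path),
        str(int(interval_length)),
        str(int(warmup_length)),
    ]
    for field in fields:
        # A comma would be read as a field separator; whitespace is rejected
        # here so the result can be pasted into a shell without quoting.
        if "," in field or any(ch.isspace() for ch in field):
            raise ValueError("field must not contain commas or whitespace: %r" % field)
    return "--take-simpoint-checkpoint=" + ",".join(fields)
```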
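4) After the checkpoints have been taken, the directory looks like this:
I switched to a different program (cactuBSSN) here because it happened to be the one I was running.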
cactuBSSN/
├── benchmark_err
├── benchmark_out
├── config.ini
├── config.json
├── cpt.simpoint_00_inst_6600000000_weight_0.177177_interval_100000000_warmup_4000000000
│   ├── m5.cpt
│   └── system.physmem.store0.pmem
├── cpt.simpoint_01_inst_49400000000_weight_0.058058_interval_100000000_warmup_4000000000
│   ├── m5.cpt
│   └── system.physmem.store0.pmem
├── cpt.simpoint_02_inst_51400000000_weight_0.014014_interval_100000000_warmup_4000000000
│   ├── m5.cpt
│   └── system.physmem.store0.pmem
├── cpt.simpoint_03_inst_56700000000_weight_0.612613_interval_100000000_warmup_4000000000
│   ├── m5.cpt
│   └── system.physmem.store0.pmem
├── cpt.simpoint_04_inst_82500000000_weight_0.138138_interval_100000000_warmup_4000000000
│   ├── m5.cpt
│   └── system.physmem.store0.pmem
├── simerr
├── simout
└── stats.txt
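m5.cpt is a snapshot of the CPU's state at that moment, such as register values and page-table mappings.
system.physmem.store0.pmem holds the contents of memory at that moment.
With both the cpt and pmem files, gem5 can restore from any one of these checkpoints and start executing from there.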
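Each checkpoint directory name encodes the SimPoint index, start instruction, weight, interval length, and warm-up length. If you want to post-process results per checkpoint, the fields can be pulled out with a regular expression. A minimal sketch, based only on the directory names shown above (the names `CPT_NAME` and `parse_checkpoint_name` are my own):

```python
import re

# Matches names such as
# cpt.simpoint_03_inst_56700000000_weight_0.612613_interval_100000000_warmup_4000000000
CPT_NAME = re.compile(
    r"cpt\.simpoint_(\d+)_inst_(\d+)_weight_([\d.eE+-]+)"
    r"_interval_(\d+)_warmup_(\d+)$"
)


def parse_checkpoint_name(name):
    """Return the fields encoded in a checkpoint directory name, or None."""
    match = CPT_NAME.match(name)
    if match is None:
        return None
    index, inst, weight, interval, warmup = match.groups()
    return {
        "index": int(index),
        "start_inst": int(inst),
        "weight": float(weight),
        "interval": int(interval),
        "warmup": int(warmup),
    }
```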
5) Restore from a checkpoint
build/X86/gem5.fast configs/spec2017/se_spec17.py \
    --spec-2017-bench --spec-size=ref \
    --warmup-insts=1000000 \
    --arch=X86 --cpu-type=X86O3CPU \
    --sys-clock=1.6GHz --cpu-clock=3.5GHz --caches --cache-level=4 \
    --mem-size=16GB --mem-type=DDR4_2400_4x16 \
    --l4-size=256MB \
    -b cactuBSSN \
    --restore-simpoint-checkpoint -r 4 \
    --checkpoint-dir gem5-results-2017-20220902-checkpoint/512MB/cactuBSSN
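The -r 4 here needs some care: -r counts from 1, while the checkpoint directories are numbered from 0. So -r 4 selects the fourth checkpoint listed in step 4), which is the one named simpoint_03: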
cpt.simpoint_03_inst_56700000000_weight_0.612613_interval_100000000_warmup_4000000000
Resuming from cactuBSSN/cpt.simpoint_03_inst_56700000000_weight_0.612613_interval_100000000_warmup_4000000000
Resuming from SimPoint #3, start_inst:56700000000, weight:0.612613, interval:100000000, warmup:4000000000
Global frequency set at 1000000000000 ticks per second
warn: No dot file generated. Please install pydot to generate the dot file and pdf.
build/X86/mem/dram_interface.cc:692: warn: DRAM device capacity (4096 Mbytes) does not match the address range assigned (16384 Mbytes)
0: system.remote_gdb: listening for remote gdb on port 7000
build/X86/sim/process.cc:405: warn: Checkpoints for pipes, device drivers and sockets do not work.
Switch at curTick count:10000
build/X86/sim/simulate.cc:194: info: Entering event queue @ 187482902990720. Starting simulation...
Switched CPUS @ tick 187482903000720
switching cpus
system.cpu old->new system.switch_cpus
build/X86/sim/simulate.cc:194: info: Entering event queue @ 187482903000720. Starting simulation...
build/X86/sim/power_state.cc:106: warn: PowerState: Already in the requested power state, request ignored
build/X86/sim/simulate.cc:194: info: Entering event queue @ 187482903000822. Starting simulation...
O3CPU0 At 187487215508450 Tid[0] 10000000 instructions are executed.
O3CPU0 At 187491429812258 Tid[0] 20000000 instructions are executed.
O3CPU0 At 187495683554894 Tid[0] 30000000 instructions are executed.
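The 1-based -r indexing can be illustrated with a short Python sketch. As far as I understand it, gem5 collects the directory names that match the SimPoint checkpoint pattern, sorts them, and picks entry N-1. The code below is a stand-alone illustration under that assumption, not gem5's own code, and `select_simpoint_checkpoint` is a hypothetical name.

```python
import os
import re

SIMPOINT_CPT = re.compile(
    r"cpt\.simpoint_\d+_inst_\d+_weight_[\d.eE+-]+_interval_\d+_warmup_\d+$"
)


def select_simpoint_checkpoint(checkpoint_dir, restore_num):
    """Return the path of the checkpoint that '-r restore_num' would pick."""
    names = sorted(
        name for name in os.listdir(checkpoint_dir) if SIMPOINT_CPT.match(name)
    )
    if not 1 <= restore_num <= len(names):
        raise ValueError(
            "restore_num must be between 1 and %d" % len(names)
        )
    # -r counts from 1, the sorted list is indexed from 0.
    return os.path.join(checkpoint_dir, names[restore_num - 1])
```

With the five cactuBSSN checkpoints above, a restore number of 4 resolves to the simpoint_03 directory, matching the "Resuming from SimPoint #3" line in the log.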
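Restoring from checkpoints lets you quickly resume execution from each of these points. The idea is a bit like the checkpoints used in branch prediction, or the saved context in a context switch.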
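Copyright notice: this is an original article by hit_shaoqi, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.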