Effect of Intel's power management on web servers
Background
$ cpupower idle-info
CPUidle driver: intel_idle
CPUidle governor: menu
analyzing CPU 0:
Number of idle states: 9
Available idle states: POLL C1 C1E C3 C6 C7s C8 C9 C10
POLL:
Flags/Description: CPUIDLE CORE POLL IDLE
Latency: 0
Usage: 5845
Duration: 3482887
C1:
Flags/Description: MWAIT 0x00
Latency: 2
Usage: 155904
Duration: 40263250
C1E:
Flags/Description: MWAIT 0x01
Latency: 10
Usage: 505874
Duration: 163384145
C3:
Flags/Description: MWAIT 0x10
Latency: 70
Usage: 405130
Duration: 115592556
C6:
Flags/Description: MWAIT 0x20
Latency: 85
Usage: 2407019
Duration: 1503034108
C7s:
Flags/Description: MWAIT 0x33
Latency: 124
Usage: 657
Duration: 674710
C8:
Flags/Description: MWAIT 0x40
Latency: 200
Usage: 1694254
Duration: 2067037470
C9:
Flags/Description: MWAIT 0x50
Latency: 480
Usage: 38
Duration: 65699
C10:
Flags/Description: MWAIT 0x60
Latency: 890
Usage: 44032
Duration: 79069478
$ cpupower frequency-info
analyzing CPU 0:
driver: intel_pstate
CPUs which run at the same hardware frequency: 0
CPUs which need to have their frequency coordinated by software: 0
maximum transition latency: Cannot determine or is not supported.
hardware limits: 400 MHz - 4.00 GHz
available cpufreq governors: performance powersave
current policy: frequency should be within 400 MHz and 4.00 GHz.
The governor "powersave" may decide which speed to use
within this range.
current CPU frequency: Unable to call hardware
current CPU frequency: 1.60 GHz (asserted by call to kernel)
boost state support:
Supported: yes
Active: yes
$ sudo turbostat --quiet sleep 5 5.004098 sec Core CPU Avg_MHz Busy% Bzy_MHz TSC_MHz IRQ SMI C1 C1E C3 C6 C7s C8 C9 C10 C1% C1E% C3% C6% C7s% C8% C9% C10% CPU%c1 CPU%c3 CPU%c6 CPU%c7 CoreTmp PkgTmp GFX%rc6 GFXMHz Totl%C0 Any%C0 GFX%C0 CPUGFX% Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10 PkgWatt CorWatt GFXWatt RAMWatt PKG_% RAM_% - - 6 0.35 1640 1991 2009 0 1 54 11 78 0 1555 0 523 0.00 0.01 0.00 0.18 0.00 27.94 0.00 71.50 0.95 0.00 0.30 98.39 54 55 99.96 300 2.76 1.83 0.00 0.00 20.55 0.12 0.09 0.15 75.31 0.00 0.00 0.76 0.09 0.00 0.32 0.00 0.00 0 0 12 0.77 1625 1991 730 0 0 2 5 48 0 603 0 90 0.00 0.00 0.01 0.98 0.00 47.68 0.00 50.57 1.25 0.00 0.90 97.07 54 55 99.96 300 2.76 1.83 0.00 0.00 20.56 0.12 0.09 0.15 75.31 0.00 0.00 0.76 0.09 0.00 0.32 0.00 0.00 0 4 4 0.25 1650 1991 154 0 0 0 0 4 0 125 0 70 0.00 0.00 0.00 0.07 0.00 22.48 0.00 77.17 1.77 1 1 2 0.15 1689 1991 95 0 0 8 0 3 0 66 0 32 0.00 0.01 0.00 0.05 0.00 9.76 0.00 90.01 0.86 0.00 0.08 98.91 54 1 5 6 0.35 1626 1991 222 0 0 0 0 4 0 129 0 115 0.00 0.00 0.00 0.06 0.00 18.13 0.00 81.44 0.67 2 2 5 0.31 1672 1991 187 0 0 1 0 6 0 163 0 53 0.00 0.00 0.00 0.10 0.00 39.79 0.00 59.78 1.02 0.01 0.13 98.53 54 2 6 8 0.50 1623 1991 299 0 0 1 6 5 0 246 0 61 0.00 0.00 0.01 0.08 0.00 46.01 0.00 53.39 0.84 3 3 4 0.27 1642 1991 180 0 0 5 0 5 0 139 0 43 0.00 0.01 0.00 0.07 0.00 29.59 0.00 70.05 0.58 0.00 0.10 99.06 54 3 7 3 0.19 1664 1992 142 0 1 37 0 3 0 84 0 59 0.00 0.05 0.00 0.05 0.00 10.11 0.00 89.58 0.66
Processor states while Hasura serves a GraphQL Request
Running 1m test @ http://localhost:8181/v1/graphql 1 threads and 10 connections Thread calibration: mean lat.: 2.489ms, rate sampling interval: 10ms Thread Stats Avg Stdev Max +/- Stdev Latency 2.49ms 521.70us 12.94ms 75.29% Req/Sec 105.63 102.66 363.00 37.27% Latency Distribution (HdrHistogram - Recorded Latency) 50.000% 2.52ms 75.000% 2.80ms 90.000% 3.03ms 99.000% 3.53ms 99.900% 5.90ms 99.990% 12.94ms 99.999% 12.94ms 100.000% 12.94ms
60.011521 sec Core CPU Avg_MHz Busy% Bzy_MHz TSC_MHz IRQ SMI C1 C1E C3 C6 C7s C8 C9 C10 C1% C1E% C3% C6% C7s% C8% C9% C10% CPU%c1 CPU%c3 CPU%c6 CPU%c7 CoreTmp PkgTmp GFX%rc6 GFXMHz Totl%C0 Any%C0 GFX%C0 CPUGFX% Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10 PkgWatt CorWatt GFXWatt RAMWatt PKG_% RAM_% - - 50 3.00 1669 1992 70009 0 261 1971 1526 11017 34 71007 4 17698 0.01 0.04 0.06 1.62 0.02 49.45 0.00 45.85 5.20 0.07 2.63 89.10 39 40 100.00 300 24.38 20.60 0.00 0.00 25.92 1.42 0.42 0.11 47.07 0.00 0.00 1.15 0.46 0.00 0.43 0.00 0.00 0 0 50 3.08 1629 1992 8822 0 32 210 185 1225 8 9399 2 2005 0.00 0.03 0.05 1.39 0.03 51.97 0.00 43.49 5.23 0.07 2.56 89.07 39 40 100.00 300 24.38 20.60 0.00 0.00 25.92 1.42 0.42 0.11 47.07 0.00 0.00 1.15 0.46 0.00 0.43 0.00 0.00 0 4 50 3.07 1615 1992 9119 0 49 259 216 1523 5 9173 0 2120 0.00 0.04 0.06 1.81 0.03 48.68 0.00 46.35 5.24 1 1 60 3.29 1807 1992 8405 0 20 171 191 1201 0 8406 0 2056 0.01 0.02 0.06 1.35 0.00 49.63 0.00 45.67 4.86 0.07 2.42 89.36 39 1 5 45 2.75 1653 1992 8734 0 39 385 191 1384 8 8161 0 2498 0.00 0.07 0.05 1.64 0.02 44.75 0.00 50.75 5.40 2 2 52 3.15 1644 1992 9181 0 29 269 184 1436 3 9256 0 2008 0.00 0.07 0.06 1.70 0.02 50.97 0.00 44.07 5.19 0.07 2.49 89.10 39 2 6 53 2.99 1756 1992 7914 0 25 265 183 1138 2 8432 0 2248 0.01 0.03 0.05 1.34 0.01 49.80 0.00 45.80 5.35 3 3 45 2.80 1616 1992 8336 0 25 209 150 1401 6 8112 0 2663 0.00 0.05 0.05 1.66 0.02 44.79 0.00 50.68 5.20 0.07 3.06 88.88 39 3 7 46 2.85 1617 1992 9498 0 42 203 226 1709 2 10068 2 2100 0.01 0.03 0.06 2.12 0.00 55.00 0.02 39.95 5.14
Fine tuning processor states
"performance" p-state governor
$ sudo cpupower frequency-set -g performance
Running 1m test @ http://localhost:8181/v1/graphql 1 threads and 10 connections Thread calibration: mean lat.: 2.002ms, rate sampling interval: 10ms Thread Stats Avg Stdev Max +/- Stdev Latency 1.94ms 579.10us 9.26ms 68.44% Req/Sec 105.33 100.88 333.00 43.93% Latency Distribution (HdrHistogram - Recorded Latency) 50.000% 1.90ms 75.000% 2.34ms 90.000% 2.62ms 99.000% 3.21ms 99.900% 5.98ms 99.990% 9.27ms 99.999% 9.27ms 100.000% 9.27ms
60.006307 sec Core CPU Avg_MHz Busy% Bzy_MHz TSC_MHz IRQ SMI C1 C1E C3 C6 C7s C8 C9 C10 C1% C1E% C3% C6% C7s% C8% C9% C10% CPU%c1 CPU%c3 CPU%c6 CPU%c7 CoreTmp PkgTmp GFX%rc6 GFXMHz Totl%C0 Any%C0 GFX%C0 CPUGFX% Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10 PkgWatt CorWatt GFXWatt RAMWatt PKG_% RAM_% - - 67 1.72 3930 1992 67347 0 227 1648 1301 9135 32 72585 2 17162 0.01 0.04 0.08 1.62 0.01 50.84 0.00 45.66 8.01 0.10 2.23 87.95 49 49 100.01 300 13.73 11.58 0.00 0.00 19.86 1.41 0.35 0.04 52.64 0.00 0.00 2.49 1.69 0.00 0.64 0.00 0.00 0 0 62 1.58 3923 1992 9298 0 34 167 175 1277 3 9688 0 1967 0.02 0.05 0.08 1.92 0.01 54.27 0.00 42.06 7.76 0.07 2.17 88.42 49 49 100.01 300 13.73 11.58 0.00 0.00 19.86 1.41 0.35 0.04 52.64 0.00 0.00 2.49 1.69 0.00 0.64 0.00 0.00 0 4 64 1.64 3924 1992 7662 0 36 196 144 917 2 8251 0 2133 0.00 0.02 0.06 1.19 0.01 47.09 0.00 49.97 7.69 1 1 71 1.81 3932 1992 8996 0 26 232 196 1253 4 9553 1 2035 0.00 0.05 0.09 1.73 0.01 53.37 0.00 42.92 8.42 0.08 2.25 87.44 46 1 5 68 1.74 3933 1992 8754 0 35 224 136 1132 5 9662 0 2293 0.02 0.04 0.06 1.59 0.01 53.98 0.00 42.54 8.48 2 2 70 1.78 3937 1992 7611 0 20 200 149 1063 3 8247 0 2260 0.00 0.02 0.08 1.53 0.00 47.71 0.00 48.85 7.35 0.08 1.78 89.01 47 2 6 60 1.54 3918 1992 7920 0 25 210 134 780 2 8559 1 1993 0.00 0.04 0.06 1.05 0.01 51.94 0.00 45.33 7.58 3 3 74 1.88 3937 1992 8414 0 25 259 164 1262 3 9352 0 2064 0.00 0.10 0.07 1.78 0.02 49.07 0.00 47.05 8.32 0.16 2.72 86.91 48 3 7 69 1.76 3930 1992 8692 0 26 160 203 1451 10 9273 0 2417 0.00 0.03 0.17 2.16 0.01 49.29 0.00 46.56 8.45
Prohibit idle states
$ sudo cpupower idle-set -D2
Running 1m test @ http://localhost:8181/v1/graphql 1 threads and 10 connections Thread calibration: mean lat.: 0.995ms, rate sampling interval: 10ms Thread Stats Avg Stdev Max +/- Stdev Latency 1.13ms 353.31us 8.95ms 79.16% Req/Sec 108.45 104.68 333.00 19.19% Latency Distribution (HdrHistogram - Recorded Latency) 50.000% 1.20ms 75.000% 1.32ms 90.000% 1.41ms 99.000% 1.79ms 99.900% 4.33ms 99.990% 8.96ms 99.999% 8.96ms 100.000% 8.96ms
60.005077 sec Core CPU Avg_MHz Busy% Bzy_MHz TSC_MHz IRQ SMI C1 C1E C3 C6 C7s C8 C9 C10 C1% C1E% C3% C6% C7s% C8% C9% C10% CPU%c1 CPU%c3 CPU%c6 CPU%c7 CoreTmp PkgTmp GFX%rc6 GFXMHz Totl%C0 Any%C0 GFX%C0 CPUGFX% Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10 PkgWatt CorWatt GFXWatt RAMWatt PKG_% RAM_% - - 3685 99.76 3693 1992 68091 16 0 0 0 0 0 0 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.24 0.00 0.00 0.00 85 84 99.92 300 398.38 99.59 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 19.73 18.80 0.00 0.35 0.00 0.00 0 0 3685 99.76 3693 1992 9292 2 0 0 0 0 0 0 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.24 0.00 0.00 0.00 81 84 99.92 300 398.38 99.59 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 19.73 18.80 0.00 0.35 0.00 0.00 0 4 3685 99.76 3693 1992 7130 2 0 0 0 0 0 0 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.24 1 1 3685 99.76 3693 1992 9178 2 0 0 0 0 0 0 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.24 0.00 0.00 0.00 80 1 5 3685 99.76 3693 1992 9541 2 0 0 0 0 0 0 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.24 2 2 3685 99.76 3693 1992 7925 2 0 0 0 0 0 0 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.24 0.00 0.00 0.00 85 2 6 3685 99.76 3693 1992 8344 2 0 0 0 0 0 0 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.24 3 3 3685 99.76 3693 1992 9789 2 0 0 0 0 0 0 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.24 0.00 0.00 0.00 80 3 7 3685 99.76 3693 1992 6892 2 0 0 0 0 0 0 0 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.24
Conclusion
- With wrk2's, measured latencies are [only] accurate to a +/- ~1 ms granularity, due to OS sleep time behavior.
- Disabling idle states/using the performance governor has the effect of also consuming more power. In particular disabling all idle states means that the CPU is running at peak frequency continuously. You might want to experiment with disabling some of the deeper C-states and leaving the shallower ones on.
Related reading