Increasing Thermal Headroom

My laptop wasn’t performing as fast as I thought it should. By simple inspection I determined that when fully loaded my processor was thermal throttling: it artificially limited its own performance in order to prevent overheating. Therefore I conducted a two-part experiment to see if I could alleviate the thermal issues and recover the missing performance.

My experiments were successful: reapplying the thermal interface material on the CPU and GPU substantially improved the performance I was experiencing. I tried both the Thermal Grizzly Kryonaut thermal paste and the Thermal Grizzly Conductonaut liquid metal and concluded that the thermal paste application was the most appropriate way to increase the thermal headroom in my laptop.

Setup

Every day I use my Late 2013 Retina MacBook Pro for software development, a resource-intensive endeavor on old hardware. However, the hardware isn’t that shabby: it comes with a quad-core Intel i7 CPU rated at 2.6 GHz, which jumps to 3.6 GHz in Turbo Boost mode when possible, 16 GB of RAM, a dedicated GPU for compute tasks, and a surprisingly fast 1 TB SSD (or at least it was surprisingly fast when I bought this machine five years ago).

Even for today those aren’t weak specifications. Newer hardware is better but most improvements have come in areas that don’t affect what I do too much. On that note my primary source of frustration is the app I work in at WordPress.com – wp-calypso. That app embodies most of the things people complain about when it comes to modern JavaScript applications; it’s built with npm + webpack + Babel + React + redux + lerna + tugboats + badgers + the rebel fleet. Various parts of this application consume an entire core of my CPU or even up to all of them.

But I felt like it was too slow and I found evidence supporting that claim.

It’s too hot

It can be hard to diagnose elusive claims like “my computer is too slow” and I wanted to make sure that I could test and verify my theory before jumping to conclusions. I suspected a thermal-throttling issue and correlated that with some chatter I read online about similar situations. The most obvious source of evidence was the fact that my CPU was hitting 99º C and staying there while building Calypso. The CPU is designed to operate below 100º C and if the internal temperature starts to exceed that then it will step down its frequency and slow down until the temperature drops.

These three graphs illustrate the throttling. We can see it in the way that the CPU frequency jumps around under its Turbo Boost frequency of 3.6 GHz. The places in the graphs where it jumps back up correspond to times in the build process where CPU utilization drops, largely due to serial processes that only use one of the four cores.

There’s something else noteworthy in these graphs – it’s that long tail on the cooling end. After CPU usage drops to near zero it takes around a minute for the temperatures to drop back down to the resting value around 50º C. What does that mean?

While these graphs demonstrate that I have a thermal problem they don’t indicate that a thermal solution exists. However, that long tail was a big clue that led me to believe I could resolve this.
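
One rough way to read that long tail is Newton's law of cooling: a lumped thermal mass relaxes toward its resting temperature with a time constant equal to thermal resistance times heat capacity, so a poorer heat path means a longer tail. The sketch below is only an illustration under that simplified model. The resistance and heat-capacity values are hypothetical, picked so the slow case lands near the one-minute tail described above, and `time_to_cool` is a made-up helper name.

```python
import math


def time_to_cool(r_total, heat_capacity, start_c=99.0, rest_c=50.0, within_c=2.0):
    """Seconds for a lumped thermal mass to cool to within `within_c` of rest_c.

    Newton's law of cooling: T(t) = rest + (start - rest) * exp(-t / tau),
    with time constant tau = r_total * heat_capacity.
    """
    tau = r_total * heat_capacity
    return tau * math.log((start_c - rest_c) / within_c)


# Hypothetical values: resistance in °C per watt, heat capacity in joules per °C.
HEAT_CAPACITY = 10.0
for label, r_total in (("poor heat path", 1.8), ("good heat path", 0.7)):
    seconds = time_to_cool(r_total, HEAT_CAPACITY)
    print(f"{label}: about {seconds:.0f} s to settle near the resting temperature")
```

In this model the only thing that differs between the two cases is the resistance of the path out of the chip, which is why a slow cool-down hints at a blocked path and not at excessive heat generation.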

It’s not cooling fast enough

When I started some further experiments to test this idea that the cooling wasn’t sufficient I noticed an odd phenomenon: changing the laptop’s fan speeds wasn’t having a big impact on the temperature. I had suspected that if I manually forced the fans to operate at full speed then I would defer the throttling and thus speed up my workflow. Since this didn’t happen I had to start hypothesizing about how that could be: why the impact of fan speed was so small and why the cooling took so long.

Maybe you figured this out already on your own but the problem sounds like a thermal bottleneck preventing the heat from escaping. The CPU gets hot but it doesn’t generate an overwhelming amount of heat. It’s only when that heat gets trapped and starts building up that the temperature increases. I had a strong belief by now that something was in the way blocking the heat transfer and thus if I could eliminate that bottleneck I could restore performance.
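
The bottleneck idea can be sketched as two thermal resistances in series: die to heatsink (the paste layer) and heatsink to air (fins and fans). The power the chip can sustain before reaching its limit is the allowed temperature rise divided by the total resistance. This is a toy model, not a measurement: every resistance value below is hypothetical, and `sustainable_power` is a name made up for the illustration.

```python
def sustainable_power(r_interface, r_heatsink, t_limit_c=100.0, ambient_c=25.0):
    """Largest steady-state power (W) that keeps the die at or below t_limit_c.

    Models the heat path as two thermal resistances in series (°C per watt):
    die -> heatsink (the paste layer) and heatsink -> air (fins and fans).
    """
    return (t_limit_c - ambient_c) / (r_interface + r_heatsink)


# All resistance values are hypothetical, chosen only to show the trend.
PASTE = {"degraded paste": 1.2, "fresh paste": 0.3}
FANS = {"fans on auto": 0.6, "fans at full": 0.4}

for paste_label, r_paste in PASTE.items():
    auto = sustainable_power(r_paste, FANS["fans on auto"])
    full = sustainable_power(r_paste, FANS["fans at full"])
    gain = (full / auto - 1) * 100
    print(f"{paste_label}: {auto:.1f} W on auto, {full:.1f} W at full ({gain:.0f}% gain)")
```

With these made-up numbers, running the fans at full speed buys a much smaller relative gain when the paste layer dominates the total resistance. That matches the observation that fan speed barely mattered while something else was blocking the heat.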

Having read numerous claims online that Apple cuts corners on their thermal paste application I decided that replacing it was a viable solution. There remained the possibility that dust had built up and blocked the fans or radiator vanes on the heatsink, but I had cleaned out my laptop a year previously, found little to no dust, and saw no performance impact after cleaning it.

Trying high-end paste

Although I was very curious about trying a liquid-metal thermal material I wasn’t confident in the process and wanted to see if a high-end thermal paste would impact my performance. Therefore I ordered Thermal Grizzly’s Kryonaut paste, opened my laptop, cleaned it, then reapplied the paste.

Opening my laptop and cleaning it was relatively easy. I’ve done this twice now and it was dirtier this time than last time but still wasn’t too bad. Replacing the paste involves removing the heatsink at the top and exposing the CPU die.

Sadly I forgot to take a picture of these things after applying the new paste. Well, time to run some benchmarks again.

Kryonaut performance

Replacing the thermal paste was a success.

The graph on the right compares to the three graphs above.

We can see that the CPU is still hitting 99º C but not throttling nearly as much. The fans are now playing a more important role, which you can see in the cooling drop-off once the CPU utilization ends. We’re back to a short tail and rapid temperature drop after load drops. This is how it should be.

My conclusion is that my hypothesis holds: the thermal paste was either improper in the first place or (more likely) just old and dried out and decayed.

Now all excited about my results I wanted to try one more thing – would liquid metal help more? I waited a couple weeks then took the plunge.

Trying liquid metal

I waited two weeks so I could renormalize myself to the new performance after replacing the thermal paste. It felt like I had a new computer and everything was faster.

Liquid metal is a kind of outlier though. Whereas the Kryonaut has a rated thermal conductivity of 12.5 W/(m·K), the liquid metal has a rating of 73 W/(m·K). This means that (theoretically) the Conductonaut can transfer nearly six times as much heat as the Kryonaut given the same circumstances. The catch is that liquid metal is electrically conductive and thus inherently risky to apply to sensitive electronic equipment. Would this material give me even better results than the paste? Would it be worth the risk? Would I fry my computer in my attempt?
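
The "six times" figure is simply the ratio of the two ratings, and it applies only to the thin layer of material itself. The sketch below works that out, assuming both ratings are in W/(m·K). The layer thickness and die area are made-up values for illustration, not measurements from this laptop, and `layer_resistance` is a hypothetical helper.

```python
KRYONAUT_K = 12.5      # rated thermal conductivity, assumed W/(m·K)
CONDUCTONAUT_K = 73.0  # rated thermal conductivity, assumed W/(m·K)


def layer_resistance(thickness_m, area_m2, conductivity):
    """Thermal resistance (K/W) of a flat layer: R = thickness / (k * area)."""
    return thickness_m / (conductivity * area_m2)


ratio = CONDUCTONAUT_K / KRYONAUT_K
print(f"conductivity ratio: {ratio:.2f}x")

# Hypothetical geometry: a 50 micrometre layer over a 1.5 cm^2 die.
THICKNESS_M = 50e-6
AREA_M2 = 1.5e-4

for name, k in (("Kryonaut", KRYONAUT_K), ("Conductonaut", CONDUCTONAUT_K)):
    r = layer_resistance(THICKNESS_M, AREA_M2, k)
    print(f"{name}: {r:.4f} K/W across the layer")
```

Because the interface layer is only one part of the path from die to air, a roughly sixfold drop in its resistance does not translate into a sixfold drop in the overall temperature rise.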

Well, my computer is old and I feel comfortable working with electronics so I took the risk.

Normally people recommend coating the area surrounding the die with a conformal coating, or a substitute as long as it’s non-conductive. For my system I wanted to try another idea. Since I had leftover Kryonaut I made “gutters” around the chip dies with the non-conductive thermal paste as a barrier should the liquid metal spread. I wasn’t expecting spread due to what I had read about its surface tension, but just in case I wanted extra protection, hence the gutters.

The liquid metal was surprisingly hard to work with. When I wanted to spread it with the included swab it wanted instead to stick to the swab. It took some getting used to and I think I probably made some mistakes in applying it. My results would later support this theory because it isn’t performing as well as I expected it to. My theory is that some places aren’t sufficiently coated (liquid metal doesn’t spread the way the paste does so you have to manually spread it before mounting the heatsink).

Liquid metal performance

The liquid metal experiment was a success but only marginally more so than the Kryonaut experiment. I suspect my results are biased by a poor application, though they are still better than with the paste.

I’ve copied the previous graphs above for easier reference. With the Conductonaut the results exceed my expectations and my build time is now less than half the original time. We can see that here there’s no longer any thermal throttling at all because our temperatures never hit 99º C – they stay below or just over 80º C for the whole run.

We’re also seeing in these graphs an illustration of why I like to buy the highest-end processors with my computers: when allowed to run at full speed we accomplish our work in shorter amounts of time, the CPU more rapidly returns to lower-power states, and overall we reduce the strain on our battery while working faster.

In all of these benchmarks I have tried to be consistent by manually setting the fans to their full speed and by providing ample airflow through and around the laptop. Since the internal temperatures were highly dependent on the thermal material between the CPU/GPU dies and the heatsink, I’m concluding that the material was indeed the thermal bottleneck and thus my initial theory held.

That I can now run processes with full CPU utilization and not hit the throttling temperature is incredible. I can’t actually do this on all cores simultaneously but few of my workflows produce that load on the system. Most of my frustrations and flows involve short bursts of raw compute demands on a single thread.

Impact of fan settings

There’s one final aspect here that I wanted to test. Now that the fans were the dominant factor in heat extraction I wondered how the automated fan settings impact performance. Before replacing the factory thermal paste I had had to give up on the automatic fan settings and replace them with Macs Fan Control, a utility I have loved using. After the replacements I was finally able to go back to the system defaults because they would keep my system cool enough without blasting the fans at full speed.

However, I was curious what would happen if I kept some custom settings. By default macOS lets the temperature go up pretty high – into the thermal-throttling range – before it ramps up the fan speed and cools things down.

This simple comparison I did was with the Conductonaut. I ran the Calypso unit test suite with the normal macOS fan settings and then with the fans manually set at 100%. I have taken steps to try and prevent filesystem caching from interfering with the results.
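
A comparison like this can be scripted so that each fan setting gets the same treatment. The following is a minimal sketch, not the harness actually used for these results: `time_command` is a hypothetical helper, and the workload is a stand-in Python one-liner so the example runs anywhere. In practice it would be replaced with the real test command.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run cmd `runs` times and return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in workload (an assumption); swap in the real test-suite command.
WORKLOAD = [sys.executable, "-c", "sum(i * i for i in range(10**6))"]

durations = time_command(WORKLOAD, runs=3)
print(f"median {statistics.median(durations):.2f} s over {len(durations)} runs")
```

Running the script once per fan setting, and letting the machine return to its resting temperature between runs, keeps the starting conditions comparable.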

Interesting but not surprising. By anticipating the heat load for these bursts of system load we are able to speed up the processes by slowing the rate of temperature rise. In the case of automatic fan control we hit thermal throttling and that cascaded into taking twice the amount of time to run the tests as with the fans at full. We can estimate from the graph on the right that if our unit tests took much longer than this we also would have hit throttling at full fan speed because that’s where the graph is heading, but for my workflows this anticipatory behavior is great – it means that in real situations I can continue to cut out meaningful delays in my work.

Conclusion

The factory-installed thermal paste may or may not have been at the same quality as the rest of the system when it was installed, but definitely over time it became significantly worse than how my new applications performed. The difference in cost between a poor paste and a top-of-the-line paste was less than $10 – worth the upgrade.

Further, the computer’s built-in fan settings are somewhat optimized for noise. Were I to spend my time browsing the internet or writing I would be happy with these settings, but when developing software, processing imagery, or other compute-heavy tasks the defaults artificially limit my laptop’s performance. Installing a custom fan manager was a big step up in terms of cooling the system and thus also in opening up the performance.


Notes

  • The graphs come from the Intel Power Gadget and show one minute of readings. Because they are basic screenshots you can see overlap in measurements that span more than a minute.
  • The benchmarks here are an attempt to help me differentiate my bias from my experience. I want a change I make to have a positive impact which can lead me to believe there is one even when there isn’t. In this case the empirical data supported my intuitive assessments. There are flaws in my methodology but I have taken care to make reasonable experiments given my level of uncertainty in the changes.