September 19, 2024
Intel blames Raptor Lake instability on excessive voltage caused by faulty microcode, fix coming in August

What began last year as a handful of reports of instability in Intel’s Raptor Lake desktop chips has morphed into a much larger saga over the past few months. Faced with its biggest client chip instability issue in decades, Intel has been under increasing pressure to determine the root cause of the problem and fix it, as allegations of damaged chips have piled up and rumors have swirled amid Intel’s silence. But finally, it appears that Intel’s latest saga is nearing its end, as today the company announced that it has found the cause of the problem and will be rolling out a microcode patch next month to fix it.

Officially, Intel has been working since at least February of this year, if not earlier, to identify the cause of Raptor Lake’s instability issues. In the meantime, the company has found a few correlating factors – asking motherboard vendors to stop using ridiculous power settings in their out-of-the-box configurations and finding a voltage-related bug in Enhanced Thermal Velocity Boost (eTVB) – but none of them has been the smoking gun behind the whole problem. All of which has forced Intel to keep searching for the root cause in private while leaving a lot of awkward silence in public.

But it seems that Intel’s investigation has finally come to an end, even though Intel has yet to reveal definitive proof of the problem. According to a new update posted on the company’s community site, Intel has determined the cause of the problem and has developed a solution.

According to the company’s announcement, Intel has identified the cause of the instability issue as “high operating voltages,” which actually stem from a faulty algorithm in Intel’s microcode that was requesting an incorrect voltage. As a result, Intel will be able to fix the issue with a new microcode update, which is pending validation and is expected to be released in mid-August.

Based on extensive analysis of 13th/14th Gen Intel Core desktop processors returned to us due to instability issues, we have determined that high operating voltage is causing instability issues in some 13th/14th Gen desktop processors. Our analysis of the returned processors confirms that the high operating voltage is caused by a microcode algorithm that is causing incorrect voltage requests to the processor.

Intel is providing a microcode fix that addresses the root cause of high voltage exposure. We are continuing validation to ensure that the instability scenarios reported to Intel regarding its 13th/14th Gen Core desktop processors are addressed. Intel is currently targeting mid-August for release of the fix to partners after full validation.

Intel is committed to addressing this issue for our customers and we continue to ask that any customers currently experiencing instability issues on their 13th/14th Gen Intel Core desktop processors contact Intel Customer Support for assistance.
– Intel Community Post

And while Raptor Lake’s instability issues and the need to fix them aren’t good for Intel, the fact that the problem can be traced back to (or at least fixed by) microcode is about the best possible outcome the company can hope for. Of the entire spectrum of potential causes, microcode is the easiest to fix on a mass scale: microcode updates are already distributed via operating system updates, and all chips of a given stepping (millions in total) run the same microcode. Even a motherboard BIOS issue would be much harder to fix given the sheer number of different boards, let alone a real hardware defect that would require Intel to replace even more chips than it already has.

It would be remiss of us not to point out, however, that microcode is regularly used to mask issues deeper in the processor, as we saw with the Meltdown/Spectre patches several years ago. So while Intel publicly attributes the issue to microcode bugs, there are several other layers of the onion that is modern processors that could play a role. In this regard, a microcode patch offers the least amount of information about the bug and the performance implications of fixing it, since microcode can be used to mitigate many different issues.

But for now, Intel is focused on communicating the solution and establishing a distribution schedule. This case has certainly caused a lot of consternation at Intel over the past year, and it will continue to do so for at least another month.

In the meantime, we’ve reached out to our Intel contacts to see if the company will release additional details about the voltage bug and its fix. “High operating voltages” isn’t a very satisfying answer on its own, and given the unprecedented nature of the issue, we hope Intel will be able to share additional details about what’s happening and how Intel will prevent it in the future.

Intel also confirms that a via oxidation manufacturing issue affected early Raptor Lake chips

Along with this news, Intel has also made a few other statements regarding chip instability to the press and public over the past 48 hours that also deserve some attention.

First of all, before Intel’s official analysis of the root causes of the Raptor Lake instability issues on desktops, one possibility that couldn’t be ruled out at the time was that the root cause of the problem was a hardware flaw of some kind. And while the answer to that question turned out to be “no,” there’s also a rather important “but.”

It turns out that Intel did have an early manufacturing defect in the enhanced version of the Intel 7 process node used to build Raptor Lake. According to a post by Intel on Reddit this afternoon, a “via oxidation manufacturing issue” was fixed in 2023. However, despite the suspicious timing, Intel says this is separate from the microcode issue that’s been causing instability in Raptor Lake desktop processors to this day.

Short answer: We can confirm that there was a manufacturing issue with oxidation (resolved in 2023), but it is unrelated to the instability issue.

Long answer: We can confirm that the via oxidation manufacturing issue affected some early 13th Gen Intel Core desktop processors. However, the issue was addressed through manufacturing improvements and screening in 2023. We have also looked into instability reports on 13th Gen Intel Core desktop processors and analysis to date has determined that only a small number of instability reports can be linked to the manufacturing issue.

For the instability issue, we are providing a microcode fix that addresses high voltage exposure, which is a key part of the instability issue. We are currently validating the microcode fix to ensure that the instability issues for 13th and 14th generations are resolved.

– Intel Reddit post

Ultimately, Intel claims to have caught the issue early on, and that only a small number of Raptor Lake chips were affected by the via oxidation manufacturing defect. That won’t do much to reassure Raptor Lake owners who are already worried about the instability issue, but it’s at least helpful that the issue is being documented publicly. Typically, these kinds of early-stage issues don’t get mentioned because even in the best-case scenario, some chips inevitably fail prematurely.

Unfortunately, Intel’s disclosure here doesn’t offer any more details on the nature of the problem, or how it manifests itself beyond greater instability. But ultimately, as with the microcode voltage issue, the solution for all affected chips will be to return them to Intel for a replacement.

Laptops not affected by the Raptor Lake microcode issue

Finally, before the two previous statements, Intel also released a statement to Digital Trends and a few other tech websites over the weekend, in response to accusations that Intel’s 13th generation Core mobile processors were also affected by what we now know to be a microcode flaw. In its statement, Intel rejected the allegations, saying that laptop chips do not suffer from the same instability issue.

Intel is aware of a small number of reports of instability on 13th/14th Gen Intel Core mobile processors. Based on our extensive analysis of instability issues reported on 13th/14th Gen Intel Core desktop processors, Intel has determined that mobile products are not susceptible to the same issue. Symptoms reported on 13th/14th Gen mobile systems, including system freezes and crashes, are common symptoms of a wide range of potential software and hardware issues. As always, if users experience issues with their Intel processor-based laptops, we encourage them to contact the system manufacturer for assistance.
– Intel representative to Digital Trends

Instead, Intel attributed the laptops’ instability issues to typical hardware and software problems, essentially saying that laptops aren’t experiencing elevated levels of instability. It’s unclear whether this statement accounts for the via oxidation manufacturing issue (largely because not all 13th Gen Core mobile parts are Raptor Lake), but it is consistent with Intel’s statements from earlier this year, which always explicitly described the instability issues as a desktop problem.
