CrowdStrike meets Murphy's Law: Anything that can go wrong will

And boy, did last Friday's Windows fiasco ever prove that yet again


Opinion CrowdStrike's recent Windows debacle will surely earn a prominent place in the annals of epic tech failures. On July 19, the cybersecurity giant accomplished what legions of hackers could only dream of – bringing millions of Windows systems worldwide to their knees with a single botched update.

As a veteran tech journalist, I've seen my fair share of software snafus. Heck, I went hand-to-hand with the grandpa of all network blow-ups – the Morris Worm – in 1988 when I was a sysadmin. Even so, I can't help but marvel at the sheer scale and impact of this blunder. CrowdStrike, a company valued at over $70 billion and trusted by countless organizations to protect their digital assets, inadvertently became the source of one of the largest IT outages in history.

The fallout from this debacle was staggering – thousands of flights canceled, healthcare services disrupted, and 911 systems knocked offline. It's a stark reminder of how deeply intertwined our digital infrastructure has become and how vulnerable it can be to a single point of failure.

Let's break down the cascade of errors that led to this fiasco.

In the beginning, Microsoft enabled CrowdStrike's Falcon security software to run in kernel mode, at ring 0, the most privileged level of Windows. Any problem at this low level will likely cause a Blue Screen of Death (BSOD). Meanwhile, Microsoft reportedly wants to blame the European Commission – no, really – for requiring it to grant third-party software vendors this level of access.

You know, I think that with all of Microsoft's developers and lawyers, the company could come up with a better, legal way to avoid this kind of foul-up and still let software companies compete equally. It's not rocket science.

Microsoft doesn't want any of the blame, but it deserves some of it. For far too long, we've placed too many vital IT eggs in the Windows basket. When that basket falls, so does much of the economy.

Returning to CrowdStrike, the company claims a "logic error" in a routine sensor configuration update caused the meltdown. But for a company of CrowdStrike's caliber, such a fundamental mistake is inexcusable. This wasn't some obscure edge case – it was a critical failure in its core functionality.

It wasn't even a code problem. This wasn't a software update per se. The villain of this piece was a Falcon configuration file called a channel file. One simple file, which should have contained nothing more than data to update a security setting, ended up causing one BSOD after another.

How did such a catastrophic bug pass quality assurance? CrowdStrike admitted: "Due to a bug in the Content Validator, one of the two Template Instances passed validation despite containing problematic content data [and] were deployed into production." When your software has deep hooks into millions of Windows systems, your testing should be bulletproof. Clearly, CrowdStrike's testing protocols need a massive overhaul.
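To make the point concrete, here is a minimal sketch of what a fail-closed content check can look like. It is not CrowdStrike's Content Validator, and it has nothing to do with the real channel file format: the JSON layout, the `rules` list, the field names, and the `validate_config` function are all made up for illustration. The only idea it shows is that a file which fails any check never ships.

```python
import json
from typing import List

# Hypothetical schema, invented for this sketch. It is NOT the real
# channel file format, which is not described in this article.
REQUIRED_FIELDS = {"rule_id": str, "pattern": str, "action": str}
ALLOWED_ACTIONS = {"allow", "alert", "block"}


def validate_config(raw: bytes) -> List[str]:
    """Return a list of problems; an empty list means the file may ship."""
    # Reject files that are empty or consist only of padding bytes.
    if not raw.strip(b"\x00 \t\r\n"):
        return ["file is empty or contains only padding bytes"]

    # Reject anything that does not parse (bad encoding or bad JSON).
    try:
        doc = json.loads(raw)
    except ValueError as exc:
        return [f"not valid JSON: {exc}"]

    if not isinstance(doc, dict) or not isinstance(doc.get("rules"), list):
        return ["top level must be an object with a 'rules' list"]

    problems = []
    for index, rule in enumerate(doc["rules"]):
        if not isinstance(rule, dict):
            problems.append(f"rule {index}: not an object")
            continue
        # Every required field must be present and have the expected type.
        for field, expected_type in REQUIRED_FIELDS.items():
            if field not in rule:
                problems.append(f"rule {index}: missing field '{field}'")
            elif not isinstance(rule[field], expected_type):
                problems.append(f"rule {index}: field '{field}' has wrong type")
        # Only known actions are accepted.
        action = rule.get("action")
        if isinstance(action, str) and action not in ALLOWED_ACTIONS:
            problems.append(f"rule {index}: unknown action '{action}'")
    return problems
```

A release pipeline built this way would refuse to publish whenever `validate_config` returns a non-empty list. A validator can have bugs of its own, of course, which is exactly why it should be one safety net among several.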

We also now know, as security expert Kevin Beaumont pointed out on Mastodon: "The key takeaway – channel updates are currently deployed globally, instantly." I always send major patches to all my customers simultaneously and wait to see what happens next. Doesn't everyone? Who are these people, and why does anyone let them do security work?

There's a simple concept called canary testing. You may have heard of it. Like the proverbial canary in a coal mine, you first test whether a new space – or program – is safe by trying it on a canary – or a small group of users – and then, if all's well, let everyone else in.
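For anyone who hasn't seen one, a canary-style rollout can be sketched in a few lines. This is a toy under stated assumptions, not anyone's production deployment system: the `staged_rollout` function, the 1/10/100 percent stages, and the `apply_update` and `is_healthy` callbacks are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Sequence


def staged_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    stage_percents: Sequence[int] = (1, 10, 100),  # assumed stage sizes
) -> Dict[str, object]:
    """Push an update in growing stages and halt at the first unhealthy host."""
    total = len(hosts)
    updated: List[str] = []
    for percent in stage_percents:
        # Cumulative number of hosts that should be updated after this stage.
        target = max(1, total * percent // 100)
        batch = list(hosts[len(updated):target])
        for host in batch:
            apply_update(host)
        updated.extend(batch)
        # Check the hosts just updated before touching anyone else.
        failed = [host for host in batch if not is_healthy(host)]
        if failed:
            return {"status": "halted", "updated": updated, "failed": failed}
    return {"status": "complete", "updated": updated, "failed": []}
```

With 100 hosts and an update that breaks every machine it touches, this sketch stops after a single canary host instead of taking down all 100. A real system would also wait between stages and watch crash telemetry rather than make a single health call, but the principle is the same.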

Let's not forget that CrowdStrike's initial response was slow and inadequate. Users were left scrambling for answers while critical infrastructure faltered. Even today, almost a week later, I still have friends having trouble with their Delta flights.

This serves as a sobering wake-up call for the rest of us in the tech industry. As we rush to secure our systems against external threats, we must not overlook the potential for self-inflicted wounds. Rigorous testing, fail-safe mechanisms, and a healthy dose of humility are essential when dealing with critical systems.

In the end, CrowdStrike's Windows fiasco is a textbook example of Murphy's Law in action – anything that can go wrong will go wrong. It's a painful lesson but one that we would all do well to learn from. After all, in cybersecurity, your next big threat might just be an update away. ®
