On July 19, 2024, a CrowdStrike update triggered a global IT outage that struck hospitals, airlines, and even banks. As we arrive at the one-year anniversary of the incident, CIOs have the opportunity to reflect on their approach to cyber resilience.
While the CrowdStrike outage was remarkable for the scale of disruption, IT outages are a common occurrence. And as the IT ecosystem becomes more complex and interconnected, the possibility of another major incident like this is ever-present. A 2024 PagerDuty survey found that 88% of IT and business executives expected to see another major incident as large as last July’s outage within the next year.
With more service disruptions anticipated, have CIOs changed how they approach resilience in their organizations?
‘Never Waste an Outage’
While the CrowdStrike outage swept through a swath of industries and companies, there were plenty of organizations that were not affected. Regardless of how close CIOs were to the outage — in the thick of it or an outside observer — there are lessons to be learned.
“There are customers that we spoke to that felt like it was a ‘never waste an outage’ kind of situation where you go and try and learn from it,” Eric Johnson, CIO of PagerDuty, a digital operations management company, tells InformationWeek. “We saw a lot of people rethinking the way they were going to be managing this in the future.”
CIOs and their teams can use an outage to refine their processes. How could they be more resilient next time? Are there opportunities to improve incident response and business continuity?
Beating the Drum on Resilience
The CrowdStrike outage was a stark reminder of how little control organizations have over preventing an outage like this. When something goes wrong in their supply chain, they can’t stop it. They can only react.
“This was the best example of you couldn’t see this coming,” says Amanda Fennell, CIO and CISO at Prove, a digital identity verification platform. “It shifted the conversations from, ‘Can we stop everything?’ to ‘Okay, how fast can we recover?’”
Prioritizing resilience and recovery over prevention has been a favored mantra in cybersecurity for quite some time, but that shift is still a work in progress. The PagerDuty survey found that 86% of executives think that they had been prioritizing security over preparedness for service disruptions.
In Fennell’s experience, some CIOs took the CrowdStrike incident to heart and set out to improve resilience. Others, she believes, have not.
“There’s a bucket of people who … learned specifically how to approach things as a security officer and as an information officer, and as a consequence, they do the same lift and shift program they’ve done in every program that they’ve been in,” she says. “I don’t know that group of people has really grown from it or is going to change anything.”
CIOs who want to be more resilient will be thinking about single points of failure and how to address them.
“It’s just going to be a trend that is just going to be part of a CIO’s job,” says Johnson. “When it happens, how do you react to it? As opposed to thinking that somehow, it’s never going to happen to you.”
Know Your Most Critical Vendors
CrowdStrike is a critical vendor for a lot of customers. Following the outage, it released a root cause analysis and took steps to prevent the same kind of incident from unfolding.
“Cyber resilience starts with stopping breaches, and our shared focus on raising the bar after July 19 is why so many customers and partners have stayed — and continue to grow — with CrowdStrike,” says Justin Acquaro, the company’s CIO, in an emailed statement.
But CrowdStrike is far from the only critical vendor in today’s complex world of third-party dependencies and supply chain risk. The next major outage could stem from any number of vendors.
“At the end of the day, the further we get in technology, the higher our dependency on it, the further we’re going to fall,” says Fennell.
Identifying their most critical vendors, particularly those that represent potential single points of failure, can help CIOs focus their resiliency efforts. After all, resources are limited, and they cannot plan for every possible scenario.
Once you know who your most critical vendors are, it is a good idea to look at them through the lens of third-party risk management. Review contracts and SLAs. Talk to vendors and ask them to walk you through their risk mitigation strategies.
“It’s upon the person who’s paying for it — the buyer, the consumer — to demand that transparency and validate the resilience claims,” says Fennell.
Test, Test, Test
Every outage, whether the CrowdStrike incident, the ones that followed, or those yet to happen, is a reminder for CIOs to reevaluate their incident response and business continuity plans.
“You want to get to the most critical systems and processes that need to be recovered in a short amount of time period and then adjust your business continuity program to respond,” says Thomas Phelps, CIO and SVP of corporate strategy at document management company Laserfiche.
Those plans should be like living, breathing organisms that adapt to change. They cannot sit forgotten until an outage actually happens. CIOs need to envision potential scenarios and put those plans to the test.
What happens if a critical vendor causes an outage? Do enterprises have another service they can switch to that keeps operations up and running? Do CIOs have a way to communicate with key stakeholders, even if their communications system is taken down by the outage?
Resilient enterprises are not going to leave the answers to those questions up to chance. Resiliency-minded CIOs work to have the right processes and, importantly, the right people ready to respond when an outage does happen.
“How often are you pressure testing that the right people understand their role and responsibility?” Johnson asks.
CIOs can set a regular schedule for tabletop exercises to see how their resilience plans hold up. That might mean quarterly tests. Fennell, who has a background in the tabletop roleplaying game Dungeons & Dragons, relishes the opportunity to test controls and processes more frequently.
“It’s like going to the gym,” she says. “If you test it often, you’re strong and you’re ready.”
Build Relationships
CIOs live in a technical world. They need to understand how IT systems work, how the different components are connected, and where the weak spots are. But they are also business leaders. Good business is built on good relationships.
When an outage happens, CIOs need to have strong ties with other departments, not just within IT. Phelps stresses how important it is to work with customer-facing teams to develop an effective communications strategy.
“When a disaster strikes, make sure that there are playbooks in place with the communications plan to be able to reach out to your customers, to your end users, to your employees, to your other stakeholders and to the public markets to make sure that the right messages are conveyed,” he says.
CIOs can also look outside of their organizations to build valuable relationships. Phelps looks beyond SLAs and contracts and connects with people working at Laserfiche’s most critical vendors.
“[I] make sure that I’ve got C-level relationship with them to have a point of escalation for any type of concerns or questions or opportunities to improve their product,” he explains.
Having the right relationships can be invaluable for CIOs who have so much on their plates: security, resilience, and much more.
“There are so many things going on in the world of technology today around AI and so many other things,” says Johnson. “It’s probably one of the most exciting times to be a CIO. And it’s also probably one of the most difficult times to be a CIO that I can recall.”