CrowdStrike software outage causes global chaos

Global Chaos

A major internet outage affecting Microsoft disrupted flights at airlines and airports worldwide, with problems persisting hours after the technology company began addressing the issue. The outage highlighted the fragility of a digitized world heavily reliant on a few major technology providers. Air travelers became the face of the disruption as they posted pictures of crowded airports on social media.

In the U.S., airlines including American Airlines, Delta Air Lines, United Airlines, Spirit Airlines, and Allegiant Air grounded flights for varying lengths of time on Friday morning. The outage affected systems crucial for operations, such as those used to check in passengers and calculate aircraft weight for takeoff. Several airlines issued waivers, allowing customers to change their travel plans.

Health care providers across the U.S., Canada, and England also faced significant disruptions. Harris Health System in Houston suspended hospital visits, and elective procedures were canceled and rescheduled.

The New York-based Memorial Sloan Kettering Cancer Center paused procedures requiring anesthesia, while Mass General Brigham in Massachusetts canceled all non-urgent surgeries, procedures, and medical visits for the day.

Despite the widespread issues, emergency departments and some systems like the Cleveland Clinic and the Cedars-Sinai Health System in Los Angeles continued to provide care.

Customers looking to place mobile orders at Starbucks found themselves unable to do so on Friday. The coffee shop chain apologized and said it was still serving customers in the majority of its stores and drive-thrus, despite the online and mobile ordering issues.

The outage also caused significant delays at border crossings. The San Ysidro Port of Entry saw pedestrians waiting for up to three hours, while even those approved for the U.S. Customs and Border Protection’s Trusted Traveler program waited up to 90 minutes. In Canada, Windsor Police reported long delays at the Ambassador Bridge and the Detroit-Windsor tunnel.

The source of the global tech disruption was a faulty software update sent by the cybersecurity firm CrowdStrike to Microsoft Windows computers. The firm, a U.S. company providing software to thousands of companies worldwide, said it was working on a fix and urged patience as solutions were rolled out across affected industries.

The global tech outage served as a stark reminder of the interconnected and fragile nature of modern digital infrastructure. From grounded flights and delayed medical procedures to border crossing delays and interrupted mobile orders, the effects were felt across numerous sectors.

On July 19, 2024, a routine content configuration update released by CrowdStrike resulted in Windows system crashes (Blue Screen of Death, BSOD) for hosts running the Falcon sensor. This incident impacted Windows hosts running sensor version 7.11 and above that were online between 04:09 UTC and 05:27 UTC. Mac and Linux systems were unaffected.

As part of its dynamic threat protection measures, CrowdStrike frequently releases updates to its Falcon sensor. It delivers security updates through two main content types: Sensor Content and Rapid Response Content. On July 19, a Rapid Response Content update was released to gather telemetry on potential security threats. However, the update contained an undetected error that led to Windows operating system crashes.
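For administrators triaging their own fleets, the impact criteria reported above (Windows hosts, sensor version 7.11 and above, online between 04:09 UTC and 05:27 UTC on July 19, 2024) can be written as a simple check. The following is a minimal sketch, not a CrowdStrike tool: the function names, the dotted version-string format, and the idea of an "online interval" per host are assumptions made for illustration.

```python
from datetime import datetime, timezone

# Exposure window and minimum version, taken from the figures reported above.
WINDOW_START = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)
WINDOW_END = datetime(2024, 7, 19, 5, 27, tzinfo=timezone.utc)
MIN_AFFECTED_VERSION = (7, 11)


def parse_version(text):
    """Return (major, minor) from a dotted version string (assumed format)."""
    parts = text.split(".")
    return int(parts[0]), int(parts[1])


def was_potentially_impacted(os_name, sensor_version, online_from, online_until):
    """Hypothetical helper: True if a host matches all reported criteria."""
    if os_name.lower() != "windows":
        return False  # Mac and Linux systems were unaffected
    if parse_version(sensor_version) < MIN_AFFECTED_VERSION:
        return False  # sensor older than 7.11
    # The host's online interval must overlap the exposure window.
    return online_from <= WINDOW_END and online_until >= WINDOW_START


def utc(hour, minute):
    """Convenience constructor for times on July 19, 2024 (UTC)."""
    return datetime(2024, 7, 19, hour, minute, tzinfo=timezone.utc)


print(was_potentially_impacted("Windows", "7.11", utc(4, 0), utc(4, 30)))
print(was_potentially_impacted("Linux", "7.11", utc(4, 0), utc(4, 30)))
```

Versions are compared as integer tuples rather than as text or decimals, so that 7.9 correctly sorts below 7.11.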

Critical impact of software glitch

Rapid Response Content is designed to perform behavioral pattern-matching operations on the sensor dynamically, unlike Sensor Content, which is bundled with the sensor software and undergoes extensive testing. The problematic update involved an InterProcessCommunication (IPC) Template Instance.

Due to a bug in the Content Validator, which is supposed to ensure the integrity of updates, this instance passed validation despite having faulty content data. When deployed, this faulty data caused the sensor to attempt an out-of-bounds memory read, leading to system crashes. To prevent similar incidents, CrowdStrike is implementing enhanced testing, improved validation, and error handling measures.
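The failure mode described above, and the kind of validation and error handling meant to guard against it, can be illustrated with a short sketch. This is not CrowdStrike's code: `TemplateInstance`, `validate`, `read_field`, and `load` are hypothetical names, and the content is modeled as a plain list of fields. In Python an out-of-range read raises an exception, whereas in native code running inside the operating system the equivalent read can touch invalid memory and crash the whole machine.

```python
from dataclasses import dataclass


@dataclass
class TemplateInstance:
    """Hypothetical stand-in for a content entry: a list of match values."""
    fields: list


def validate(instance, expected_field_count):
    """Validator: accept content only if it has the shape the reader expects."""
    return len(instance.fields) == expected_field_count


def read_field(instance, index):
    """Bounds-checked read: raise a clear error instead of reading past the end."""
    if not 0 <= index < len(instance.fields):
        raise ValueError(
            f"field index {index} out of range for {len(instance.fields)} fields"
        )
    return instance.fields[index]


def load(instance, expected_field_count):
    """Validate first, then read every expected field through the checked reader."""
    if not validate(instance, expected_field_count):
        return "rejected"
    for index in range(expected_field_count):
        read_field(instance, index)
    return "loaded"


print(load(TemplateInstance(["a", "b", "c"]), 3))  # well-formed content
print(load(TemplateInstance(["a", "b"]), 3))       # one field short
```

The two checks are deliberately redundant: if a validator bug lets malformed content through, the bounds check in the reader still turns a potential crash into a handled error.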

This incident underscores the importance of robust validation procedures and comprehensive testing. CrowdStrike is committed to improving these processes to enhance the reliability and security of its updates, ensuring clients remain protected with minimal disruptions.

The boss of the cybersecurity firm responsible for the worldwide IT outages admits it could be “some time” before all systems are back up and running. While the software bug has been fixed, experts say manually rebooting each affected Windows computer will take a significant amount of work.

Thousands of flights have been canceled, with banking, healthcare, and payment systems all affected. In the UK, general practitioners have been struggling to access records, pharmacies have been hit, and TV channels have been knocked off the air.

A massive tech failure has caused travel chaos around the world, with banking and healthcare services also badly hit. Thousands of flights have been grounded because of the IT outage, which left many computers displaying blue error screens. There were long queues, delays, and flight cancellations at airports around the world, as passengers had to be manually checked in.

The cybersecurity firm has admitted that the problem was caused by an update to its antivirus software, which is designed to protect Microsoft Windows devices from malicious attacks. The firm has said it has fixed the update but admitted it could be “some time” before all systems are back up and running.

More than 5,000 flights were canceled worldwide following the global IT outage, which also caused major delays. While queues continue to grow at some airports, many passengers whose flights have been disrupted are now looking for a spot to bed down for the night. Some U.S. airlines, including American, United, and Delta, earlier issued ground stops, an air traffic control measure that slows or grounds aircraft at a given airport, as IT outages caused disruption across the globe.

The firm’s CEO says “nothing is more important” to him than the trust and confidence of its customers and partners. Apologizing for the global IT outage, he said the firm is working closely with impacted customers and partners to ensure that all systems are restored. He urges people to engage only with official representatives, adding that “bad actors” will try to “exploit” the situation.

The outage also affected UK supermarkets’ payment systems. Some stores, such as Morrisons and Waitrose, experienced issues processing payments, though these were largely resolved in a short time.

Some billboards in New York City’s Times Square went blank during the IT outage. Photos show black screens and “the blue screen of death” in place of the vibrant advertisements that typically light up the area.

News crews who were at the firm’s offices in Austin, Texas, earlier have hurried off to cover the human scenes of chaos and disruption elsewhere. The CEO of the cybersecurity firm has appeared on the business news channel CNBC, promising that the firm will ensure all customers recover from the outage.