BY DR SOHA MAAD
Introduction
This article sheds light on the global Information Technology (IT) outage of July 2024. It examines the cause of the outage and assesses its global impact. It considers two factors that have been blamed for the outage: the monopoly power of big technology companies (BigTech) and European Union (EU) regulation. It stresses the World Economic Forum’s call for enhanced cyber resilience, presents general causes of IT crashes, and outlines strategies and measures to prevent future IT outages. The article concludes with the call of the Union of Arabic Banks for the formation of a universal ecosystem to prevent global IT outages and the establishment of an independent body for the control and development of technology at the global level.
The 2024 Global IT Outage
On 19 July 2024, American cybersecurity company CrowdStrike distributed a faulty update to its Falcon Sensor security software that caused widespread problems with Microsoft Windows computers running the software. As a result, roughly 8.5 million systems crashed and were unable to restart properly, in what has been called the largest outage in the history of information technology.
The outage disrupted daily life, businesses, and governments around the world. Many industries were affected including airlines, airports, banks, hotels, hospitals, manufacturing, stock markets, broadcasting, gas stations, retail stores, as well as governmental services, such as emergency services and websites. The worldwide financial damage has been estimated to be at least US$10 billion.
Within hours, the error was discovered and a fix was released, but because many affected computers had to be fixed manually, outages continued to linger on many services.
About CrowdStrike, the Company That Caused the Global IT Outage
A software update from a single cybersecurity company, CrowdStrike based in the United States, was the root cause of the chaos, underlining the fragility of the global economy and its dependence on computer systems to which relatively few people give a passing thought. Software updates are a critical function in society to keep computers protected from hackers. But the update process itself is crucial to get right and to safeguard from tampering.
CrowdStrike is one of the world’s top cybersecurity companies. The update to its Falcon software triggered a malfunction that disabled computers running Microsoft Windows. Three days after the incident, CrowdStrike reported that a significant number of the devices were back online and operational.
The Cost of the Global IT Outage
The health care and banking sectors were the hardest hit by CrowdStrike’s global IT outage, with estimated losses of $1.94 billion and $1.15 billion, respectively.
Fortune 500 airlines such as American and United were the next most affected, losing a collective $860 million.
The outage may have cost Fortune 500 companies as much as $5.4 billion in revenues and gross profit, not counting any secondary losses that may be attributed to lost productivity or reputational damage. Only a small portion, around 10% to 20%, may be covered by cybersecurity insurance policies.
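The figures above can be put in proportion with a little arithmetic. The following is a minimal sketch, assuming that the three sector losses and the $5.4 billion total are all measured on the same Fortune 500 basis; the function and constant names are illustrative and do not come from any cited report.

```python
# Figures quoted in the article, in billions of US dollars.
TOTAL_FORTUNE_500_LOSS = 5.4
SECTOR_LOSSES = {
    "health care": 1.94,
    "banking": 1.15,
    "airlines": 0.86,
}
# Share of losses that cybersecurity insurance may cover (10% to 20%).
INSURED_SHARE_LOW, INSURED_SHARE_HIGH = 0.10, 0.20


def sector_share(loss, total=TOTAL_FORTUNE_500_LOSS):
    """Return a sector's loss as a fraction of the total loss."""
    return loss / total


def insured_range(total=TOTAL_FORTUNE_500_LOSS):
    """Return the (low, high) amount that insurance may cover."""
    return total * INSURED_SHARE_LOW, total * INSURED_SHARE_HIGH


if __name__ == "__main__":
    for name, loss in SECTOR_LOSSES.items():
        print(f"{name}: {sector_share(loss):.1%} of the Fortune 500 total")
    low, high = insured_range()
    print(f"insured: ${low:.2f}bn to ${high:.2f}bn")
    print(f"uninsured: ${TOTAL_FORTUNE_500_LOSS - high:.2f}bn "
          f"to ${TOTAL_FORTUNE_500_LOSS - low:.2f}bn")
```

On these assumptions, insurance would cover roughly $0.54 billion to $1.08 billion, leaving between $4.32 billion and $4.86 billion of the Fortune 500 losses uninsured.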
Fitch Ratings, one of the largest US credit rating agencies, said that the types of insurance likely to see the most claims stemming from the outage include business interruption insurance, travel insurance and event cancellation insurance.
According to Fitch Ratings, this incident highlights a growing risk of single points of failure. Single points of failure are likely to increase as companies seek consolidation to take advantage of scale and expertise, resulting in fewer vendors with higher market shares.
The eye-popping damage estimates underscore how a preventable mistake at one of the world’s most dominant cybersecurity firms has had cascading effects for the global economy and may prompt more calls for CrowdStrike to be held accountable.
Impact on Banking and Financial Sector
Microsoft and CrowdStrike stocks fell as a result of the outage. CrowdStrike’s stock fell more than 11% on 19 July, while Microsoft’s stock was down less than 1%.
Banks that were affected included Chase, Bank of America, Wells Fargo, U.S. Bank, Capital One and Charles Schwab in the US, RBC and TD Bank in Canada, Capitec Bank and other South African banks, and several banks in the Philippines, including RCBC, Metrobank, LandBank, BDO, UnionBank, BPI, and PNB.
E-wallets such as Maya and GCash also experienced problems in the Philippines. The website and mobile banking application of DenizBank in Turkey could not be accessed. Visa was affected. Numerous Singaporean companies, including Singapore Exchange (SGX) and DBS Bank, reported various levels of service difficulties.
In India, the Reserve Bank of India said that only 10 banks and NBFCs were affected by the outage.
In Arab countries, few banks use CrowdStrike tools and many banks’ critical systems do not run on the cloud. Hence, most Arab banks were unaffected.
The London Stock Exchange, while operating normally, was unable to push news updates to its website.
The Need for Systemic and Digital Resilience
The World Economic Forum issued a cyber resilience alarm that was heard around the world. The July global cyber outage caused an estimated $1 billion in global costs and was a worldwide signal to invest in cyber resilience.
There is a major lesson to be learned from the outage. We need to prepare for such incidents in ways that let us maintain the resilience of businesses and services. Whether caused by the intentional actions of an adversary or the innocent mistakes of well-intentioned actors, businesses and governments need to be resilient to cyberattacks and other cyber failures that can lead to major disruptions of business processes.
The incident highlights the need to shift our perception of cybersecurity from a mere IT issue to the broader concept of cyber resilience as an integral part of business resilience. In the face of a cyberattack, businesses should be able to recover fast from an incident and resume business as usual.
To be cyber resilient, organizations need to first and foremost identify business-critical processes and ensure the continuity of those even during cyber incidents. This has to involve continuous conversations with business leadership to ensure alignment with the overall business strategy while conducting real-time prioritization.
We also need to think beyond cyber and business resilience and look at the bigger picture of systemic resilience. As cyber threats become more advanced, businesses increasingly rely on a few sophisticated security software providers. This reliance creates a single point of failure, where a flaw in one system can lead to global cascading effects. Balancing centralized, highly protected architectures with decentralized, lower-impact systems is a difficult challenge.
Cybersecurity leaders from across the world should develop a common understanding of business cyber resilience, and should collect and systematize experience of the cyber resilience tradecraft that matters. As online and cyber infrastructures become ever more complex, interconnected and central to all sectors of business and society, the importance of cyber resilience will only continue to rise.
A senior White House tech and cybersecurity official highlighted the risks of consolidation and advised that we need to think about digital resilience not just in the systems we run but in the globally connected security systems. The chaotic scenario that played out during the IT outage did not involve a malicious actor; it stemmed from a lack of digital resilience.
Blaming the Monopoly Power of Big Tech
Numerous Fortune 500 companies use CrowdStrike’s cybersecurity software to detect and block hacking threats. Computers running Microsoft Windows crashed because of the faulty way a code update issued by CrowdStrike interacted with Windows.
CrowdStrike, a multibillion-dollar firm, has expanded its footprint around the world in its more than a decade of doing business. Many more businesses and governments are now protected from cyberthreats because of this, but the dominance of a handful of firms in the anti-virus and threat-detection marketplace creates its own risks, according to experts.
Experts argued that without a diversity of cybersecurity providers, the technology ecosystem is fragile. Winning in the marketplace can aggregate risk, and then consumers and companies alike bear the costs.
Blaming the EU Regulation
Microsoft stated that the European Union (EU) is to blame for the world’s biggest IT outage, which followed a faulty security update. According to Microsoft, a 2009 antitrust agreement with the European Union forced it to keep giving third-party developers low-level access to the Windows kernel. The agreement, insisted on by the European Commission, meant that Microsoft could not make security changes that would have blocked the update from cybersecurity firm CrowdStrike that caused an estimated 8.5 million computers to fail.
General Causes of IT Crashes
IT crashes can be caused by a variety of factors, including both hardware and software issues. Common causes include:
- Hardware issues, such as failure of Random Access Memory (RAM) or the hard disk, overheating, and a weak or fluctuating power supply.
- Software issues, such as buggy or corrupt device drivers, operating system bugs, and poorly designed third-party software.
- Other causes, such as viruses and other malware. Malicious software can disrupt normal operations and cause crashes. Invalid memory access, where a program tries to access a forbidden memory location, can cause crashes. Buffer overflows, which overwrite adjacent memory, can lead to crashes. Unhandled exceptions, meaning errors that the system or application cannot handle, can also cause crashes.
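Two of the software causes listed above, out-of-range access and unhandled exceptions, can be illustrated in a few lines. The following is a minimal sketch in Python, which raises an exception on an out-of-range read rather than touching forbidden memory; in lower-level code the same mistake may crash the whole system. The record layout (a name followed by a numeric threshold) and the function names are hypothetical and are not taken from any real product.

```python
from typing import List, Optional


def read_field(fields: List[str], index: int) -> Optional[str]:
    """Return fields[index], or None when the index is out of range."""
    if 0 <= index < len(fields):
        return fields[index]
    return None


def parse_rule(fields: List[str]) -> str:
    """Parse a hypothetical rule record: name at index 0, threshold at index 1.

    A malformed record is rejected with a message instead of raising an
    unhandled exception that would stop the caller.
    """
    name = read_field(fields, 0)
    raw_threshold = read_field(fields, 1)
    if name is None or raw_threshold is None:
        return "rejected: missing field"
    try:
        threshold = int(raw_threshold)
    except ValueError:
        # Handle the conversion error here rather than letting it propagate.
        return f"rejected: bad threshold {raw_threshold!r}"
    return f"accepted: {name} (threshold {threshold})"


if __name__ == "__main__":
    print(parse_rule(["scan", "5"]))   # well-formed record
    print(parse_rule(["scan"]))        # too few fields
    print(parse_rule(["scan", "x"]))   # threshold is not a number
```

The design choice is to validate input at the boundary and reject a bad record, so that one malformed input degrades a single operation rather than the whole service.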
Strategies To Prevent Future Global IT Outages
Building resilience is essential. Businesses and governments need to understand their exposures. CrowdStrike and Microsoft are both reputable. But whenever an organisation is too reliant on an individual provider, there is always a risk, however small, of failures hitting its wider processes.
Once vulnerabilities are mapped, organisations need to build redundancy into their operations and develop contingency plans to ensure critical functions can still work in the worst-case scenarios. This includes diversifying their IT infrastructure by having more than one cyber security, operating system, or cloud provider.
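The idea of redundancy across providers can be sketched in code. The following is a minimal illustration, assuming each provider is wrapped in a function that either returns a result or raises an exception; the provider names and the `call_with_failover` helper are hypothetical and do not correspond to any vendor’s API.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every configured provider has failed."""


def call_with_failover(
    providers: Sequence[Tuple[str, Callable[[], str]]]
) -> Tuple[str, str]:
    """Try each provider in order.

    Returns (provider_name, result) from the first provider that succeeds,
    and raises AllProvidersFailed if none does.
    """
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call()
        except Exception as exc:  # record the failure and try the next provider
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical providers used only to exercise the helper.
def primary_scan() -> str:
    raise RuntimeError("primary provider is down")


def backup_scan() -> str:
    return "scan complete"


if __name__ == "__main__":
    used, result = call_with_failover(
        [("primary", primary_scan), ("backup", backup_scan)]
    )
    print(f"{used}: {result}")
```

A second provider only adds resilience if it does not share the first provider’s point of failure, which is why the mapping of exposures described above comes first.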
Closer collaboration between the public and private sector is essential. Businesses benefit from accessing secure digital networks, as well as the public services that rely on them. This means there should be a common interest in sharing information on breaches, vulnerabilities, and stress tests. The cost of switching between IT providers, interoperability, and the ability of new entrants to compete also need effective monitoring. But co-operation between regulators and tech firms is important to ensure any regulations are targeted, and do not stifle innovation.
Single points of failure also lurk more broadly in our globalised and highly networked economies. The pandemic highlighted how many businesses had become over-reliant on China-linked supply chains that supported their ultra-efficient just-in-time delivery models.
The logic of mapping, contingency building, and collaborating holds for mitigating most concentrated risks. Building resilience into physical and digital economic systems is essential, and should not be postponed. This will come at a cost, but will bring the benefit of insuring against even costlier threats.
General strategies to prevent IT crashes include:
- Ensuring hardware reliability: using high-quality, reliable components to minimize failure rates, implementing redundant systems such as RAID for storage and backup power supplies to ensure continuity in case of hardware failure, and scheduling regular maintenance checks to identify and replace failing components.
- Ensuring software stability: keeping all software, including operating systems and applications, up to date with the latest patches and updates, testing new software and updates for compatibility in a controlled environment before deploying them widely, and choosing reliable, well-reviewed and compatible software to avoid conflicts and crashes.
- Adopting security measures: using antivirus and anti-malware software to protect against malicious attacks, using firewalls and intrusion detection systems to prevent unauthorized access, and conducting regular security audits to identify and address vulnerabilities.
- Monitoring and diagnostics: using tools to continuously monitor system performance and detect issues early, regularly reviewing system logs to identify potential problems before they cause crashes, and conducting stress testing to ensure systems can handle peak loads.
- Backup and recovery: maintaining regular backups of critical data to ensure quick recovery in case of a crash, and developing and regularly updating a disaster recovery plan to minimize downtime and data loss.
- User training and best practices: educating users on safe computing practices and how to avoid common pitfalls, keeping detailed documentation of the IT infrastructure and procedures, and scheduling regular maintenance checks to keep everything running smoothly.
- Proactive management: using predictive analytics to anticipate and address potential issues before they cause crashes, and implementing a structured change management process to ensure smooth transitions and minimize disruptions.
These strategies can significantly reduce the risk of crashes and ensure a more stable and reliable environment.
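One of the strategies above, testing updates in a controlled environment before deploying them widely, is often paired with a staged rollout. The following is a minimal sketch of that idea, not a description of any vendor’s actual release process. The `deploy` and `is_healthy` callbacks, the host names and the stage fractions (1%, 10%, 100%) are all assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    stage_fractions: Sequence[float] = (0.01, 0.10, 1.0),
) -> List[str]:
    """Deploy to growing fractions of hosts and stop at the first unhealthy host.

    Returns the hosts that received the update. Hosts beyond the point
    where the rollout halted are never touched.
    """
    updated: List[str] = []
    for fraction in stage_fractions:
        # Each stage covers at least one host.
        target = max(1, int(len(hosts) * fraction))
        for host in hosts[len(updated):target]:
            deploy(host)
            updated.append(host)
            if not is_healthy(host):
                return updated  # halt the rollout; later hosts stay on the old version
    return updated


if __name__ == "__main__":
    fleet = [f"host-{i}" for i in range(100)]
    # A faulty update that leaves every host it touches unhealthy.
    hit = staged_rollout(fleet, deploy=lambda h: None, is_healthy=lambda h: False)
    print(f"faulty update reached {len(hit)} of {len(fleet)} hosts")
```

In this sketch a faulty update stops after the first host it breaks instead of reaching the whole fleet. A real process would also wait between stages and roll back the affected hosts, which is omitted here for brevity.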
Towards a Universal Ecosystem To Prevent Global IT Outage
That a seemingly routine software update can wreak such worldwide chaos should serve as a wake-up call. Crashes, hacks and data breaches are a mounting threat as the global economy becomes more digitalised and interconnected. Computers and the internet already underpin everything from stock exchanges and electric vehicles to central heating.
Creating a robust ecosystem to prevent IT crashes involves integrating various tools, practices, and strategies to ensure stability and resilience.
The Union of Arabic Banks calls for the formation of a universal ecosystem to prevent global IT outages and the establishment of an independent body for the control and development of technology at the global level.
The ecosystem would be governed by a body with the legislative authority to audit the performance and operations of major technology companies, prevent the misuse of technology, hedge against the risks of IT outages, monitor performance, set standards so that technology is designed to serve human needs rather than to cause harm, support the development of technology according to specified standards, and enforce the cyber resilience of digital infrastructure.