How to react to a network outage

By Lora O’Haver

Recent DDoS (distributed denial-of-service) attacks have everyone on edge, from DNS hosting providers to individual organizations fearing they might be caught in the next denial-of-service campaign. It’s a valid concern, considering that the cost of unplanned data center outages has increased 38 percent in five years to nearly $9,000 a minute, or about half a million dollars an hour.
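
To make that arithmetic concrete, here is a minimal Python sketch that scales the per-minute figure to any outage length. It assumes a flat rate of $9,000 a minute (the article's "nearly $9,000", rounded); the function name and the 15-minute example are illustrative, not drawn from any study.

```python
# Assumption: a flat per-minute cost, using the rounded figure quoted above.
COST_PER_MINUTE_USD = 9_000


def outage_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Estimate the cost of an outage that lasts `minutes` at a flat rate."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


# One hour at $9,000 a minute is $540,000, i.e. "about half a million dollars".
print(f"One hour: ${outage_cost(60):,.0f}")
# A hypothetical 15-minute incident, for comparison.
print(f"15 minutes: ${outage_cost(15):,.0f}")
```

Real costs are rarely linear over time, so treat the result as a rough order of magnitude rather than a forecast.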

Unfortunately, network outages don’t need to be instigated by malicious outside forces. Configuration choices, new releases, or the cumulative impact of changes made over many years can lead to catastrophic failures as well, and examples are common. Just this summer, Southwest Airlines experienced a router failure that caused the cancellation of about 2,300 flights in four days. The router took down several Southwest Airlines systems, and the outage continued for about 12 hours when the backup systems didn’t work as expected. Software can be a cause too. Last year, the New York Stock Exchange experienced an outage caused by a software update, crippling the exchange platform in the longest technology-related disruption in recent memory.

An outage can strike anywhere, and even a minor internal problem can create a ripple effect that causes widespread disruption, costing an organization money and consumer trust. How can you mitigate the risk of such an outage happening to your organization?

Why It Happens

Single Points of Failure: Deploying a device on the network without failsafe protection for the other network components can lead to a lot of headaches. When a failure does occur, like the router incident above, the entire traffic pathway can fail and result in a complete network outage. In other words, a single device can have a huge impact on business operations, and fixes can be hard to implement because it is often difficult to identify and isolate the faulty component quickly.
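
One way to look for single points of failure before they bite is to model the network as a graph and ask which device, if removed, would cut some part of the network off from the rest. The Python sketch below does this by brute force: it removes each device in turn and checks whether everything else is still connected. The topology and device names are hypothetical, and the sketch assumes the network is fully connected to begin with.

```python
from collections import defaultdict


def single_points_of_failure(links):
    """Return the devices whose failure would split the network.

    `links` is an iterable of (device_a, device_b) pairs. The network is
    assumed to be connected when every device is up.
    """
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)

    def reachable(start, removed):
        # Depth-first search that pretends `removed` is offline.
        seen = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in graph[node]:
                if neighbor != removed and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        return seen

    nodes = set(graph)
    critical = set()
    for device in nodes:
        others = nodes - {device}
        if not others:
            continue
        start = next(iter(others))
        if reachable(start, device) != others:
            critical.add(device)
    return critical


# Hypothetical topology: two redundant core switches, but one firewall.
links = [
    ("edge-router", "core-a"),
    ("edge-router", "core-b"),
    ("core-a", "core-b"),
    ("core-a", "firewall"),
    ("firewall", "app-servers"),
]
print(sorted(single_points_of_failure(links)))  # ['core-a', 'firewall']
```

In this example the firewall is flagged because it is the only route to the application servers, and core-a is flagged because the firewall hangs off it alone. For large networks, a linear-time articulation-point algorithm would be a better fit than this brute-force check.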

The Human Element: Technology is not the only thing at fault; humans are too. Last year, Avaya found that 81 percent of IT pros said human error (e.g., configuration mistakes) had taken them offline. Another human element is conflicting priorities between teams, which can delay necessary actions such as software updates and leave vulnerabilities exposed.

Environmental Issues: Finally, you can’t predict random occurrences. For example, if your HVAC system fails, the data center room can become overheated, leading to potential damage and failure of any system deployed there.

Taking Steps to Prevent the Problem

The first step is accepting that you cannot prevent all network outages, so it is critical to develop a disaster recovery and business continuity plan. What is an acceptable amount of risk or downtime? Being specific about which risks are acceptable and which are not allows for better prioritization.
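
A simple way to be specific about acceptable downtime is to translate an availability target into a yearly downtime budget. The Python sketch below does that conversion; the targets shown are common illustrative values, not recommendations from the article, and the calculation assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Convert an availability target (e.g. 99.9) into minutes of downtime per year."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Illustrative targets only.
for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability allows about {minutes:.1f} minutes of downtime a year")
```

Seen this way, moving from 99 percent to 99.9 percent shrinks the yearly budget from roughly 88 hours to under 9 hours, which helps frame how much redundancy a given system deserves.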

Second, it is important to employ the right setup for your network. No device should be deployed without providing an alternate path in the event of a failure. Something as simple as an external bypass switch can maintain availability when deployed in front of network devices.
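
The idea behind a bypass switch can be sketched as a small state machine. The Python model below assumes one common approach: the switch probes the inline device with heartbeats and routes traffic around it after several consecutive misses. The class, its threshold, and its method names are hypothetical and do not describe any particular vendor's product.

```python
from dataclasses import dataclass


@dataclass
class BypassSwitch:
    """Toy model of a bypass switch sitting in front of an inline device."""

    max_missed_heartbeats: int = 3  # assumed threshold, for illustration only
    missed: int = 0
    bypassing: bool = False

    def record_heartbeat(self, device_responded: bool) -> None:
        """Update state after one heartbeat probe of the inline device."""
        if device_responded:
            # The device is healthy again: send traffic back through it.
            self.missed = 0
            self.bypassing = False
        else:
            self.missed += 1
            if self.missed >= self.max_missed_heartbeats:
                self.bypassing = True

    def route(self) -> str:
        """Return the path traffic currently takes."""
        return "bypass" if self.bypassing else "inline"


switch = BypassSwitch()
for responded in (True, False, False, False, True):
    switch.record_heartbeat(responded)
    print(switch.route())
# Prints: inline, inline, inline, bypass, inline (one per line)
```

The trade-off to decide in advance is what bypassing means for each device: letting traffic flow uninspected keeps the business online, while blocking it preserves security at the cost of availability.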

Additionally, you need resilient network security that provides continuous traffic inspection and recovers from any outage in an acceptable time frame to minimize exposure. That means having complete visibility of traditional and cloud traffic, along with a security fabric that protects the network from an outage in your security tools and ensures that as much traffic as possible is inspected.

Once the necessary steps have been taken, test, test, test. Networks and devices should be tested continuously as part of standard IT operations, using loads that reflect real-world scenarios, particularly when applications or the network change.
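
As a rough illustration of what load testing measures, the Python sketch below fires a batch of concurrent requests and reports median and 95th-percentile latency. The `fake_request` function is a stand-in that simply sleeps; in practice it would be replaced by a call to the system under test, and the request count and concurrency level here are arbitrary assumptions.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request() -> None:
    """Stand-in for a real request; replace with a call to the system under test."""
    time.sleep(0.01)


def run_load_test(request_fn, total_requests: int = 50, concurrency: int = 10) -> dict:
    """Run `total_requests` calls with `concurrency` workers and summarize latency."""

    def timed_call(_):
        start = time.perf_counter()
        request_fn()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(total_requests)))

    return {
        "requests": len(latencies),
        "median_s": statistics.median(latencies),
        # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
        "p95_s": statistics.quantiles(latencies, n=20)[-1],
    }


report = run_load_test(fake_request)
print(report)
```

Tracking the 95th percentile alongside the median matters because a change that leaves typical latency untouched can still make the slowest requests much worse. Dedicated test tools go much further, but the same measurements sit at their core.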

The incidents I describe above, in which an IT problem rapidly becomes a major business issue, are almost certain to happen to any business at one point or another. As such, they should be no less a cause for concern than a denial-of-service attack. Understanding the common sources of network disruption, and how they can be addressed in advance, will help you avoid major fallout, from lost revenue to lost customers. Are you doing this now? If not, it is time to make it a top priority.

Lora O’Haver is Solutions Marketing Manager at Ixia
