How to react to a network outage

By Lora O’Haver

Recent distributed denial-of-service (DDoS) attacks have everyone on edge, from DNS hosting providers to individual organizations fearing they might be caught in the next denial-of-service campaign. It’s a valid concern, considering that the cost of unplanned data center outages has increased 38 percent in five years to nearly $9,000 a minute, or about half a million dollars an hour.
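
To make that figure concrete, the arithmetic can be sketched in a few lines of Python. The per-minute rate is the approximate number cited above; the function name and the one-hour example are illustrative assumptions, and actual costs vary widely from one organization to the next.

```python
# Rough outage-cost arithmetic. COST_PER_MINUTE_USD is the approximate
# figure quoted in the article; the function itself is only illustrative.
COST_PER_MINUTE_USD = 9_000


def outage_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Return the estimated cost, in dollars, of an outage lasting `minutes`."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


if __name__ == "__main__":
    # 60 minutes at $9,000 a minute is $540,000, i.e. about half a million.
    print(f"One hour of downtime: ${outage_cost(60):,.0f}")
```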

Unfortunately, network outages don’t need to be instigated by malicious outside forces. Configuration choices, new releases, or the cumulative impact of changes made over many years can lead to catastrophic failures as well, and examples are common. Just this summer, Southwest Airlines experienced a router failure that caused the cancellation of about 2,300 flights over four days. The router took down several Southwest Airlines systems, and the outage continued for about 12 hours when the backup systems didn’t work as expected. Software can be a cause too. Last year, the New York Stock Exchange experienced an outage caused by a software update, crippling the exchange platform in the longest technology-related disruption in recent memory.

An outage can strike anywhere, and even a minor internal problem can create a ripple effect that causes widespread disruption, costing an organization money and consumer trust. How can you mitigate the risk of such an outage happening to your organization?

Why It Happens

Single Points of Failure: Deploying a device on the network without failsafe protection for other network components can lead to a lot of headaches. When a failure does occur, as in the router incident above, the entire traffic pathway can fail and result in a complete network outage. In other words, a single device can have a huge impact on business operations, and fixes can be hard to implement because it is often difficult to identify and isolate an individual issue in a short period of time.
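
One way to look for single points of failure ahead of time is to model the network as a graph and ask which device, if removed, would cut the remaining devices off from one another. The sketch below does this by brute force in Python. The topology in EXAMPLE_LINKS and all device names are hypothetical, and a real audit would also need to cover links, power, and software dependencies.

```python
from collections import deque


def is_connected(links, nodes):
    """Return True if all `nodes` can reach each other using `links` among them."""
    nodes = set(nodes)
    if len(nodes) <= 1:
        return True
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        if a in nodes and b in nodes:  # ignore links touching removed devices
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search from an arbitrary device
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == nodes


def single_points_of_failure(links):
    """Return devices whose removal disconnects the rest of the network."""
    nodes = {n for link in links for n in link}
    return sorted(n for n in nodes if not is_connected(links, nodes - {n}))


# Hypothetical topology: one edge router in front of two redundant cores.
EXAMPLE_LINKS = [
    ("isp", "edge-router"),
    ("edge-router", "core-a"),
    ("edge-router", "core-b"),
    ("core-a", "servers"),
    ("core-b", "servers"),
]

if __name__ == "__main__":
    print(single_points_of_failure(EXAMPLE_LINKS))
```

In this made-up topology only the edge router is flagged, because either core switch can fail while the other still carries traffic. Checking every device this way is slow on very large networks but easy to reason about.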

The Human Element: Technology is not the only thing at fault; humans are too. Last year, Avaya found that 81 percent of IT pros said human error (e.g., configuration mistakes) had taken them offline. Another human element is conflicting priorities between teams, which can delay necessary actions such as software updates and leave vulnerabilities exposed.

Environmental Issues: Finally, you can’t predict random occurrences. For example, if your HVAC system fails, the data center room can become overheated, leading to potential damage and failure of any system deployed there.

Taking Steps to Prevent the Problem

The first step is accepting that you cannot prevent all network outages, so it is critical to develop a disaster recovery and business continuity plan. What is an acceptable amount of risk or downtime? Being specific about which risks are acceptable and which are not allows for better prioritization.
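
Being specific about acceptable downtime often means turning an availability target into a yearly downtime budget. The following Python sketch shows that conversion. The targets in the loop are illustrative assumptions, not recommendations, and the calculation ignores leap years.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Example targets only; each organization has to choose its own.
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability allows about {minutes:.0f} minutes a year")
```

A 99.9 percent target, for instance, leaves a budget of roughly 8.8 hours of downtime a year, which is less than the 12-hour outage described earlier.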

Second, it is important to employ the right setup for your network. No device should be deployed without providing an alternate path in the event of a failure. Something as simple as an external bypass switch can maintain availability when deployed in front of network devices.
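
The bypass idea can be illustrated with a toy model in Python. It is a simplified sketch, not any vendor's implementation: the BypassSwitch class, its method names, and the health check are all hypothetical. The switch sends traffic through an inline tool while the tool responds, and passes traffic straight through when it does not.

```python
from typing import Callable, Tuple


class BypassSwitch:
    """Toy model of a bypass switch placed in front of an inline tool."""

    def __init__(self, inspect: Callable[[str], str], is_tool_alive: Callable[[], bool]) -> None:
        self._inspect = inspect              # the inline tool's processing step
        self._is_tool_alive = is_tool_alive  # stand-in for a heartbeat check

    def forward(self, packet: str) -> Tuple[str, str]:
        """Route a packet through the tool if it is healthy, else around it."""
        if self._is_tool_alive():
            try:
                return "inspected", self._inspect(packet)
            except Exception:
                pass  # the tool failed mid-flight; fall through to bypass
        return "bypassed", packet


if __name__ == "__main__":
    tool_state = {"alive": True}
    switch = BypassSwitch(lambda p: p.upper(), lambda: tool_state["alive"])
    print(switch.forward("payload"))  # tool healthy: traffic is inspected
    tool_state["alive"] = False
    print(switch.forward("payload"))  # tool down: traffic keeps flowing
```

The trade-off in this sketch is that bypassed traffic keeps the network available but goes uninspected, so the time spent in bypass is worth tracking.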

Additionally, you need resilient network security that provides continuous traffic inspection and recovers from any outage in an acceptable time frame, to minimize exposure. That means having complete visibility of traditional and cloud traffic, along with a security fabric that protects the network from an outage in any of your security tools and ensures that the maximum amount of traffic is inspected.

Once the necessary steps have been taken, test, test, test. Networks and devices should be tested continuously as part of standard IT operations, using loads that reflect real-world scenarios, particularly when applications or the network change.
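
As a rough illustration of testing under load, the Python sketch below fires a number of concurrent requests at a target and reports the success rate and worst latency. The run_load_test function and the send_request callable are hypothetical placeholders. In practice send_request would wrap a real request to the system under test, and a dedicated test tool would generate far more realistic traffic.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_load_test(send_request: Callable[[], bool], total_requests: int, concurrency: int) -> Dict[str, float]:
    """Call `send_request` many times in parallel and summarize the results."""
    if total_requests <= 0 or concurrency <= 0:
        raise ValueError("total_requests and concurrency must be positive")

    def timed_call(_: int):
        started = time.perf_counter()
        try:
            ok = bool(send_request())
        except Exception:
            ok = False  # count any exception as a failed request
        return ok, time.perf_counter() - started

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    successes = sum(1 for ok, _ in results if ok)
    return {
        "requests": total_requests,
        "success_rate": successes / total_requests,
        "max_latency_s": max(elapsed for _, elapsed in results),
    }


if __name__ == "__main__":
    # Placeholder target that always succeeds after a short delay.
    def fake_request() -> bool:
        time.sleep(0.01)
        return True

    print(run_load_test(fake_request, total_requests=50, concurrency=10))
```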

The incidents described above, in which an IT problem rapidly becomes a major business issue, are almost certain to happen to any business at one point or another. As such, they should be no less a cause for concern than a denial-of-service attack. Understanding the sources of common network disruption, and how they can be addressed in advance, will help in avoiding major fallout, from lost revenue to lost customers. Are you doing this now? If not, it is time to make this a top priority.

Lora O’Haver is Solutions Marketing Manager at Ixia

Business Review Australia's January issue is now live. 


