This article relates to a point I've tried to make for years to other IT people, and they mostly just don't seem to get it.
Complexity creates its own problems. The more complicated you make a network with redundancy and failover features, the fewer people understand it. Sure, I can create a network and server design so complex that only I can manage it, but what then?
A lot of IT people are obsessed with cleverness. I am obsessed with things working right as simply as possible.
In general when I design a network or server infrastructure, I make some use of well-supported failover or redundancy features for absolutely critical infrastructure that the business can’t be without for even a few minutes.
For everything else I try to make it as simple as possible so that if something goes wrong, even a helpdesk worker can fix it while I am away.
For instance, instead of a complicated, fully redundant stack of non-core switches connected to dual firewalls, if the business can tolerate three minutes of downtime, why not just keep a pre-configured backup firewall on a shelf? Or a spare switch?
Swap it in and done. Even a helpdesk worker can do this, whereas if there are 3-4 different connection paths, complicated routing and (as is usually the case) poor documentation, then not even a junior sysadmin can do the job, especially if anything on the network has changed without being documented, which happens all the time.
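The swap-it-in approach only works if the spare's configuration is kept current. Here is a minimal sketch of one way to do that with a scheduled script; all paths, the placeholder config, and the retention count are hypothetical, and real firewalls differ in how you export their running config:

```shell
#!/bin/sh
# Sketch: keep a dated archive of a device config so a pre-configured
# spare can always be restored from the latest known-good copy.
# Paths are examples; on a real firewall you would first export the
# running config here (e.g. pull it from the device over ssh/scp).
CONFIG_SRC="${CONFIG_SRC:-/tmp/firewall-running.conf}"
BACKUP_DIR="${BACKUP_DIR:-/tmp/firewall-backups}"
KEEP=5   # number of dated copies to retain

# For this sketch only: create a placeholder source config if none exists.
[ -f "$CONFIG_SRC" ] || echo "# placeholder config" > "$CONFIG_SRC"

mkdir -p "$BACKUP_DIR"
stamp=$(date +%Y%m%d-%H%M%S)
cp "$CONFIG_SRC" "$BACKUP_DIR/config-$stamp.conf"

# Prune all but the newest $KEEP copies so the directory stays small.
for old in $(ls -1t "$BACKUP_DIR"/config-*.conf | tail -n +$((KEEP + 1))); do
    rm -- "$old"
done

echo "latest backup: $BACKUP_DIR/config-$stamp.conf"
```

Run it from cron once a day and restoring the spare becomes loading the newest file in the backup directory, which is exactly the kind of step a helpdesk worker can follow from a one-page runbook.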
If something in your very complex network goes down on a holiday, or while the one knowledgeable person is away (as has happened to me more than once), and the failure isn't handled by failover or redundancy, the business might be down for hours instead of minutes.
This is the fault of the complexity directly. Networks and servers should be resilient in two dimensions: technical robustness and comprehensibility.
For anything in small- to medium-sized businesses that isn't absolutely critical (i.e. can't tolerate more than five minutes of downtime a year), simpler is often much better.