Not knowing what you don’t know that you don’t know

September 25, 2012

I stopped reading the New York Times article about data center power use after the second page, as I realized that the reporter(s) did not have the first damn clue what they were writing about.

I have built data centers. I run data centers. Not really large ones, at the moment, but the ones I run are getting bigger by the day. And running a small DC is in some ways harder than running a large one, as you often lack the resources (financial and otherwise) to make use of capabilities that large ones are able to tap. By way of credentials, I am the US infrastructure manager for the largest company in the world that does what my company does (which is still pretty small), so I know just a little something about it.

The Times article could not be more clueless, really. I don’t regret not reading all of it. Running a data center – even a small data center – in a 24/7 operation is incredibly difficult. I am not saying that to make myself seem noble, or my job seem harder. That’s just a fact. Here’s one reason why.

This isn’t just an incredibly inaccurate representation of the dedication and hard work of eng/ops everywhere in the computer industry, I know for a fact it’s also inaccurate in what regards to Facebook itself. I imagine Facebook engineers (and that of any other website really) reading this article, thinking about the times they’ve been woken up in the middle of the night to solve problems that no one has ever faced before, for which no one has trained them, because no university course and no amount of research prepares you for the challenges of running a service at high scale, and having to solve all that as fast as possible, regardless of whether it’s about making sure that someone can run their business, do their taxes, or that a kid halfway around the world can upload their video of a cat playing the piano.

I worked on a problem in our NYC DC for five hours on Friday, on a product that I knew almost nothing about before I started. The product is set up in a non-standard way (by a previous IT team), is unsupported by the vendor – though I cajoled them into assisting anyway – and is also very complicated to administer and to use.

So, let’s summarize. No one knows how to use it, no one knows how it is set up, the vendor doesn’t support it and the configuration status is unknown. Oh yeah, and who gets to fix it? Me, the “owner” of the data center.

Sound like a job you want to do? Yeah, didn’t think so. You’d be crazy to want to. (What does that say about me?)

I’ve sort of strayed away from the main point as I am still frustrated from Friday, but it’s always amazing to me how absolutely wrong articles can be when they are written by people who don’t know the field and who buy figures from clueless consultants*.

*As a rule, consultants are nearly always clueless.