Tuesday, January 18, 2011

Murphy is alive (and holds a grudge)

A live test in a production environment is often the final stage of a purchasing project. Well, I just got home from one of those tests.

As the person ultimately responsible for the test, I was initially okay with running it at the start of business. The potential for failure seemed negligible, and we could put plenty of manual overrides in place in case things did go wrong.

However, our senior network manager was not so sure. "Let's not do that," was his assessment.

Needless to say, he was right.

I'm glad that I followed his advice this time around.

After we failed over to put the device inline, and then failed back to normal, it turned out that our newest device's fail-over failed, its pass-through did not pass through, and its updates did not update. Traffic was fully blocked. Not a single packet got through.

Since I had my network person also present, we immediately failed over manually and there was little or no interruption, but the lesson is clear: operations that have the potential to disrupt must simply be done outside operational hours.

Of course, all of the things that failed did appear to work when the device was passive; as soon as we went inline, Murphy turned out to be alive and holding some form of grudge.

This is why, as a responsible manager, you should really listen to what technically knowledgeable people have to say and take their advice to heart. I am not advocating blindly following technical advice every time, but knowing what the technical people have to say, and seriously considering the consequences if their doom scenarios come true, is not open for discussion.

Sure, it was an inconvenience to wait until the close of business, but had we not done that, we would have run the risk of causing some serious disruptions, and that would also have been Bad.

The information security department never sleeps.