Exercise works...
...No, although I have heard rumors that it might be a good idea too, I am not talking about the kind of exercise that involves push-ups or running a mile before breakfast. I am talking about exercising emergency plans before they are actually needed.
Today I was able to get the entire IT management team together to run through a tabletop exercise of the IT business continuity plan. The exercise was very well received. I think the participants had fun going through the scenario that I set out for them, and I also think it boosted their confidence, helped build team spirit, and (of course) identified some areas in which we need to improve our processes.
Those of us who have played tabletop role-playing games such as Dungeons and Dragons (go ahead, admit it!) will feel right at home in a tabletop business continuity exercise. The goal of a tabletop is to practice policies and procedures without having to break out the big guns, pull staff from their normal routine, or disrupt production processes. As a result, tabletops can be a relatively cheap but still effective way to go over a scenario.
The chain of events was fairly simple. In the scenario, a small fire in a main server room took out a core switch, and with it remote connectivity and some telephone services. The fire was small and was contained relatively quickly, but a full damage assessment was not possible because the fire marshal had declared the site off-limits for the investigation.
For myself, I had set the following training goals:
- Train the participants to recognize when 'events' turn into something bigger and some form of emergency operations needs to be activated.
- Train the participants in the decision-making process that leads up to formally declaring an incident.
- Train the participants in designating emergency roles and responsibilities.
- Train the participants to communicate fully, clearly, and unambiguously, not only within the technology team, but also with the user community at large.
Because many of us in IT are so used to dealing with end-user emergencies all day long, it often takes time to recognize that something bigger is going on and that the response must be escalated. That turned out to be the case here too, but lessons were definitely learned and I am confident that we will do much better next time.
All in all, I think we had a good exercise and, once again, we are better prepared for when events really take place.