Incident Response 101
A few weeks ago, we had a minor emergency: a water supply line burst in a wall and decided to flood the floor of the IT department at a rather impressive rate. Because the department is located in the basement, the water really had nowhere to go, and it started pooling rather quickly. Fortunately, the burst pipe was a fresh-water supply line rather than a waste disposal line.
The gut response of most people working in a service job is to jump in and actively help whenever help is needed. In any form of emergency, computer security incident response included, there are a few things to remember. I'll list them here again, in the hope that they are useful to someone.
1) Slow down. Initial reports from others, as well as your own initial assessment, are most likely incorrect and incomplete. Count to ten, take a deep breath, and re-assess the situation.
2) Verify that there actually is an incident. If you get reports that something is going on, always verify them to the extent reasonably possible. You'll find that many reports are well-intended but wrong. However, always thank people for reporting and encourage them to keep doing it. You never want to discourage people from speaking up; it is better to get 100 reports that turn out to be unfounded than to miss the one that isn't.
3) Put someone in charge. Somebody needs to be put in charge of a scene. That person tells others what to do. Anyone who is not in charge should NOT initiate response actions on their own. This is the hardest one of all. Most technical folks are type A personalities who feel the need to be in control. Yielding that control to somebody else is hard, but doing so ensures that nobody is put in harm's way, that no unnecessary effort is made, and that all necessary steps are taken. The National Incident Management System does a pretty good job of describing a structure to handle emergencies.
4) Secure the area. Whether you are dealing with a physical emergency or with a cyber emergency, securing the affected area is a necessary prerequisite for containment. Securing the area includes sending people who are not directly involved in response on their way, making sure that all persons are physically safe (and stay that way), and protecting property.
5) Contain the badness. Stop the situation from getting worse. In our flood, it was as simple as shutting off the water supply. In a computing environment, it may mean transferring live traffic to a secondary server, shutting down a system, or null-routing certain IP space.
6) Eradicate. Make the problem go away. In our flood scenario, we had a plumbing crew come in to replace a cap that had let go. In a server compromise, it may require a full system rebuild and data restoration from known-good media, or a thorough malware removal exercise. Your mileage may vary.
7) Restore. Go back to a normal situation. In our case, the pipes were repaired, carpets dried out, sheetrock replaced and walls repainted. Keep watching for signs of trouble afterwards: however good a job you may have done eradicating the problem, it is easy enough to miss something small, or to fail to address the root cause.
8) Learn. Once things are humming along nicely, go back and find out what you can do to make things better for the future. Looking back to place blame is unproductive.
Each of these steps has a distinct set of tactics associated with it. For example, when receiving a report of a potentially dangerous situation, keeping your distance to assess further risk and damage is probably wise. It is easy enough to slip in water. When containing a situation, messing around with electrical equipment in the middle of a flood doesn't make things better. In order to restore from a known-good backup, you need to a) have a backup, b) be able to read it, c) know when the badness started and d) have archives that go back far enough.
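The four backup conditions above can be turned into a small check. What follows is only a sketch under stated assumptions: the Backup record, its readable flag (standing in for the result of a real restore test), the pick_known_good helper and the sample dates are all made up for illustration and do not describe any particular backup product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class Backup:
    """Hypothetical record describing one archived backup."""
    taken_at: datetime
    readable: bool  # assumed to come from an actual restore test


def pick_known_good(
    backups: Sequence[Backup],
    badness_started: Optional[datetime],
) -> Optional[Backup]:
    """Return the newest readable backup taken before the badness started.

    Returns None when any of the four conditions fails: no backups (a),
    none readable (b), start time unknown (c), or archives that do not
    reach back far enough (d).
    """
    if badness_started is None:
        return None  # (c) we do not know when the badness started
    candidates = [
        b for b in backups
        if b.readable and b.taken_at < badness_started
    ]
    # max() with default=None covers (a), (b) and (d) in one step.
    return max(candidates, key=lambda b: b.taken_at, default=None)


# Illustrative data only: three weekly backups, one of them unreadable.
backups = [
    Backup(datetime(2024, 1, 1), readable=True),
    Backup(datetime(2024, 1, 8), readable=False),
    Backup(datetime(2024, 1, 15), readable=True),
]
choice = pick_known_good(backups, badness_started=datetime(2024, 1, 10))
```

In this made-up data set the most recent backup postdates the start of the badness and the one before it cannot be read, so the only usable archive is the oldest one. That is the point of condition d): without enough history, there would be nothing left to restore from.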
Having an incident response strategy, and people trained in executing that strategy, is paramount.
Remember that, often, sending people out of harm's way is a good initial response. Removing unnecessary people from the equation without making them feel undervalued reduces chaos and complexity. It also ensures that 'need-to-know' is maintained. However, sending people out of harm's way also requires established tactics: making sure that supervisors account for their reports, as well as for their area's guests and contractors, is a good plan.
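The accounting tactic amounts to comparing who should be present against who has checked in. A minimal sketch, assuming each supervisor keeps a roster that includes their reports plus any guests and contractors in their area; the unaccounted_for helper and all names in the sample data are hypothetical.

```python
from typing import Dict, Set


def unaccounted_for(
    expected: Dict[str, Set[str]],
    checked_in: Set[str],
) -> Dict[str, Set[str]]:
    """Return, per supervisor, the people not yet confirmed safe."""
    missing: Dict[str, Set[str]] = {}
    for supervisor, people in expected.items():
        outstanding = people - checked_in  # set difference
        if outstanding:
            missing[supervisor] = outstanding
    return missing


# Illustrative rosters: reports, guests and contractors per supervisor.
expected = {
    "supervisor_a": {"alice", "bob", "guest_1"},
    "supervisor_b": {"carol", "contractor_1"},
}
checked_in = {"alice", "bob", "carol", "contractor_1"}
still_missing = unaccounted_for(expected, checked_in)
```

Here the only person still outstanding is the guest on the first supervisor's roster, which is exactly the kind of person who is easy to forget if rosters cover employees only.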
All of this comes down to preparation: plan for the worst, validate the plan through exercises, and train people in the tactics.