Cybersecurity professionals face a daunting task. The threats are endless, while the talented workforce to defend against them is scarce and budgets are tight. Many CISOs realize that they do not have good-quality detection and response; they have to settle for the best they can get given talent and budget constraints.
To determine what's best, you have to be able to compare two providers. But how do you do that before you sign on the dotted line? As a result, most small and medium-sized organizations end up buying MDR without doing a trial.
In an ideal scenario, every CISO would have really good alert triage, investigation, and response. The reality check is that any expectation of "good" must be set within the context of available budget and resource constraints. So the key to raising the bar within existing resources comes down to considering the cost-effectiveness of the solutions you implement.
So, what's the journey from alert to incident and where could the biggest improvements be made? Let’s look at each phase of the process:
It all starts with detection. Does the MDR vendor share what detections they have deployed? Do those detection use cases take into account what's really important to the enterprise? And does the enterprise have the data sources that will trigger those detections when the relevant attack happens?
The cyber threat landscape is constantly evolving, so there can never be too many detections: the more detections you have, the lower the risk of an unnoticed attack. However, an even bigger problem than not having good detections is that the output of the detections teams already have in place either gets ignored or never goes through thorough investigation and triage. Less than 10% of the alerts an enterprise receives are real incidents, and often the number is even lower. It's no wonder small security teams simply choose to ignore those alerts. The right solution, though, is to determine which alerts pose a real risk, and that requires investigation and triage.
Alerts aren't incidents until proven guilty, especially given the high false positive rates. To pinpoint a real threat, you first need to determine whether an alert is a false alarm, a warning, or a real incident. Investigation that brings together more context (also known as alert enrichment) is the most crucial step in determining whether something requires additional action. It requires someone to sift through a stream of alerts and gather more information to decide whether an incident should be created and how urgent the response to that incident should be.
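To make the enrichment step concrete, here is a minimal sketch in Python. It assumes a simple alert dictionary and two hypothetical context sources (an asset inventory and a list of known-bad IPs); any real SOC pipeline will have its own data model and integrations.

```python
# Minimal sketch of alert enrichment and triage; the data sources and field
# names below are hypothetical placeholders, not a reference implementation.

ASSET_INVENTORY = {
    "10.0.4.17": {"owner": "finance", "criticality": "high"},
}

KNOWN_BAD_IPS = {"203.0.113.99"}  # placeholder threat-intel list


def enrich_alert(alert: dict) -> dict:
    """Attach asset and threat-intel context so a triage decision can be made."""
    enriched = dict(alert)
    enriched["asset_context"] = ASSET_INVENTORY.get(alert.get("src_ip"), {})
    enriched["known_bad_destination"] = alert.get("dst_ip") in KNOWN_BAD_IPS
    return enriched


def triage(alert: dict) -> str:
    """Rough verdict based on the enrichment above: false alarm, warning, or incident."""
    if alert["known_bad_destination"] and alert["asset_context"].get("criticality") == "high":
        return "incident"      # open an incident with high urgency
    if alert["known_bad_destination"]:
        return "warning"       # worth a closer look
    return "false_alarm"


if __name__ == "__main__":
    raw = {"src_ip": "10.0.4.17", "dst_ip": "203.0.113.99", "rule": "outbound-c2-beacon"}
    print(triage(enrich_alert(raw)))  # -> "incident"
```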
Once an alert is investigated and confirmed as a real incident, the set of actions and next steps that should be taken is part of the Incident Response process. Some responses may have preset playbooks to help you work through them, while other incidents may require new action steps tailored to the situation.
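As an illustration of what a preset playbook can look like, here is a small sketch that maps hypothetical incident types to response steps and falls back to ad-hoc handling when no playbook exists. The steps listed are illustrative, not a recommended runbook.

```python
# Sketch of preset response playbooks keyed by incident type; all incident
# types and steps here are assumptions made for illustration.

PLAYBOOKS = {
    "phishing": [
        "quarantine the reported email across all mailboxes",
        "reset credentials for users who clicked the link",
        "block the sender domain at the mail gateway",
    ],
    "ransomware": [
        "isolate affected hosts from the network",
        "disable compromised accounts",
        "engage the incident response retainer",
    ],
}


def response_steps(incident_type: str) -> list[str]:
    """Return the preset playbook, or flag that ad-hoc steps are needed."""
    return PLAYBOOKS.get(incident_type, ["no preset playbook: define steps for this situation"])


print(response_steps("phishing"))
print(response_steps("supply-chain"))  # falls back to ad-hoc handling
```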
Each of these phases can require a significant amount of resources to complete thoroughly.
It's a cliché: faster, better, cheaper; pick two. I would suggest a slight variation: peg the cost, then ask for better and faster. It is also the more practical approach, since in most enterprises cybersecurity budgets are finite and hard to change.
So, how do we measure risk reduction? It's a tough nut to crack. However, we might have a proxy: ask the Chief Information Security Officer (CISO) this question:
“How much would you be willing to spend to make sure that every alert is thoroughly investigated and responded to in a timely manner?”
This establishes an implied value for "risk reduction." The value a CISO attaches to ensuring timely investigation and response for every alert may not be perfect, but it is a solid starting point for establishing a better benchmark. Plus, you don't need to let perfect be the enemy of good.
Mean time to detect (MTTD), mean time to investigate (MTTI), and mean time to respond (MTTR) are pretty common metrics. Unfortunately, more than 80% of security teams in the 100-1,000 employee range do not have these metrics, because they do not use processes and systems that can produce them in seconds.
This should be a key deliverable for every MDR provider.
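For teams that want to start producing these numbers, here is a minimal sketch of how MTTD, MTTI, and MTTR can be computed, assuming each incident record carries timestamps for when the attack began, when it was detected, when triage concluded, and when it was resolved. The field names and the exact definitions (for example, whether MTTR is measured from detection or from occurrence) are assumptions; definitions vary between teams.

```python
# Minimal sketch: compute MTTD, MTTI, and MTTR from per-incident timestamps.
from datetime import datetime
from statistics import mean

incidents = [
    {
        "occurred":     datetime(2024, 5, 1, 9, 0),   # attack activity began
        "detected":     datetime(2024, 5, 1, 9, 20),  # alert fired
        "investigated": datetime(2024, 5, 1, 10, 5),  # triage verdict reached
        "resolved":     datetime(2024, 5, 1, 13, 0),  # containment/remediation done
    },
    # ... more incident records ...
]


def mean_minutes(start_field: str, end_field: str) -> float:
    """Average elapsed minutes between two timestamps across all incidents."""
    return mean(
        (i[end_field] - i[start_field]).total_seconds() / 60 for i in incidents
    )


print("MTTD (min):", mean_minutes("occurred", "detected"))
print("MTTI (min):", mean_minutes("detected", "investigated"))
print("MTTR (min):", mean_minutes("detected", "resolved"))
```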
Comparing two SOCs on the quality of detection and response, while not impossible, is not an exact science. Distinguishing between two detection and response providers is hard when the differences are minor, but when the differences are major, it is not hard to tell. Here are a few questions you can use to assess the quality of detection and response providers:
One thing that is measurable in a SOC is alert fidelity: what percentage of the alerts that human analysts have to respond to are real threats, and how often do you find threats that were not detected early enough or were only detected by chance? The first factor measures "false positives"; the second, "false negatives." You can drive your false negative rate down by escalating every alert as an incident, but that is not a good idea, as it comes with a hefty cost: the human time required for in-depth analysis and response.
Similarly, you can turn off or ignore every alert, and the false positive rate drops to zero. But now you are incurring a significant risk that you will miss an alert that becomes an incident that turns into a breach. SOC teams have to constantly balance these two metrics given their limited time and resources. Tracking the number of false positives and false negatives establishes a clear metric for benchmarking alert triage quality.
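Here is a small sketch of that bookkeeping, assuming each alert has been labeled after the fact as a real threat or not. The field names are illustrative; the point is that precision (alert fidelity) and recall fall out of the same counts.

```python
# Sketch of false-positive / false-negative accounting from labeled alerts.
alerts = [
    {"escalated": True,  "real_threat": True},   # true positive
    {"escalated": True,  "real_threat": False},  # false positive
    {"escalated": False, "real_threat": False},  # correctly ignored
    {"escalated": False, "real_threat": True},   # false negative (missed threat)
]

tp = sum(a["escalated"] and a["real_threat"] for a in alerts)
fp = sum(a["escalated"] and not a["real_threat"] for a in alerts)
fn = sum(not a["escalated"] and a["real_threat"] for a in alerts)

precision = tp / (tp + fp)   # alert fidelity: share of escalations that were real
recall = tp / (tp + fn)      # share of real threats that were actually caught

print(f"false positives: {fp}, false negatives: {fn}")
print(f"precision (fidelity): {precision:.2f}, recall: {recall:.2f}")
```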
You can even combine false positives and false negatives into a single number, giving you one unified metric for the quality of alert triage.
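One common way to do this, offered here as an illustration rather than the only option, is an F-score: the harmonic mean of precision (which is hurt by false positives) and recall (which is hurt by false negatives).

```python
# Illustrative combined metric: F1 score over the counts from the previous sketch.
def f1_score(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Example: 9 true positives, 40 false positives, 1 false negative.
print(f"{f1_score(9, 40, 1):.2f}")  # low fidelity drags the combined score down
```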
We hope this article gives you some ideas as to how you can measure the quality of detection and response.
Remember, what gets measured typically gets improved. Conversely, how do you know whether it is getting better if you can't measure it?