Security Operations Centers (SOCs) and Managed Detection and Response (MDR) providers exist to do one thing exceptionally well: alert triage, investigation, and response (ATIR). Whether this work is performed manually, through automation, or a combination of both, its effectiveness directly impacts an organization’s security posture.
Yet many teams operate on assumptions rather than data. They assume their processes work. They assume alerts are handled well. They assume automation is improving outcomes. In security, assumptions are dangerous. If you can’t measure ATIR performance, you can’t manage it—and you certainly can’t improve it.
Speed and cost are the default ATIR scorecards, but they don’t measure whether cases are correct, defensible, and actionable. When quality checks are only ad hoc, teams end up managing by gut feel.
That’s why building a meaningful case quality metric is critical.
To determine whether ATIR is truly working, organizations need to evaluate it across three dimensions: quality, speed, and cost.
No metric will ever be perfect. But a “good enough” metric is far better than having none at all. Without measurement, teams lack visibility into performance gaps and cannot take corrective action when outcomes fall below acceptable thresholds.
Cost is the easiest place to start because it’s measurable and tangible.
A simple normalization approach is to calculate how much is being spent on ATIR per employee. For example:

- A 1,000-person company spending $150K a year on MDR pays roughly $12 per employee per month.
- Bringing the same work in-house can cost $1M a year, pushing that figure above $80 per employee per month.
These numbers provide a baseline for comparing delivery models and efficiency—but cost alone says nothing about effectiveness.
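The normalization itself is simple arithmetic. A minimal sketch, using the illustrative figures above (a hypothetical 1,000-employee company):

```python
def atir_cost_per_employee_month(annual_cost: float, employees: int) -> float:
    """Normalize annual ATIR spend to dollars per employee per month."""
    return annual_cost / employees / 12

# Illustrative figures: $150K/year MDR vs. ~$1M/year in-house, 1,000 employees.
print(atir_cost_per_employee_month(150_000, 1_000))    # 12.5
print(atir_cost_per_employee_month(1_000_000, 1_000))  # ~83.3
```

Whatever the exact inputs, expressing spend per employee per month makes different delivery models directly comparable.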
Speed is typically measured using familiar operational metrics such as:

- Mean time to investigate (MTTI)
- Mean time to respond (MTTR)

Faster detections, investigations, and responses are generally seen as positive indicators. However, speed without quality is misleading.
A team could theoretically automate ticket closures in seconds and produce outstanding speed metrics at minimal cost. But if investigations lack depth and accuracy, the outcomes are poor—and risk increases rather than decreases.
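For concreteness, MTTI (mean time to investigate) and MTTR (mean time to respond) can be computed from case timestamps. A minimal sketch, using hypothetical case records:

```python
from datetime import datetime
from statistics import mean

# Hypothetical case records: when the alert fired, when investigation began,
# and when the response action was taken.
cases = [
    {"alerted": datetime(2024, 1, 1, 9, 0),
     "investigated": datetime(2024, 1, 1, 9, 20),
     "responded": datetime(2024, 1, 1, 10, 0)},
    {"alerted": datetime(2024, 1, 1, 11, 0),
     "investigated": datetime(2024, 1, 1, 11, 10),
     "responded": datetime(2024, 1, 1, 11, 40)},
]

def mean_minutes(cases, start_key, end_key):
    """Mean elapsed minutes between two timestamps across all cases."""
    return mean((c[end_key] - c[start_key]).total_seconds() / 60 for c in cases)

print("MTTI:", mean_minutes(cases, "alerted", "investigated"))  # MTTI: 15.0
print("MTTR:", mean_minutes(cases, "alerted", "responded"))     # MTTR: 50.0
```

Note how easy these numbers are to game: auto-closing every case drives both means toward zero while telling you nothing about whether the work was done.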
Measuring quality in security operations is inherently difficult.
To measure quality perfectly, you would need ground truth: definitive knowledge of whether every alert is benign or malicious. In reality, that certainty is rarely possible. Even experienced analysts often operate without 100% confidence.
So instead of pursuing a perfect quality metric, the goal should be to build a practical and reliable proxy.
One effective approach is to study how experienced SOC analysts evaluate investigations. Conversations with senior analysts, SOC managers, and security engineers reveal a consistent pattern:
High-quality investigations are driven by structured questioning and context building.
Before making decisions, skilled analysts seek answers to a set of key questions driven by the type of alert they are looking at.
Fast decisions without sufficient context are risky. It’s the depth of understanding—not the speed of closure—that determines whether an alert can be safely ignored or requires escalation.
This insight leads to a powerful concept: turning expert investigation behavior into a measurable framework.
A quality scoring framework can:

- Generate the key questions an analyst should answer for each alert type
- Evaluate whether those questions were answered during the investigation
- Score the case accordingly
With this approach, if critical questions are unanswered—or only partially addressed—the investigation quality is low, regardless of how quickly it was completed.
This transforms quality from a subjective judgment into something observable and repeatable.
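One way to make this concrete is a rubric that maps each alert type to its key questions and scores a case by the fraction answered. A minimal sketch; the alert type and questions here are hypothetical:

```python
# Hypothetical rubric: the key questions an analyst should answer per alert type.
RUBRIC = {
    "suspicious_login": [
        "Is the source location unusual for this user?",
        "Did the login succeed?",
        "Was MFA satisfied?",
        "Is there follow-on activity on the account?",
    ],
}

def case_quality(alert_type: str, answered: set[str]) -> float:
    """Fraction of the rubric's key questions answered during the investigation."""
    questions = RUBRIC[alert_type]
    return sum(q in answered for q in questions) / len(questions)

score = case_quality("suspicious_login", {
    "Did the login succeed?",
    "Was MFA satisfied?",
})
print(score)  # 0.5
```

A real implementation would need richer rubrics and a way to detect partial answers, but even a coarse fraction like this makes quality trendable across cases and analysts.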
Automation plays an essential role in scaling quality measurement. It can consistently apply scoring frameworks, identify gaps, and track trends over time.
However, automation alone isn’t enough.
Just as software teams augment automated testing with manual QA, SOC teams benefit from combining:

- Automated quality scoring applied to every case
- Manual review of a sample of cases by experienced analysts
Human oversight ensures that automated metrics align with what truly matters to analysts and customers.
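One practical pattern (a sketch, not a prescription) is to route every low-scoring case to manual review and also spot-check a random sample of high-scoring ones, so humans are validating the automated scorer itself:

```python
import random

# Hypothetical automated quality scores per case (0.0 to 1.0).
scores = {"case-1": 0.95, "case-2": 0.40, "case-3": 0.80,
          "case-4": 0.55, "case-5": 1.00}

def select_for_manual_qa(scores, threshold=0.6, sample_rate=0.2, seed=7):
    """Flag all low-scoring cases for review, plus a random sample of the
    rest to verify the automated scoring itself stays trustworthy."""
    low = [cid for cid, s in scores.items() if s < threshold]
    rest = [cid for cid in scores if cid not in low]
    rng = random.Random(seed)
    spot_checks = rng.sample(rest, max(1, int(len(rest) * sample_rate)))
    return sorted(set(low + spot_checks))

print(select_for_manual_qa(scores))  # always includes case-2 and case-4
```

The threshold and sampling rate are tuning knobs: tighter thresholds catch more weak investigations, while the random spot-checks guard against the scorer drifting away from what analysts actually value.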
When alert triage, investigation, and response quality is measurable, organizations can finally connect operational performance to real outcomes.
Most importantly, Security Operations teams can avoid the trap of optimizing for the wrong things.
Because without quality metrics, speed and cost metrics are ultimately meaningless.
It is now possible to build practical, defensible metrics for the quality of alert triage, investigation, and response. By grounding evaluation in structured analyst behavior and combining automation with human judgment, SOC and MDR teams can measure what truly matters.
And once quality is measurable, it becomes improvable.
That’s when security operations move from reactive activity to disciplined, outcome-driven performance.