Untangling the Confusion of Incident Response Acronyms

Written by Anthony Morris | May 15, 2024 6:28:53 AM

OVERVIEW

Zig Ziglar once said, “You cannot solve a problem until you acknowledge you have one and accept responsibility for solving it.” The cybersecurity industry has a problem that is not widely acknowledged but quickly becomes apparent once conversations turn to security metrics: “What do YOU mean by all of those MTTx acronyms?”

CYBERSECURITY METRICS

In the realm of cybersecurity, metrics play a vital role in understanding and improving an organization's security posture. These metrics are often identified by MTTx where MTT stands for Mean Time To __________. The blank is filled by some stage in the incident response process. Along that path, there are several waypoints that can be measured:

INITIAL EVENT: What is the timestamp of the first suspicious/malicious indicator? This is arguably when the clock starts because it is the first observable time an attacker began their activity.

SAMPLE QUESTION: When is the first indicator that an attack could be occurring?

DETECTION: There is a window of time between when the initial event occurs and when it is identified as suspicious or malicious. Common reasons for delay in this stage include network latency, indexing latency, and the frequency of detection schedules. In a worst-case scenario, the event occurs and is NEVER identified as concerning.

SAMPLE QUESTION: How long did it take after we received the indicator before we realized it could be malicious?

RESPONSE: The next window of time runs from when an event is detected until investigation begins. In security operations centers (SOCs) staffed by human analysts, this is the time an event waits in the alert/incident queue before investigation begins. The most common reason for delay in this stage is a shortage of security analysts: the queue of events generally runs a little ahead of free capacity.

SAMPLE QUESTION: After we realized it might be malicious, how long before we began to investigate and triage the event?

INVESTIGATION: The time it takes to conduct an investigation and reach an escalation/containment/remediation decision is the next waypoint to measure. This is the part of the process where additional evidence is gathered and case enrichment occurs. Sadly, it is one of the easiest targets to hit when trying to shorten response time: simply investigate less and your response process will be faster. The key here is to shorten this stage without compromising investigative quality.

SAMPLE QUESTION: How long are security analysts spending to investigate a case? Five minutes? Thirty minutes?

CONTAINMENT: After a decision has been made that containment and remediation are required, the next step is to prevent further spread or damage. Delays in this stage often come down to obtaining the access and authority needed to implement the changes that control the threat. Due diligence should be exercised so that the effort to control the threat does not create more adverse business impact than the threat itself.

SAMPLE QUESTION: How long does it take to contain the threat so additional future damage does not occur?

REMEDIATION: After a security threat is contained, work still needs to be done to restore business operations and prevent recurrence. While no new damage will occur because the threat is contained, it may be necessary to restore files, reset accounts, apply security patches, and more. Most often this time is not captured in the security response tickets themselves, and the responsibility is given to a different team as part of a post-mortem after the case is closed. That is an acceptable alternative business process, but the waypoint is included here because it should not be neglected.

SAMPLE QUESTION: How long did we take to restore business as usual?

CLOSE: This is the opportunity to complete any final details and cleanup for the event. Even in the case of false positives, this stage is a good time for a second-pair-of-eyes review to ensure processes were followed and subjective judgments were reasonable. It is also good to ensure signatures have been tuned, systems patched, processes updated, and indicators whitelisted as required.

SAMPLE QUESTION: When have we completed all work items associated with this event?
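
As a rough illustration of how these waypoints turn into measurements, the sketch below takes one incident's timestamps and computes the elapsed time between consecutive waypoints. The field names (`initial_event`, `detected`, and so on) and the sample timestamps are assumptions made up for this example, not a standard schema; map them to whatever your ticketing or SIEM system actually records.

```python
from datetime import datetime

# Hypothetical field names, one per waypoint described above, in order.
WAYPOINTS = [
    "initial_event",          # INITIAL EVENT
    "detected",               # DETECTION
    "investigation_started",  # RESPONSE
    "decision_made",          # INVESTIGATION
    "contained",              # CONTAINMENT
    "remediated",             # REMEDIATION
    "closed",                 # CLOSE
]


def stage_durations(incident):
    """Return the elapsed time between consecutive recorded waypoints.

    Waypoints missing from the incident (for example, remediation handled
    outside the ticket) are skipped rather than guessed.
    """
    recorded = [(name, incident[name]) for name in WAYPOINTS
                if incident.get(name) is not None]
    return {
        f"{start_name} -> {end_name}": end_ts - start_ts
        for (start_name, start_ts), (end_name, end_ts)
        in zip(recorded, recorded[1:])
    }


# Made-up timestamps for a single incident (no remediation time recorded).
example_incident = {
    "initial_event": datetime(2024, 5, 1, 9, 0),
    "detected": datetime(2024, 5, 1, 9, 4),
    "investigation_started": datetime(2024, 5, 1, 9, 19),
    "decision_made": datetime(2024, 5, 1, 9, 49),
    "contained": datetime(2024, 5, 1, 10, 9),
    "closed": datetime(2024, 5, 1, 16, 0),
}

for stage, elapsed in stage_durations(example_incident).items():
    print(f"{stage}: {elapsed}")
```

Averaging any one of these deltas across many incidents gives a "mean time to" figure. Which pair of waypoints a given figure spans is a choice that has to be stated alongside the number.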

These waypoints can go by different but similar names that capture the same intent; there isn’t an industry standard for these measurements. Therein lies the problem. As companies and vendors apply these stages, they assign acronyms to each waypoint.

ACRONYMS

Some of these acronyms have little ambiguity while others are HIGHLY ambiguous. Here are some examples for Mean Time To ___________ (Tip: the most common interpretation is marked with a *.)

MTTD: Detect*
MTTA: Acknowledge*, Analyze
MTTF: Fix, Failure*
MTTC: Contain*, Close
MTTR: Respond*, Recovery, Resolve, Resolution, Review, Repair, Remediate

A security dashboard/report/marketing material might mention "MTTC" without clarifying whether it refers to Mean-Time-to-Contain or Mean-Time-to-Close. Another might reference MTTR without explaining whether it refers to Mean-Time-to-Respond or Mean-Time-to-Resolve. The time deltas between these interpretations can be significant and meaningful. Most certainly, the ambiguity can lead to discrepancies in understanding and assessing incident response performance.
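
To make the size of that gap concrete, here is a small sketch with made-up timestamps that computes both readings of MTTC over the same two incidents. It assumes the clock starts at detection, which is itself a definitional choice; the data and the helper `mean_minutes` are purely illustrative.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incidents as (detected, contained, closed) timestamps.
incidents = [
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 9, 30), datetime(2024, 5, 2, 9, 0)),
    (datetime(2024, 5, 3, 14, 0), datetime(2024, 5, 3, 15, 0), datetime(2024, 5, 3, 20, 0)),
]


def mean_minutes(pairs):
    """Mean elapsed minutes over (start, end) timestamp pairs."""
    return mean((end - start).total_seconds() / 60 for start, end in pairs)


# Same data, same acronym, two different meanings.
mttc_as_contain = mean_minutes((detected, contained) for detected, contained, _ in incidents)
mttc_as_close = mean_minutes((detected, closed) for detected, _, closed in incidents)

print(f"MTTC read as Mean-Time-to-Contain: {mttc_as_contain:.0f} minutes")  # 45 minutes
print(f"MTTC read as Mean-Time-to-Close:   {mttc_as_close:.0f} minutes")    # 900 minutes
```

With this sample data the two readings differ by a factor of twenty, so a bare "MTTC" on a dashboard could describe either a 45-minute or a 15-hour process.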

This ambiguity then causes confusion, miscommunication and misinterpretation, hindering effective decision-making and resource allocation.

CHOOSING THE BEST METRIC

Which then is the best metric when it comes to tracking and improving incident management? The answer is all of them. Don’t remove ambiguity by removing substance. Remove ambiguity by having a better plan. To build a game plan to mitigate confusion and ensure accurate interpretation of security incident metrics, organizations should:

Establish Clear Definitions Internally - Define each metric clearly within the context of the organization's incident response processes, specifying whether acronyms like MTTC represent containment or closure.

Ask Vendors For The Same Clarity - After developing an internal understanding of these terms, companies need to ensure that vendors either use the same language (best case) or at least provide a translation, to prevent miscommunication between companies.

Provide Context - Accompany metrics with contextual information, such as definitions and descriptions, to ensure stakeholders understand their meanings and implications accurately.

Be Transparent - When using acronyms, provide definitions of the terms or, even better, explain how the calculation is derived.

Standardize Reporting - Adopt standardized reporting practices across the organization to ensure consistency and clarity in how metrics are communicated and interpreted. The same acronyms should be used across all reports. It is a mistake to have an acronym mean one thing in one report and mean something else in a different report.

Regular Review and Communication - Regularly review and communicate incident response metrics with stakeholders, fostering a shared understanding and alignment on performance objectives.
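
One way to put several of these recommendations into practice is to keep a single, shared registry of metric definitions and generate report legends from it. The sketch below is one possible shape for that, not a prescribed format: the `MetricDefinition` class, the waypoint names, and the particular start/stop choices are all assumptions for illustration, and an organization would substitute its own agreed definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    acronym: str
    expansion: str
    clock_starts: str  # waypoint where measurement begins
    clock_stops: str   # waypoint where measurement ends

    def legend(self):
        """One-line description suitable for a report footnote."""
        return (f"{self.acronym} ({self.expansion}): mean elapsed time "
                f"from {self.clock_starts} to {self.clock_stops}")


# Example internal definitions; the waypoint choices are assumptions.
DEFINITIONS = {
    d.acronym: d
    for d in (
        MetricDefinition("MTTD", "Mean Time To Detect", "initial event", "detection"),
        MetricDefinition("MTTA", "Mean Time To Acknowledge", "detection", "investigation start"),
        MetricDefinition("MTTC", "Mean Time To Contain", "detection", "containment"),
    )
}


def report_legend(acronyms):
    """Build a legend for a report; raises KeyError for an undefined acronym."""
    return "\n".join(DEFINITIONS[a].legend() for a in acronyms)


print(report_legend(["MTTD", "MTTC"]))
```

Because every report pulls its legend from the same registry, an acronym cannot quietly mean one thing in one report and something else in another, and an acronym nobody has defined fails loudly instead of being published.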

MANAGING METRICS FOR PERFORMANCE

Once you begin to understand and standardize the incident response metrics in your organization, you can begin measuring delays in your incident response process.

Once you are able to measure and know your timelines, you can begin to implement the process and tooling improvements needed to respond to and control security incidents faster.

Once you implement improvements to detect and respond to security incidents faster, you begin to help the business.

To discover more about how AirMDR uses metrics to manage for performance please contact us here.
