Process Management with Security Metrics
A metric is a quantitative measurement that can be interpreted in the context of a series of previous or equivalent measurements. Metrics are necessary to show how security activity contributes directly to security goals; measure how changes in a process contribute to security goals; detect significant anomalies in processes and inform decisions to fix or improve processes. Good management metrics are said to be S.M.A.R.T:
- Specific: The metric is relevant to the process being measured.
- Measurable: Metric measurement is feasible with reasonable cost.
- Actionable: It is possible to act on the process to improve the metric.
- Relevant: Improvements in the metric meaningfully enhances the contribution of the process towards the goals of the management system.
- Timely: The metric measurement is fast enough for being used effectively.
Metrics are fully defined by the following items:
- Name of the metric;
- Description of what is measured;
- How is the metric measured;
- How often is the measurement taken;
- How are the thresholds calculated;
- Range of values considered normal for the metric;
- Best possible value of the metric;
- Units of measurement.
Security Metrics are difficult to come by
Unfortunately, it is not easy to find metrics for security goals like security, trust and confidence. The main reason is that security goals are “negative deliverables”. The absence of incidents for an extended period of time leads to think that we are safe. If you live in a town where neither you nor anyone you know has ever been robbed, you feel safe. Incidents prevented can’t be measured in the same way a positive deliverable can, like the temperature of a room.
Metrics for goals are not just difficult to find; they are not very useful for security management. The reason for this is the indirect relationship between security activity and security goals. Intuitively most managers think that there is a direct link between what we do (which results or outputs) and what we want to achieve (the most important things: our goals). This belief is supported by real life experiences like making a sandwich. You buy the ingredients, go home, arrange them, and perhaps toast them and voilá: A warm sandwich ready to eat. The output sought (the sandwich) and the goal (eating a home made sandwich) match beautifully.
Unfortunately, there is no direct link every time. A good example can be research. There is not direct relationship between goals (discoveries) and the activity (experiments, publication). You can try hundreds of experiments and still not discover a cure for cancer. Same thing happens with security. The goals (trust, confidence, security) and the activity (controls, processes) are not directly linked.
When there is a direct link between activity and goal, like the temperature in a pot and the heat applied that pot, we know what decision to take if we want the temperature to drop: stop applying heat But, how will we make a network safer, adding (more accurate filtering), or summarising (less complexity) filtering rules? We don’t know. If a process produces dropped packets, more or less dropped packets won’t necessarily make the network more or less secure, just like a change in the firewall rules won’t necessarily make the network safer of otherwise.
The disconnect present in information security between goals and activity prevents goal metrics from being useful for management, as you can never tell if you are closer to your goals because of decisions recently taken on the security processes.
Goal metric examples:
- Instances of secret information disclosed per year. What can you do to prevent people with legitimate access to disclose that information?
- Use of system by unauthorised users per month. What can you do to prevent people from letting other users to use their accounts?
- Customers reports of misuse of personal data to the Data Protection Agency. Even if you are compliant, what can you do to prevent a customer to fill a report?
- Risk reduction per year of 10%. As risk depends on internal an external factors, what can you do to actually modify risk?
- Prevent 99% of incidents. How do you know how many incidents didn’t happen?
Actually useful security metrics
If metrics for goals are difficult to get, and are not very useful; what is a security manager to do? Measuring process outputs can be the answer. Measuring outputs is not only possible but very useful, as outputs contribute directly or indirectly to achieve security, trust and confidence. Using output metrics you can:
- Measure how changes in a process contribute to outputs;
- Detect significant anomalies in processes;
- Inform decisions to fix or improve the process.
There are seven basic types of process output metrics:
- Activity: The number of outputs produced in a time period;
- Scope: The proportion of the environment or system that is protected by the process. For example, AV could be installed in only 50% of user PCs;
- Update: The time since the last update or refresh of process outputs.
- Availability: The time since a process has performed as expected upon demand (up time), the frequency and duration of interruptions, and the time interval between interruptions.
- Efficiency / Return on security investment (ROSI): Ratio of losses averted to the cost of the investment in the process. This metric measures the success of a process in comparison to the resources used.
- Efficacy / Benchmark: Ratio of outputs produced in comparison to the theoretical maximum. Measuring efficacy of a process implies the comparison against a baseline.
- Load: Ratio of available resources in actual use, like CPU load, repositories capacity, bandwidth, licenses and overtime hours per employee.
Examples of use of these metrics:
- Activity: Measuring the number of new user account created per week, a sudden drop could lead to detecting that the new administrator is lazy, or that users started sharing user accounts, so they are not requesting them any more.
- Scope: In an organization with a big number of third party connections, measuring the number of connections with third parties protected by a firewall could lead to a management decision not to create more unprotected connections.
- Update: Measuring the update level of the servers in a DMZ could lead to investigating the root cause if the level goes above certain level.
- Availability: Measuring the availability of a customer service portal could lead to rethinking the High Availability Architecture used.
- Efficiency / Return on security investment (ROSI): Measuring the cost per seat of the Single Sign On systems of two companies being merged could lead to choose one system over the other.
- Efficacy / Benchmark: Measuring backup speed of two different backup systems could lead to choose one over the other.
- Load: Measuring and projecting the minimum load of a firewall could lead to taking the decision to upgrade pre-emptively.
There is an important issue to tackle when using output metrics; what I call the Comfort Zone. When there are too many false positives, the metrics is quickly dismissed, as it is not possible to investigate every single warning. On the other hand, when the metric never triggers a warning, there is a feeling that the metric is not working or providing value. The Comfort Zone (not too many false positives, pseudo-periodic warnings) can be achieved using an old tool from Quality Management, the control chart. The are some rules used in Quality Management to tell a warning, a condition that should be investigated from a normal statistical variation (Western Electric, Donald J. Wheeler's, Nelson rules), but for security management the best practice is adjusting the multiple of the standard deviation that will define the range of normal values for the metric until we achieve the Comfort Zone, pseudo-periodic warnings without too many false positives.
Using Security Management Metrics
There are six steps in the use of metrics: measurement, representation, interpretation, investigation and diagnosis.
Measurement: The measurement of the current value of the metric is periodic and normally refers to a window, for example: “9:00pm Sunday reading of the number of viruses cleaned in the week since the last reading” Measurements from different sources and different periods need to be normalized before integration in a single metric.
Interpretation: The meaning of a measured value is evaluated comparing the value of a measurement with a threshold, a comparable measurement, or a target. Normal values (those within thresholds) are estimated from historic or comparable data. The results of interpretation are:
- Anomaly: When the measurement is beyond acceptable thresholds.
- Success: When the measurement compares favourably with the target.
- Trend: General direction of successive measurements relative to the target.
- Benchmark: Relative position of the measurement or the trend with peers.
Incidents or poor performance take process metrics outside normal thresholds. Shewhart-Deming control charts are useful to indicate if the metric value is within the normal range, as values within the arithmetic mean plus/minus twice the standard deviation make more than 95.4% of the values of a normally distributed population. Fluctuations within the “normal” range would not normally be investigated.
Investigation: The investigation of abnormal measurements ideally ends with identification of the common cause, for example changes in the environment or results of management decisions, or a special cause (error, attack, accident) for the current value of the metric.
Representation: Proper visualisation of the metric is key for reliable interpretation. Metrics representation will vary depending on the type of comparison and distribution of a resource. Bar charts, pie charts and line charts are most commonly used. Colours may help to highlight the meaning of a metric, such as the green-amber-red (equivalent to on-track, at risk and alert) traffic-light scale. Units, the period represented, and the period used to calculate the thresholds must always be given for the metric to be clearly understood. Rolling averages may be used to help identify trends.
Diagnosis: Managers should use the results of the previous steps to diagnose the situation, analyse alternatives and their consequences and make business decisions.
- Fault in Plan-Do-Check-Act cycle leading to repetitive failures in a process -> Fix the process.
- Weakness resulting from lack of transparency, partitioning, supervision, rotation or separation of responsibilities (TPSRSR) -> Fix the assignment of responsibilities .
- Technology failure to perform as expected -> Change / adapt technology.
- Inadequate resources -> Increase resources or adjust security targets.
- Security target too high -> Revise the security target if the effect on the business would be acceptable.
- Incompetence, dereliction of duty -> Take disciplinary action.
- Inadequate training -> Institute immediate and/or long-term training of personnel.
- Change in the environment -> Make improvements to adapt the process to the new conditions.
- Previous management decision -> Check if the results of the decision were sought or unintended.
- Error -> Fix the cause of the error.
- Attack -> Evaluate whether the protection against the attack can be improved.
- Accident -> Evaluate whether the protection against the accident can be improved.
What management practices become possible?
A side effect of an Information Security Management System (ISMS) lacking useful security metrics is that security management becomes centred in activities like Risk Assessment and Audit. Risk Assessment considers assets, threats, vulnerabilities and impacts to get a picture of security and prioritise design and improvements while Audit checks the compliance of the actual information security management system with the documented management system with an externally defined management system or an external regulation. Risk Assessment and Audit are valuable, but there are more useful security management activities like monitor, test, design & improvement and optimisation that become possible with output metrics. Theses activities can be described as follows:
- Monitor—Use metrics to watch processes outputs, detect abnormal conditions and assess the effect of changes in the process.
- Test—Check if inputs to the process produce the expected outputs.
- Improving - Making changes in the process to make it more suitable for the purpose, or to reduce usage of resources.
- Planning - Organising and forecasting the amount, assignment and milestones of tasks, resources, budget, deliverable and performance of a process.
- Assessment - How well the process matches the organisation's needs and compliance goals expressed as security objectives. How changes in the environment or management decisions in a process change the quality, performance and use of resources of the process; Whether bottlenecks or single points of failure exist; Points of diminishing returns; Benchmarking of processes between process instances and other organisations. Trends in quality, performance and efficiency.
- Benefits realisation. Shows how achieving security objectives contributes to achieving business objectives, measures the value of the process for the organisation, or justifies the use of resources.
While audits can be performed without metrics, monitoring, testing, planning, improvement and benefits realisation are not feasible without them.
What needs to be done?
S.M.A.R.T security managers need metrics that actually help them performing management activities.
While it is not necessary to drop goal metrics altogether, the day to day focus of information security management should be on security monitoring, testing, design & improvement and optimization using output metrics, which are the ones which will show what are the effect of management decisions, if things are getting worse or better, if processes work as designed, and if there are changes out of our direct control that cause abnormal conditions in security processes. All these activities are perfectly feasible using outputs metrics and control charts.