In order to prevent duplication of work and maximize the value provided by the Enterprise Architecture and Information Security disciplines, it is necessary to find ways for them to communicate and take advantage of each other’s work. We have been examining the relationship between O-ISM3 and TOGAF®, both Open Group standards, and have found that, terminology differences aside, there are quite a number of ways to use these two standards together. In this article, we’d like to share our findings with The Open Group’s audience of Enterprise Architects, IT professionals, and Security Architects.
Can you imagine performing a forensic analysis on a system with no records and no logs? Neither can I. Logs contain events such as startup, restart and abnormal termination of services, physical and logical thresholds being exceeded, access to resources, network connections, privilege and access rights changes, configuration changes, etc. Logs are generated everywhere, in a multitude of formats, using different transports and APIs. There are quite a few standards, for example:
It would be interesting to be able to check whether all the important events are considered as part of the design requirements of an application. This can prevent nasty surprises when analyzing an incident. A good model of an information system can make this task relatively easy. Such a model would represent an information system using the following elements:
Repositories (Credentials): Any temporary or permanent storage of information, including RAM, databases, file systems, and any kind of portable media;
Interfaces: Any input/output device, such as screens, printers and fax machines;
Channels: Physical or logical pathways for the flow of messages, including buses, LANs, etc. A network is a dynamic set of channels;
Borders: The limits of the system;
Services: Any value provider in an information system, including services provided by the BIOS, operating systems and applications. A service can collaborate with other services or lower-level services to complete a task that provides value, like accessing information from a repository;
Sessions: A temporary relationship of trust between services. The establishment of this relationship can require the exchange of Credentials;
Messages (Instructions): Any meaningful information exchanged between two services or between a user and an interface.
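The check described above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical inventory in which each element of the model is tagged with its type and with the events its design promises to log; the event names in REQUIRED_EVENTS and the FTP design are assumptions chosen for the example, not part of the model itself.

```python
from dataclasses import dataclass, field
from enum import Enum


class ElementType(Enum):
    """Element types of the information system model."""
    REPOSITORY = "repository"
    INTERFACE = "interface"
    CHANNEL = "channel"
    BORDER = "border"
    SERVICE = "service"
    SESSION = "session"
    MESSAGE = "message"


# Hypothetical requirements: events every element of a given type is expected to log.
REQUIRED_EVENTS = {
    ElementType.SERVICE: {"startup", "abnormal_termination", "configuration_change"},
    ElementType.SESSION: {"session_start", "session_end", "credential_failure"},
    ElementType.REPOSITORY: {"access", "threshold_exceeded"},
    ElementType.CHANNEL: {"connection"},
}


@dataclass
class Element:
    name: str
    kind: ElementType
    logged_events: set[str] = field(default_factory=set)


def missing_events(elements: list[Element]) -> dict[str, set[str]]:
    """Map each element name to the required events its design does not log."""
    gaps = {}
    for element in elements:
        missing = REQUIRED_EVENTS.get(element.kind, set()) - element.logged_events
        if missing:
            gaps[element.name] = missing
    return gaps


# Hypothetical design of an FTP application.
design = [
    Element("ftp-service", ElementType.SERVICE, {"startup", "configuration_change"}),
    Element("ftp-session", ElementType.SESSION,
            {"session_start", "session_end", "credential_failure"}),
    Element("user-db", ElementType.REPOSITORY, {"access"}),
]
print(missing_events(design))
# {'ftp-service': {'abnormal_termination'}, 'user-db': {'threshold_exceeded'}}
```

Here the design forgets abnormal termination of the FTP service and threshold events for the user database: exactly the kind of gap that otherwise only surfaces during an incident.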
For a log entry to be complete it should contain at least the following elements:
Every event can have an eventID.
If the event is not logged by the agent of the event, the "logger" can be identified using a "loggerID".
The "agent" of the event can be identified using a "sourceID".
The "agent" of the event can be in different locations, each identified using an "addressID".
The "credential" used by the source to perform a request can be identified using a "credentialID".
The "resource" (subject) of the event is identified using a "resourceID".
The "request" (access attempt) performed has a "RequestType" and a "Result". The reason for the "Result" is stated in the "ResultText".
The "payload" contains the information necessary to perform the request.
"dateTime" is the date and time when the request is performed.
"signature" is the digital signature of the event using the "credentialID".
"hash" is the digital summary of the event. It is recommended that the hash of the previous event in the Record is used to calculate it.
<sourceID>proftpd.lab.ossec.net</sourceID><addressID>22.214.171.124</addressID><credentialID>abad</credentialID><loggerID>proftpd.lab.ossec.net:21:slacker proftpd</loggerID><RequestType>login</RequestType><Result>failure</Result><ResultText>no such user found</ResultText><dateTime>21/5/2007 20:21:21</dateTime>
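The recommendation to use the previous event's hash turns the record into a hash chain: altering, removing or reordering any entry invalidates every later hash. The sketch below assumes SHA-256 over a JSON serialization of the entry plus the previous hash, and a hypothetical LogEntry class mirroring the fields listed above; the algorithm, the serialization, the resourceID value and the second sample entry are illustrative assumptions. The signature field is left out because it needs key material bound to the credentialID.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional

GENESIS = "0" * 64  # assumed starting value for the first entry of a record


@dataclass
class LogEntry:
    """Hypothetical record of the log entry fields (signature omitted)."""
    eventID: str
    sourceID: str
    resourceID: str
    RequestType: str
    Result: str
    dateTime: str
    loggerID: Optional[str] = None
    addressID: Optional[str] = None
    credentialID: Optional[str] = None
    ResultText: Optional[str] = None
    payload: Optional[str] = None


def entry_hash(entry: LogEntry, previous_hash: str) -> str:
    """SHA-256 of the previous hash followed by the entry serialized as sorted JSON."""
    serialized = json.dumps(asdict(entry), sort_keys=True)
    return hashlib.sha256((previous_hash + serialized).encode("utf-8")).hexdigest()


def chain(entries: list[LogEntry]) -> list[str]:
    """Hash every entry; each hash depends on all the entries before it."""
    hashes, previous = [], GENESIS
    for entry in entries:
        previous = entry_hash(entry, previous)
        hashes.append(previous)
    return hashes


def verify(entries: list[LogEntry], hashes: list[str]) -> bool:
    """Recompute the chain; an edited, removed or reordered entry makes it fail."""
    return chain(entries) == hashes


record = [
    LogEntry("1", "proftpd.lab.ossec.net", "ftp-login", "login", "failure",
             "21/5/2007 20:21:21", credentialID="abad", ResultText="no such user found"),
    # Second entry is invented for illustration.
    LogEntry("2", "proftpd.lab.ossec.net", "ftp-login", "login", "success",
             "21/5/2007 20:22:05", credentialID="admin"),
]
hashes = chain(record)
print(verify(record, hashes))  # True
record[0].Result = "success"   # tamper with the first entry
print(verify(record, hashes))  # False
```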
Using this scheme, it is possible to check how complete a log is by checking:
If events need a unique identifier, or even a digital signature or a hash.
If there is a need to distinguish the process performing the action from the process logging the event.
If there is a need to identify the origin (agent) of the action.
If there is a need to identify the logical or physical location of the origin (agent) of the action.
If there is a need to identify the credentials used by the origin (agent) of the action.
If there is a need to identify the resource that is being accessed by the origin (agent) of the action.
If there is a need to identify the nature of the action (RequestType) performed on the resource.
If there is a need to identify the result of the action performed on the resource.
If there is a need to identify the date and time of the action performed on the resource.
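The checklist above can be partly automated by comparing a log entry against the fields of the scheme. As a sketch, assuming every field is required (in practice, the answers to the questions above decide which ones are), the following wraps the sample entry shown earlier in a hypothetical <event> root element, since the entry is a bare sequence of tags, and lists the fields it lacks.

```python
import xml.etree.ElementTree as ET

# Fields of the log entry scheme described above.
EXPECTED_FIELDS = (
    "eventID", "loggerID", "sourceID", "addressID", "credentialID", "resourceID",
    "RequestType", "Result", "ResultText", "payload", "dateTime", "signature", "hash",
)

SAMPLE = (
    "<sourceID>proftpd.lab.ossec.net</sourceID><addressID>22.214.171.124</addressID>"
    "<credentialID>abad</credentialID>"
    "<loggerID>proftpd.lab.ossec.net:21:slacker proftpd</loggerID>"
    "<RequestType>login</RequestType><Result>failure</Result>"
    "<ResultText>no such user found</ResultText><dateTime>21/5/2007 20:21:21</dateTime>"
)


def missing_fields(fragment: str, expected: tuple[str, ...] = EXPECTED_FIELDS) -> list[str]:
    """List expected fields that are absent or empty in a sequence of tags."""
    root = ET.fromstring(f"<event>{fragment}</event>")  # hypothetical wrapper element
    present = {child.tag for child in root if (child.text or "").strip()}
    return [name for name in expected if name not in present]


print(missing_fields(SAMPLE))  # ['eventID', 'resourceID', 'payload', 'signature', 'hash']
```

Whether those five gaps matter depends on which of the questions above were answered "yes" for the application in question.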
You can find a list of request types and results here:
Using this list of request types can be very useful, as the RequestType indicates the type of resource being accessed, making the log easier to read.
How do you check if your log designs are complete and contain all the information you might ever need?
A metric is a quantitative measurement that can be interpreted in the context of a series of previous or equivalent measurements. Metrics are necessary to show how security activity contributes directly to security goals; measure how changes in a process contribute to security goals; detect significant anomalies in processes and inform decisions to fix or improve processes. Good management metrics are said to be S.M.A.R.T:
Specific: The metric is relevant to the process being measured.
Measurable: Metric measurement is feasible with reasonable cost.
Relevant: Improvements in the metric meaningfully enhance the contribution of the process towards the goals of the management system.
Timely: The metric measurement is fast enough to be used effectively.
Metrics are fully defined by the following items:
Name of the metric;
Description of what is measured;
How the metric is measured;
How often the measurement is taken;
How the thresholds are calculated;
Range of values considered normal for the metric;
Best possible value of the metric;
Units of measurement.
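A metric definition with all the items above can be captured as a simple record, which makes incomplete definitions easy to spot. The sketch below is one assumed way to encode it; the field names and every example value (including the normal range) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """One field per item that fully defines a metric."""
    name: str
    description: str
    measurement_method: str
    frequency: str
    threshold_method: str
    normal_range: tuple[float, float]  # (low, high), inclusive
    best_value: float
    units: str

    def is_normal(self, value: float) -> bool:
        """True when a reading falls inside the normal range."""
        low, high = self.normal_range
        return low <= value <= high


# Hypothetical definition; every value below is illustrative.
viruses_cleaned = MetricDefinition(
    name="Viruses cleaned per week",
    description="Number of malware infections cleaned by the antivirus process",
    measurement_method="Count of 'cleaned' events in the antivirus console log",
    frequency="Weekly, Sunday 9:00pm",
    threshold_method="Mean of the last 12 weekly readings plus/minus 2 standard deviations",
    normal_range=(4.0, 20.0),
    best_value=0.0,
    units="infections/week",
)
print(viruses_cleaned.is_normal(25))  # False
```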
Security Metrics are difficult to come by
Unfortunately, it is not easy to find metrics for security goals like security, trust and confidence. The main reason is that security goals are “negative deliverables”. The absence of incidents for an extended period of time leads us to think that we are safe: if you live in a town where neither you nor anyone you know has ever been robbed, you feel safe. Incidents prevented can’t be measured in the same way as a positive deliverable, like the temperature of a room.
Metrics for goals are not just difficult to find; they are not very useful for security management. The reason for this is the indirect relationship between security activity and security goals. Intuitively, most managers think that there is a direct link between what we do (results or outputs) and what we want to achieve (the most important things: our goals). This belief is supported by real-life experiences like making a sandwich. You buy the ingredients, go home, arrange them, perhaps toast them, and voilà: a warm sandwich ready to eat. The output sought (the sandwich) and the goal (eating a homemade sandwich) match beautifully.
Unfortunately, the link is not always direct. Research is a good example. There is no direct relationship between the goals (discoveries) and the activity (experiments, publications). You can perform hundreds of experiments and still not discover a cure for cancer. The same thing happens with security: the goals (trust, confidence, security) and the activity (controls, processes) are not directly linked.
When there is a direct link between activity and goal, like the temperature in a pot and the heat applied to that pot, we know what decision to take if we want the temperature to drop: stop applying heat. But how will we make a network safer: by adding filtering rules (more accurate filtering) or by summarizing them (less complexity)? We don’t know. If a process produces dropped packets, more or fewer dropped packets won’t necessarily make the network more or less secure, just as a change in the firewall rules won’t necessarily make the network safer or otherwise.
The disconnect present in information security between goals and activity prevents goal metrics from being useful for management, as you can never tell if you are closer to your goals because of decisions recently taken on the security processes.
Goal metric examples:
Instances of secret information disclosed per year. What can you do to prevent people with legitimate access from disclosing that information?
Use of systems by unauthorized users per month. What can you do to prevent people from letting other users use their accounts?
Customer reports of misuse of personal data to the Data Protection Agency. Even if you are compliant, what can you do to prevent a customer from filing a report?
Risk reduction of 10% per year. As risk depends on internal and external factors, what can you do to actually modify risk?
Prevent 99% of incidents. How do you know how many incidents didn’t happen?
Actually useful security metrics
If goal metrics are difficult to obtain and not very useful, what is a security manager to do? Measuring process outputs can be the answer. Measuring outputs is not only possible but very useful, as outputs contribute directly or indirectly to achieving security, trust and confidence. Using output metrics you can:
Measure how changes in a process contribute to outputs;
Detect significant anomalies in processes;
Inform decisions to fix or improve the process.
There are seven basic types of process output metrics:
Activity: The number of outputs produced in a time period;
Scope: The proportion of the environment or system that is protected by the process. For example, AV could be installed in only 50% of user PCs;
Update: The time since the last update or refresh of process outputs.
Availability: The time since a process has performed as expected upon demand (uptime), the frequency and duration of interruptions, and the time interval between interruptions.
Efficiency / Return on security investment (ROSI): Ratio of losses averted to the cost of the investment in the process. This metric measures the success of a process in comparison to the resources used.
Efficacy / Benchmark: Ratio of outputs produced in comparison to the theoretical maximum. Measuring efficacy of a process implies the comparison against a baseline.
Load: Ratio of available resources in actual use, like CPU load, repository capacity, bandwidth, licenses and overtime hours per employee.
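The output metric types above reduce to simple counts, durations and ratios. The functions below are a minimal sketch assuming the inputs have already been collected; the function names and example figures are hypothetical, and the ROSI function follows the definition given here (losses averted divided by the investment).

```python
from datetime import datetime, timedelta
from typing import Optional


def activity(output_times: list[datetime], start: datetime, end: datetime) -> int:
    """Activity: number of outputs produced in the window [start, end)."""
    return sum(1 for t in output_times if start <= t < end)


def scope(protected: int, total: int) -> float:
    """Scope: proportion of the environment protected by the process."""
    return protected / total


def update_age(last_update: datetime, now: datetime) -> timedelta:
    """Update: time since the process outputs were last refreshed."""
    return now - last_update


def interruption_stats(
    interruptions: list[tuple[datetime, datetime]],
) -> tuple[int, timedelta, Optional[timedelta]]:
    """Availability: number and total duration of interruptions, and mean time between them."""
    ordered = sorted(interruptions)
    downtime = sum((end - start for start, end in ordered), timedelta())
    gaps = [ordered[i + 1][0] - ordered[i][1] for i in range(len(ordered) - 1)]
    mean_gap = sum(gaps, timedelta()) / len(gaps) if gaps else None
    return len(ordered), downtime, mean_gap


def rosi(losses_averted: float, investment: float) -> float:
    """Efficiency / ROSI as defined above: losses averted divided by the investment."""
    return losses_averted / investment


def efficacy(outputs: float, theoretical_max: float) -> float:
    """Efficacy: outputs produced compared with the theoretical maximum (baseline)."""
    return outputs / theoretical_max


def load(in_use: float, available: float) -> float:
    """Load: share of available resources in actual use."""
    return in_use / available


print(scope(50, 100))         # 0.5, the antivirus example from the text
print(rosi(200_000, 50_000))  # 4.0 (hypothetical figures)
```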
Examples of use of these metrics:
Activity: Measuring the number of new user accounts created per week. A sudden drop could lead to detecting that the new administrator is lazy, or that users have started sharing user accounts and are therefore no longer requesting new ones.
Scope: In an organization with a large number of third-party connections, measuring the number of third-party connections protected by a firewall could lead to a management decision not to create more unprotected connections.
Update: Measuring the time since the servers in a DMZ were last updated could lead to investigating the root cause if that time exceeds a certain threshold.
Availability: Measuring the availability of a customer service portal could lead to rethinking the High Availability Architecture used.
Efficiency / Return on security investment (ROSI): Measuring the cost per seat of the Single Sign-On systems of two companies being merged could lead to choosing one system over the other.
Efficacy / Benchmark: Measuring the backup speed of two different backup systems could lead to choosing one over the other.
Load: Measuring and projecting the minimum load of a firewall could lead to taking the decision to upgrade pre-emptively.
There is an important issue to tackle when using output metrics: what I call the Comfort Zone. When there are too many false positives, the metric is quickly dismissed, as it is not possible to investigate every single warning. On the other hand, when the metric never triggers a warning, there is a feeling that the metric is not working or providing value. The Comfort Zone (not too many false positives, pseudo-periodic warnings) can be achieved using an old tool from Quality Management: the control chart. There are some rules used in Quality Management to distinguish a warning (a condition that should be investigated) from normal statistical variation (the Western Electric rules, Donald J. Wheeler's rules, the Nelson rules), but for security management the best practice is to adjust the multiple of the standard deviation that defines the range of normal values for the metric until we reach the Comfort Zone: pseudo-periodic warnings without too many false positives.
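A minimal sketch of the Comfort Zone tuning, assuming a history of past readings and a maximum acceptable warning rate chosen by the security manager (both hypothetical inputs). Control limits are the mean plus/minus k standard deviations, and tune_k returns the smallest candidate multiple whose warning rate on the history stays at or below that maximum; picking the smallest acceptable k keeps warnings pseudo-periodic instead of letting them disappear. For simplicity, the limits are computed from the same history they are evaluated on.

```python
import statistics


def control_limits(history: list[float], k: float) -> tuple[float, float]:
    """Mean minus and plus k sample standard deviations of the history."""
    mean = statistics.fmean(history)
    sigma = statistics.stdev(history)
    return mean - k * sigma, mean + k * sigma


def find_warnings(values: list[float], limits: tuple[float, float]) -> list[int]:
    """Indexes of values outside the limits: candidate warnings to investigate."""
    low, high = limits
    return [i for i, v in enumerate(values) if v < low or v > high]


def tune_k(history: list[float], max_warning_rate: float,
           candidates: tuple[float, ...] = (1.0, 1.5, 2.0, 2.5, 3.0, 3.5)) -> float:
    """Smallest k whose warning rate on the history does not exceed max_warning_rate."""
    for k in candidates:
        rate = len(find_warnings(history, control_limits(history, k))) / len(history)
        if rate <= max_warning_rate:
            return k
    return candidates[-1]  # no candidate is quiet enough; use the widest limits


# Hypothetical weekly counts of new user accounts.
history = [12, 14, 11, 13, 15, 12, 30, 13, 14, 12, 11, 13]
print(find_warnings(history, control_limits(history, 2.0)))  # [6]
print(tune_k(history, max_warning_rate=0.05))                # 3.5
```

With this history, the spike of 30 is flagged at k = 2; demanding a warning rate of at most 5% pushes k up to 3.5, where the spike is no longer flagged at all, which illustrates the trade-off the Comfort Zone is meant to balance.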
Using Security Management Metrics
There are five steps in the use of metrics: measurement, representation, interpretation, investigation and diagnosis.
Measurement: The measurement of the current value of the metric is periodic and normally refers to a window, for example: “9:00pm Sunday reading of the number of viruses cleaned in the week since the last reading”. Measurements from different sources and different periods need to be normalized before they are integrated into a single metric.
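Normalization can be as simple as converting each reading to a common rate before combining them. The sketch assumes hypothetical readings that count disjoint events (so their rates can be added) over windows of different lengths, normalized to events per week; the source names and counts are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime

SECONDS_PER_DAY = 86_400


@dataclass
class Reading:
    source: str
    count: float
    window_start: datetime
    window_end: datetime


def weekly_rate(reading: Reading) -> float:
    """Scale a count to events per week using the length of its measurement window."""
    days = (reading.window_end - reading.window_start).total_seconds() / SECONDS_PER_DAY
    return reading.count * 7 / days


def combined_weekly_rate(readings: list[Reading]) -> float:
    """Integrate readings from several sources, assuming they count disjoint events."""
    return sum(weekly_rate(r) for r in readings)


# Hypothetical readings: a weekly antivirus console and a mail gateway read every 3 days.
readings = [
    Reading("av-console", 14, datetime(2024, 1, 7, 21), datetime(2024, 1, 14, 21)),
    Reading("mail-gateway", 6, datetime(2024, 1, 11, 21), datetime(2024, 1, 14, 21)),
]
print(combined_weekly_rate(readings))  # 28.0
```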
Interpretation: The meaning of a measured value is evaluated comparing the value of a measurement with a threshold, a comparable measurement, or a target. Normal values (those within thresholds) are estimated from historic or comparable data. The results of interpretation are:
Anomaly: When the measurement is beyond acceptable thresholds.
Success: When the measurement compares favourably with the target.
Trend: General direction of successive measurements relative to the target.
Benchmark: Relative position of the measurement or the trend with peers.
Incidents or poor performance take process metrics outside normal thresholds. Shewhart-Deming control charts are useful to indicate whether the metric value is within the normal range: in a normally distributed population, about 95.4% of the values fall within the arithmetic mean plus/minus twice the standard deviation. Fluctuations within the “normal” range are not usually investigated.
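The four interpretation results can be computed mechanically. This sketch assumes a lower-is-better metric by default, uses the mean plus/minus k standard deviations (k = 2, as above) to flag anomalies, a least-squares slope for the trend, and the share of peers with worse values for the benchmark; all names and sample figures are hypothetical.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Interpretation:
    anomaly: bool     # measurement beyond mean plus/minus k standard deviations
    success: bool     # measurement compares favourably with the target
    trend: str        # direction of successive measurements relative to the target
    benchmark: float  # share of peers whose value is worse than the measurement


def slope(series: list[float]) -> float:
    """Least-squares slope of the series against its position (units per period)."""
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = statistics.fmean(series)
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(series))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def interpret(history: list[float], current: float, target: float, peers: list[float],
              lower_is_better: bool = True, k: float = 2.0) -> Interpretation:
    mean, sigma = statistics.fmean(history), statistics.stdev(history)
    anomaly = abs(current - mean) > k * sigma
    success = current <= target if lower_is_better else current >= target
    s, gap = slope(history + [current]), target - current
    if gap == 0:
        trend = "on target"
    elif s == 0:
        trend = "flat"
    else:
        trend = "toward target" if (gap > 0) == (s > 0) else "away from target"
    worse = [p for p in peers if (p > current if lower_is_better else p < current)]
    benchmark = len(worse) / len(peers) if peers else float("nan")
    return Interpretation(anomaly, success, trend, benchmark)


# Hypothetical weekly "viruses cleaned" readings, target and peer values.
result = interpret([10, 12, 9, 11, 10, 12, 11], current=25, target=8, peers=[15, 20, 30])
print(result)  # an anomaly, not a success, trending away from the target
```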
Investigation: The investigation of abnormal measurements ideally ends with the identification of the cause of the current value of the metric: either a common cause, for example changes in the environment or the results of management decisions, or a special cause (error, attack, accident).
Representation: Proper visualization of the metric is key for reliable interpretation. Metrics representation will vary depending on the type of comparison and distribution of a resource. Bar charts, pie charts and line charts are most commonly used. Colours may help to highlight the meaning of a metric, such as the green-amber-red (equivalent to on-track, at risk and alert) traffic-light scale. Units, the period represented, and the period used to calculate the thresholds must always be given for the metric to be clearly understood. Rolling averages may be used to help identify trends.
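Two of the representation aids mentioned above, a trailing rolling average and a traffic-light scale, can be sketched without any libraries. The colour rule here is an assumption for illustration (red when outside the normal thresholds, green when within them and meeting the target, amber otherwise), not a prescribed scheme; the thresholds and readings are hypothetical.

```python
def rolling_average(values: list[float], window: int) -> list[float]:
    """Trailing rolling average; one point per full window."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def traffic_light(value: float, target: float, low: float, high: float,
                  lower_is_better: bool = True) -> str:
    """Assumed rule: red outside normal thresholds, green if on target, amber otherwise."""
    if value < low or value > high:
        return "red"  # alert
    on_track = value <= target if lower_is_better else value >= target
    return "green" if on_track else "amber"  # on track / at risk


weekly = [10, 12, 9, 11, 25]
print(rolling_average(weekly, 3))
print([traffic_light(v, target=11, low=8.5, high=12.9) for v in weekly])
# ['green', 'amber', 'green', 'green', 'red']
```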
Diagnosis: Managers should use the results of the previous steps to diagnose the situation, analyse alternatives and their consequences and make business decisions.
Fault in Plan-Do-Check-Act cycle leading to repetitive failures in a process -> Fix the process.
Weakness resulting from lack of transparency, partitioning, supervision, rotation or separation of responsibilities (TPSRSR) -> Fix the assignment of responsibilities.
Technology failure to perform as expected -> Change / adapt technology.
Inadequate resources -> Increase resources or adjust security targets.
Security target too high -> Revise the security target if the effect on the business would be acceptable.
Incompetence, dereliction of duty -> Take disciplinary action.
Inadequate training -> Institute immediate and/or long-term training of personnel.
Change in the environment -> Make improvements to adapt the process to the new conditions.
Previous management decision -> Check if the results of the decision were sought or unintended.
Error -> Fix the cause of the error.
Attack -> Evaluate whether the protection against the attack can be improved.
Accident -> Evaluate whether the protection against the accident can be improved.
What management practices become possible?
A side effect of an Information Security Management System (ISMS) lacking useful security metrics is that security management becomes centered on activities like Risk Assessment and Audit. Risk Assessment considers assets, threats, vulnerabilities and impacts to get a picture of security and prioritize design and improvements, while Audit checks the compliance of the actual information security management system with the documented management system, with an externally defined management system, or with an external regulation. Risk Assessment and Audit are valuable, but there are more useful security management activities, like monitoring, testing, design & improvement and optimization, that become possible with output metrics. These activities can be described as follows:
Monitor: Use metrics to watch process outputs, detect abnormal conditions and assess the effect of changes in the process.
Test: Check whether inputs to the process produce the expected outputs.
Improving: Making changes in the process to make it more suitable for its purpose, or to reduce its usage of resources.
Planning: Organizing and forecasting the amount, assignment and milestones of tasks, resources, budget, deliverables and performance of a process.
Assessment: How well the process matches the organization's needs and compliance goals expressed as security objectives; how changes in the environment or management decisions in a process change its quality, performance and use of resources; whether bottlenecks or single points of failure exist; points of diminishing returns; benchmarking of processes between process instances and against other organizations; trends in quality, performance and efficiency.
Benefits realisation: Shows how achieving security objectives contributes to achieving business objectives, measures the value of the process for the organization, or justifies the use of resources.
While audits can be performed without metrics, monitoring, testing, planning, improvement and benefits realisation are not feasible without them.
What needs to be done?
S.M.A.R.T. security managers need metrics that actually help them perform management activities.
While it is not necessary to drop goal metrics altogether, the day-to-day focus of information security management should be on security monitoring, testing, design & improvement and optimization using output metrics. These are the metrics that show the effect of management decisions, whether things are getting better or worse, whether processes work as designed, and whether changes outside our direct control cause abnormal conditions in security processes. All these activities are perfectly feasible using output metrics and control charts.