Building a Security Monitoring Program in the Age of Overwhelming Data

This article was originally featured in (IN)SECURE Magazine Issue 55: https://www.helpnetsecurity.com/insecuremag/issue-55-september-2017/.

The majority of security analysts I know have a job that’s made more difficult than it has to be. Every day they’re charged with finding the veritable needle in a haystack using tools – SIEMs and log management systems – that have struggled to keep up with the latest technology trends, such as big data and cloud services. As a result, analysts are wasting time on high volumes of low-value data, and they’re missing valuable clues.

It’s time to revisit our approach to information security monitoring. To bring some sanity back to our industry, we must take a step back and consider exactly what we need to achieve when it comes to monitoring and response.

With security information management solutions fulfilling an important but limited need, organizations have invested in tools that focus on specific problems. This has led to a proliferation of point solutions both within security organizations and in the market at large. But information security monitoring isn’t about tools – it’s about capabilities.

Once we understand the capabilities we need, we can consider the most effective ways of addressing them as part of a capability-driven architecture. The features and functionality should transcend the technology, focusing instead on your preparedness to deal with the unknown and on closing the gaps in your security coverage as efficiently as possible.

A word of warning before we dive into the details: trying to shortcut through these phases is a recipe for pain. For example, back in the day, an intrusion prevention system (IPS) could block a legitimate, business-critical application without proper analysis because it behaved in a way the IPS didn’t expect. Similarly, automation tools today may launch remediation jobs based on a false positive alert generated by a SIEM or other security tool.

Even assuming your alerts are 100% accurate (I know, just go with me here), trusting remediation efforts to an automated system without first determining a complete kill chain is certain to put members of your team in the hot seat.

That said, let’s look at the information security monitoring framework and the capabilities needed in each phase.

Detection

Objective: Identify activity that may be indicative of malicious intent and/or has bypassed your preventive controls.

There are a ton of threat detection tools on the market, from inline network malware detection to endpoint protection. You should pick the ones you like the most, with a couple of caveats.

Beware of kits that claim to boil the ocean. A detection tool is only as accurate as the environment in which it operates. This is very important. While an endpoint-based detection tool may have direct access to all the core aspects of a system, including file system and memory, it has no insight, for instance, into enterprise network transactions beyond the host. It would also do a poor job of understanding activity at the application or database-transaction level. This is why detection is one area where I hold point solutions in high regard. Make sure your detection coverage is thorough across the vertical application stack and built by experts.

Detection should take place as close to the business-critical applications and data as possible. Many of these applications are SaaS-based these days and likely not monitored by your organization at the moment.

Some gains in detection capabilities may extend to your Data Management solution. This is particularly relevant for monitoring hosted environments and applications that operate in areas outside of your control. Anomaly detection and machine learning tools may be helpful, but make sure the vendor’s claims match the needs of your particular environments and data. The same goes for threat intelligence data feeds and alerts.
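
To make the anomaly detection point concrete, here is a minimal sketch of the kind of baseline-and-deviation check many of these tools perform under the hood. The hourly authentication-failure counts, the field meaning, and the threshold are all hypothetical illustrations, not any vendor’s actual method.

```python
from statistics import mean, stdev

def flag_anomalies(hourly_counts, threshold=3.0):
    """Flag hours whose auth-failure count deviates sharply from the baseline.

    A simple z-score check: anything more than `threshold` standard
    deviations above the mean is surfaced for analyst review.
    """
    baseline = mean(hourly_counts)
    spread = stdev(hourly_counts)
    if spread == 0:
        return []  # perfectly flat history, nothing stands out
    return [
        (hour, count)
        for hour, count in enumerate(hourly_counts)
        if (count - baseline) / spread > threshold
    ]

# Hypothetical 24 hours of failed-login counts from an IdP or proxy log
counts = [12, 9, 11, 10, 8, 14, 13, 9, 10, 11, 12, 10,
          9, 11, 250, 13, 10, 12, 9, 11, 10, 13, 12, 10]
print(flag_anomalies(counts))  # -> [(14, 250)]
```

Real products are far more sophisticated than this, which is exactly why it pays to ask vendors how their models behave against your own data rather than a demo set.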

Data Management

Objective: Consolidate as much information as possible about the environment in which threats and malicious activity have been discovered.

I could write volumes on what data should be collected for security monitoring. Outside of the usual suspects, like authentication, firewall, and proxy logs, think about where your critical data is stored, which applications manage that data, how access to that data is provisioned and controlled, the potential attack vectors against that data, and what type of information your incident response team needs in the event of a breach.

Ideally, activity data should be collected from every system, network, and application (the full stack) involved in managing your critical data assets. Monitoring prevention tools should be in scope as well. Your ability to understand the full context within which preventive actions occur, as well as when preventive controls fail, is paramount to improving your security posture.

In the past, a lot of high-volume, low-value log data wasn’t collected due to the performance impact and solution cost. Today, however, highly scalable, open source solutions make it possible to collect and analyze local workstation logs, database transaction logs, and application logs. In addition to these event data logs, you need contextual data, such as assets, application inventories, infrastructure configurations, etc. Script everything out or find a tool that manages the inventory well. Perform regular exports, store them in big data repositories, and correlate the data with other information in your logs.
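
As a sketch of what “script everything out” might look like in practice, the snippet below turns a hypothetical asset inventory CSV (assets.csv with columns such as hostname, ip, owner, criticality) into a dated JSON-lines snapshot that a big data platform can ingest and join against event logs. The file names and fields are assumptions for illustration, not a prescribed layout.

```python
import csv
import json
from datetime import datetime, timezone

def export_asset_snapshot(csv_path="assets.csv", out_dir="."):
    """Convert a hypothetical asset inventory CSV into a dated JSON-lines file.

    Each record gets a snapshot timestamp so later correlation queries can
    join events against the inventory as it looked at that point in time.
    """
    snapshot_ts = datetime.now(timezone.utc).isoformat()
    out_path = f"{out_dir}/assets_{snapshot_ts[:10]}.jsonl"
    with open(csv_path, newline="") as src, open(out_path, "w") as dst:
        for row in csv.DictReader(src):
            row["snapshot_ts"] = snapshot_ts
            dst.write(json.dumps(row) + "\n")
    return out_path

# Run this on a schedule (cron, orchestration job, etc.) and ship the output
# to the same repository that holds your event logs:
# print(export_asset_snapshot())
```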

The data you collect must be normalized and standardized. Normalization involves arranging semi-structured log data into uniform fields. The most typical candidates for field extraction are logs of certain Unix services. Fortunately, most solutions today are capable of producing log data in structured JSON or some other key/value format. The bigger concern is the standardization of data across multiple vendors and solutions. One way or another, the data must conform to the same standard.
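
Before turning to standards, here is a minimal sketch of the normalization half: extracting uniform fields from a raw sshd “Failed password” line and emitting JSON. The regex covers only this one message shape, and the added field names are illustrative assumptions; a real normalization pipeline handles many more variants.

```python
import json
import re

# Matches lines shaped like:
#   "Failed password for invalid user admin from 203.0.113.5 port 4321 ssh2"
SSHD_FAILED = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<src_ip>\S+) port (?P<src_port>\d+)"
)

def normalize_sshd(line):
    """Extract uniform fields from a raw sshd log line, or return None if no match."""
    match = SSHD_FAILED.search(line)
    if not match:
        return None
    event = match.groupdict()
    event.update({"action": "failure", "app": "sshd", "raw": line.strip()})
    return json.dumps(event)

print(normalize_sshd(
    "Failed password for invalid user admin from 203.0.113.5 port 4321 ssh2"
))
```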

The good news is, it doesn’t matter which standard you use, so long as you use one. Splunk’s “Common Information Model” (CIM) is a well-documented and viable option. There are also open standards that serve well as a reference.

The Open Data Model (ODM) from the Apache Spot project is also at the top of my list. It has decent coverage of both event and contextual data structures, including contextual models for User, Endpoint, VPN, and Network. ODM provides a good foundation for open source-based security monitoring and analysis with all the benefits of big data scalability.
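
To show what cross-vendor standardization amounts to in practice, the sketch below renames two vendors’ differently named authentication fields onto one shared set of names. The vendor field names and the target schema are illustrative assumptions, loosely in the spirit of models like CIM or ODM rather than an authoritative mapping for either.

```python
# Per-vendor field mappings onto one shared authentication schema.
# Both the source and target field names here are illustrative only.
FIELD_MAPS = {
    "vendor_a": {"usr": "user", "src_address": "src",
                 "dst_address": "dest", "outcome": "action"},
    "vendor_b": {"account": "user", "client_ip": "src",
                 "server_ip": "dest", "result": "action"},
}

def standardize(vendor, event):
    """Rename a vendor event's fields to the shared schema, keeping extras as-is."""
    mapping = FIELD_MAPS[vendor]
    return {mapping.get(key, key): value for key, value in event.items()}

print(standardize("vendor_b", {
    "account": "jdoe", "client_ip": "203.0.113.5",
    "server_ip": "10.0.0.7", "result": "failure",
}))
```

Whichever standard you settle on, the point is that every downstream query, dashboard, and correlation rule can rely on one field name per concept.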

Analysis (Including Triage)

Objective: Provide security analysts with a robust environment to quickly identify false positives and conduct security incident investigations.

Alert triage requires contextual information to help reduce analysis fatigue and eliminate “zombie workflows.” Given the high volume of alerts reported by various detection tools, analysts’ queues quickly become overwhelming. Most teams only have the capacity to investigate 5-10% of daily alerts. The faster an analyst can identify a false positive, the sooner they can move on to something worthwhile.
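
As a sketch of contextual triage, the snippet below enriches an incoming alert with a hypothetical asset lookup and assigns a rough priority, so business-critical hits surface first and known-benign sources (like an internal scanner) can be dismissed quickly. The inventory, field names, scanner list, and scoring weights are all assumptions for illustration.

```python
# Hypothetical asset context keyed by IP; in practice this comes from the
# contextual data gathered in the Data Management phase.
ASSET_CONTEXT = {
    "10.0.0.7": {"criticality": "high", "owner": "payments-team"},
    "10.0.9.42": {"criticality": "low", "owner": "lab"},
}
KNOWN_SCANNERS = {"10.0.8.13"}  # e.g. an internal vulnerability scanner

def triage(alert):
    """Attach asset context and a rough priority score to a detection alert."""
    context = ASSET_CONTEXT.get(alert["dest"], {"criticality": "unknown"})
    if alert["src"] in KNOWN_SCANNERS:
        return {**alert, "context": context,
                "priority": 0, "disposition": "likely benign"}
    score = {"high": 90, "medium": 50, "low": 10}.get(context["criticality"], 30)
    return {**alert, "context": context,
            "priority": score, "disposition": "investigate"}

print(triage({"src": "203.0.113.5", "dest": "10.0.0.7",
              "signature": "lateral-movement"}))
```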

Once analysts collect enough evidence to escalate an alert into an incident, the real work begins: reconstructing the full story of a compromise, from the initial ingress point to every lateral step, every system and credential involved, and every successful data access. The biggest challenge in this phase is ensuring sufficient interactive performance for distributed data platforms.

Analysts have a hard enough time digging through volumes of cryptic system and application events – the last thing they should be doing is performance tuning the NoSQL backend. I can’t emphasize enough the value of expert help in getting your analysis environment running blazing fast.

Other capabilities needed for the analysis phase are collaboration and knowledge retention. Every organization has “that guy” who knows everything. We need to make sure that the knowledge gleaned from analysis doesn’t leave the organization when they leave. At the same time, findings should be shared with the rest of the team.

Tips and tricks, knowledge of past incidents, indicators, and attack vectors should be shared in a way that ties back to specific incidents and their supplemental data, so they tell a complete story.

Finally, a robust security workflow framework is a must, but I’m sure you already know that.

Remediation/Response

Objective: Once the first real findings of a security incident begin trickling in, close security gaps quickly and thoroughly.

For years, incident response teams have used remediation playbooks. More recently, security orchestration has become a hot market trend. Leveraging automated remediation tools is key to closing the gaps, especially during an active incident. As important as it is for response to be rapid, it’s even more important that it be based on the results of a thorough investigation that completes the picture. Marking an incident as resolved while the hostile entity still has access to your network is not exactly ideal.
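
In the spirit of that warning, here is a minimal sketch of how an orchestration playbook might gate automated containment on a completed investigation rather than on a raw alert. The incident fields and the contain_hosts placeholder are hypothetical, not any particular orchestration product’s API.

```python
def contain_hosts(hostnames):
    """Placeholder for a real containment action (EDR isolation, VLAN move, etc.)."""
    for host in hostnames:
        print(f"isolating {host}")

def remediate(incident):
    """Trigger automated containment only once the investigation is complete.

    `incident` is a hypothetical record produced by the analysis phase; the
    gate is that the kill chain has been fully mapped and the scope of
    affected hosts is known.
    """
    if not incident.get("kill_chain_complete"):
        return "hold: investigation has not mapped the full kill chain yet"
    if not incident.get("affected_hosts"):
        return "hold: scope of affected hosts is not yet established"
    contain_hosts(incident["affected_hosts"])
    return "containment triggered for confirmed scope"

print(remediate({"kill_chain_complete": False}))
print(remediate({"kill_chain_complete": True,
                 "affected_hosts": ["web-03", "db-01"]}))
```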

Conclusion

While SIEMs continue to have their place in information security monitoring environments, point solutions are proliferating almost as fast as data sources in the enterprise. But as much as our environments change due to big data and cloud services, following a capability-driven approach remains as important as ever. When you build your security monitoring program with a focus on fundamental capabilities first and technology second, your team will have everything it needs to identify, analyze, and remediate issues efficiently and effectively.