Still Managing False Positives?
As an Operations Analyst, there are few things more frustrating (aside from not having the resources you need to do your job) than being inundated with inaccurate information. When inaccurate information comes in the form of an alert generated by threat detection technology, we call it a false positive: the standard term for the incorrect identification of something. In fact, the security industry has placed considerable emphasis on false positives, baking detection fidelity values into product marketing and basing consulting efforts and operational best practices on the objective of reducing false positives through technology configuration, rule fidelity measurement, alert “tuning,” and even SIEM rule writing. It is a noble cause, since in the information security world, accuracy of data is paramount.
Within the Security Operations context, I have found that the false positive management discussion usually breaks into two categories: first, whether an individual product is accurately identifying the attribute it claims to (is x really x), and second, whether the alert generated represents an actionable event that Analysts need to respond to (do I need to act upon x)? In the traditional security team, the Analyst or Incident Handler exists to manage incidents, which themselves represent events that have had a negative impact upon the enterprise in some measurable way. The point is to find bad things, to stop them, to restore things to their safe, working order, and to prevent them from happening again.
I fully subscribed to the urgency and practices of false positive management on individual detection technology (the first scenario above) in my early years (circa 2000-2004). I approached detection (and prevention) technology from the perspective of configuring it to only tell me about the most accurate and urgent issues it could find, based on what was relevant to the characteristics of the network the technology was protecting. Before turning the IDS/IPS on, I would spend a considerable amount of time understanding the protected environment, then choose only the IDS signatures and alerts that best matched the needs of that environment, and enable preventions only for the most accurate signatures and highest risk events. Even then, the technology, by and large, would often inundate Analysts with false positives, and so we would adjust the signature variables over time to include or exclude certain conditions. This was an effective way to manage the volume of alerts, but if we tuned the IDS too aggressively, we would find the opposite problem: no noise, but also no detection of things that we did want to know about. We also tested tuning approaches based on the priority of signatures or threats, for example turning off all informational alerts because by themselves they are rarely actionable anyway. I even recall in the early days disabling entire network protocols from inspection because we didn’t believe we had them present in our environment; a practice that is clearly dangerous in today’s threat climate. Back then, it was all about managing that upside-down triangle we depicted in presentations as the “alert funnel.”
The fundamental issue we were struggling with back then was the isolated perspective of the detection system. We needed to know when certain conditions or payloads were present on the network; however, the presence of a payload that resembles malicious activity a) isn’t always accurate and b) doesn’t tell us the full story of what actually happened. Even the signatures developed from intelligence that we had 100% confidence in still produced operational false positives. Case in point: if a web browser is exposed to exploit code served from a website, and the IDS detects a string of characters we told it represents the exploit code, that’s a true positive detection in an isolated context. The session was indeed malicious and the IDS did detect what we told it to. However, what that doesn’t tell an Analyst is whether or not the exploit was successful, and whether or not the endpoint had been affected; namely, compromised. There are many additional factors that may determine or prevent impact, and Analysts, generally speaking, only need to manage impactful events. This placed the burden on them to figure out which of those alerts did represent something they needed to act upon; a monotonous task. There is also the variable of the sensor’s network placement relative to other mitigating technology. Thus, in an operations context, we found the need to manually pair detection alerts with other logs that would inform us of the outcome of the transaction and the impact observed (if any). As we moved in that direction, we found that it was more advantageous (from a single alert perspective) to shift our focus toward elevating only those alerts that described negative impact, such as C2 detections that were permitted to leave the network perimeter, which themselves have issues with false positives.
The fundamental problem, and the incorrect practice many operators (including us) fell into, is that none of these technologies or alerts can tell the full story, and so we should not attempt to derive high fidelity action from individual alerts from any of them. Even the alerts that try to tell us about impact. Any individual log or alert represents a perspective on an individual portion of an overall event; a possibility, which in the threat context (think kill chain) requires multiple actions or events occurring in sequence over time, involving multiple network and system components, to be realized. The only way to automate the optimal elevation of actionable alerts to Analysts is to be able to observe and validate cause and effect in an automated way. That is usually only discernible by looking toward the endpoint, although symptoms of endpoint actions are visible at the network layer (e.g. C2 as evidence of the effect caused by malware).
I’ll give you an example by using web exploit kits (WEK). In a WEK based attack, a user is usually lured to a particular website (often via a phishing email, redirection, malvertisement, etc.). There are loads of tools and data feeds out there that try to tell you about this activity or the infrastructure used for this purpose. When exposed to the initial page, the WEK will often perform asset identification and profiling so it can determine whether it will serve additional content, and what that content or payload will be. This profiling activity is sometimes detectable using network inspection technology. It can even be detected simply based on the presence of URL attributes and session behavior (based on the static structure of some WEKs). If the browsing asset (or application) matches predefined conditions, exploit code will be served (sometimes from a separate site) to the browsing application. This too can often be detected via network inspection. Next, the exploit code runs on the endpoint itself, within the browsing application, often with the intent of initiating the retrieval of additional content or a binary from a web property, followed by an attempt to automatically execute the retrieved object on the endpoint. At this point, you may have an endpoint application that has been exploited, but the endpoint itself has not been compromised (yet), meaning there is nothing for an Analyst to remediate (except the vulnerability in the application that facilitated the exploit). This too presents opportunities for detection via network inspection AND via endpoint monitoring. At this phase, we have exploit code running on the endpoint, we have a new network session retrieving additional content (think User-Agent strings, URI, request parameters, etc.), we have a file download (subject to additional code inspection and sandboxing), and we have a binary waiting to be executed on the endpoint. Lots of places to detect badness. If the binary executes successfully, additional behaviors begin on the endpoint which can be visible via endpoint log and process monitoring, and we will likely have follow-up C2 traffic generated from the malware once installed. Lots of opportunities for detection, and even more opportunities to generate “false positives” in their traditional sense.
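To make that chain concrete, here is a minimal sketch (Python) of tracking those WEK stages per endpoint and only elevating once evidence of impact appears. The stage names and the “actionable from” threshold are my own illustration, not any product’s schema.

    from collections import defaultdict

    # Illustrative WEK stages in the order described above.
    # Stage names and the actionable threshold are assumptions for this sketch.
    WEK_STAGES = [
        "lure",              # redirect / malvertisement / phishing link
        "profiling",         # asset identification by the kit landing page
        "exploit_served",    # exploit code delivered to the browser
        "payload_download",  # follow-up binary retrieval
        "payload_executed",  # endpoint telemetry shows the object ran
        "c2",                # command-and-control traffic from the implant
    ]
    ACTIONABLE_FROM = WEK_STAGES.index("payload_executed")

    observed = defaultdict(set)  # host -> set of stages seen so far

    def record(host: str, stage: str) -> None:
        """Record a detection for a host and decide whether an Analyst sees it."""
        observed[host].add(stage)
        if WEK_STAGES.index(stage) >= ACTIONABLE_FROM:
            # Evidence of impact: elevate to the Analyst queue.
            print(f"[ALERT] {host}: {stage} observed, chain so far: {sorted(observed[host])}")
        # Otherwise keep it silently as evidence; no Analyst interruption yet.

    record("10.0.0.5", "exploit_served")   # stored silently as evidence
    record("10.0.0.5", "c2")               # elevated: a sign of compromise

The point of the sketch is the asymmetry: every stage is recorded, but only the stages that describe impact interrupt a human.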
So, thinking of the WEK detection paradigm outlined above, where do you want to be notified as an Analyst? What do you want to know? That depends, right? Do you want to know that endpoints are being exposed to web based exploits? Probably so. That’s good information to identify risk and root cause. Is that the right time to start action from an investigation and response perspective? Probably not. But what if you don’t have pre-defined content or the ability to detect the other activity that follows the initial exploit action? Should you be alerted at every stage possible just in case something gets missed? You might be inclined to say yes, but that’s where we land in the false positive trap. From the traditional Analyst perspective, the evidence they need to act upon is a sign of compromise, such as when the C2 is generated. That’s the only point at which you know an actionable event that requires remediation has occurred. The other isolated data points simply indicate that something may have happened. Now of course this is a generality that doesn’t always apply, but I think in most cases it does. However, knowing there has been a symptom that reflects possible impact is not by itself always high fidelity and always actionable. At some point during the IR cycle, you’ll need to discern the cause...which requires evidence collected at the time of the attack...which means you will need some alerts/data about those initial phases of the WEK process after all. However, having the data doesn’t mean you have to present it all to Analysts.
That’s why thinking of alerts or logs by themselves and expecting some form of actionable fidelity from them doesn’t work. You have to view logs, alerts, etc. as meta data. Each is simply an indicator of a possibility.
That’s where we as an industry turned to the world of event correlation and SIEM (circa 2003-2004). We needed to separate the raw sensor and log data from the Analyst perspective, and we needed to control the Analyst experience independently of native alerts. We found that through correlation, we could treat alerts as meta data and actually suppress much of the noise coming from our deployed sensors (at least from the Analyst’s perspective), and could elevate only the detections that we could correlate with additional supporting evidence. We could easily suppress alerts while keeping them active for evidence collection. We also found that through correlation, we could monitor for and piece together sequences of events observed from different technologies, and could base our alerting decisions on whether a certain sequence or outcome was reached. It actually does work. We also found that we could fill some of the detection gaps through further correlation and automated analytics. Effectively, we shifted from alert monitoring to behavior monitoring. Correlating disparate data together to discover a sequence of events is, in effect, monitoring for behaviors. This made us hungry for more data, so we turned everything back on (at the log source) and filled our SIEMs with data, our whiteboards with behavioral sequences, and our wikis with endless use cases. We applied certain math functions to the data in conjunction with boolean logic, and shazam, we were actually automating the Analyst analytics process, creating highly actionable alerts that truly represented something of interest.
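As a rough illustration of that suppress-but-retain idea, here is a small Python sketch; the event fields and the 30-minute window are assumptions for the example, not any SIEM’s rule language. Every raw IDS alert is kept as evidence, but one is only elevated when a second technology corroborates it for the same host.

    from datetime import datetime, timedelta

    WINDOW = timedelta(minutes=30)  # assumed correlation window for the sketch

    evidence = []   # every raw alert is retained here for later investigation
    elevated = []   # only corroborated alerts reach the Analyst queue

    def correlate(ids_alert: dict, endpoint_events: list[dict]) -> None:
        """Suppress a raw IDS alert unless a second technology corroborates it."""
        evidence.append(ids_alert)  # never thrown away, just not shown
        for ev in endpoint_events:
            same_host = ev["host"] == ids_alert["host"]
            in_window = abs(ev["time"] - ids_alert["time"]) <= WINDOW
            if same_host and in_window and ev["type"] == "process_start":
                elevated.append({"alert": ids_alert, "corroboration": ev})
                return  # one corroborating event is enough to elevate

    ids = {"host": "10.0.0.5", "time": datetime(2016, 3, 1, 10, 0), "sig": "exploit kit landing"}
    edr = [{"host": "10.0.0.5", "time": datetime(2016, 3, 1, 10, 5), "type": "process_start"}]
    correlate(ids, edr)
    print(len(evidence), len(elevated))  # 1 1 -> retained as evidence AND elevated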
For example, a team I once worked with used correlation to identify and dynamically tag at-risk assets based on threats they were exposed to, then used a separate set of rules that evaluated those assets within the context of other behaviors and alerts. We used the correlation logic to classify data and raw alerts as different indications of behaviors we wanted to compare. In the above WEK example, we might tag an asset exposed to a web exploit kit as at risk of a malware infection. No alerts, just tagging the asset using meta data in the SIEM (ArcSight) in the form of a new correlated event. We were actually producing meta data before it was an industry buzzword. Then, we would monitor those tagged assets using a different set of correlation rules that looked for suspicious behaviors which we had previously defined as evidence of impact. For example, did a host that was tagged as at-risk also download a file that met our suspicious downloads criteria, and did that host generate web activity that matched C2 behavior, such as repeated HTTP GET requests to different domains where the post-domain URL information was always the same (an example of one behavior we looked for)? Did an at-risk host connect to a domain that had not yet been categorized by our URL filters within 1 hour of being exposed (another suspicious behavior)? Did a host that was observed as at-risk also generate a connection to a known C2 domain within 2 hours of exposure (another behavior)? Logic like that. It actually worked, and worked great. We expanded our imaginations and started searching for net new threat activity that our sensors couldn’t natively identify, because they didn’t have the perspective we had, nor the analysis power we used, but we could spot badness based on what behavior those logs represented when joined sequentially and analyzed with automated functions.
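Sketched in Python rather than ArcSight rule content, that tag-then-watch logic looked roughly like the following; the field names are hypothetical, and only the 1 and 2 hour windows come from the examples above.

    from datetime import datetime, timedelta

    at_risk = {}  # host -> time it was tagged as exposed to a WEK

    def tag_at_risk(host: str, when: datetime) -> None:
        """Correlated 'exposure' event: tag the asset, raise no alert."""
        at_risk[host] = when

    def check_behavior(host: str, when: datetime, behavior: str):
        """Evaluate a later behavior only for hosts that carry the at-risk tag."""
        tagged_at = at_risk.get(host)
        if tagged_at is None:
            return None
        age = when - tagged_at
        if behavior == "uncategorized_domain" and age <= timedelta(hours=1):
            return f"{host}: connected to an uncategorized domain within 1h of exposure"
        if behavior == "known_c2_domain" and age <= timedelta(hours=2):
            return f"{host}: connected to a known C2 domain within 2h of exposure"
        return None

    tag_at_risk("10.0.0.5", datetime(2016, 3, 1, 10, 0))
    print(check_behavior("10.0.0.5", datetime(2016, 3, 1, 10, 40), "uncategorized_domain"))

The exposure by itself never pages anyone; only the combination of the tag and a follow-on behavior inside the window does.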
One of my favorite sets of rules we created was based on analysis of over 100 malware samples collected over a period of months. We took 100 unique samples and, from each of them, summarized their delivery and C2 behavioral characteristics. One characteristic that stood out to us was the fact that most samples generated egress web connections that were unusually small in size (smaller than, say, 100 bytes) and repetitive in some way. We created content we called the “Bytes Out Tracker” that selected certain HTTP activity from endpoints (based on the observations from the malware), counted the total bytes sent per session, and looked for repeats of the same behavior from the same endpoint within the past hour. The result? We found malware infections that nothing else alerted us to, and we found new C2 nodes that our intelligence communities were unaware of. Today, we call that behavioral analytics, and the industry tells us we need something net new, like big data, machine learning, and AI, to perform that sort of real-time analysis. Maybe so, but we did it nearly a decade ago using the commodity SIEM that our industry loves to hate.
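A minimal sketch of the Bytes Out Tracker idea, assuming proxy logs that expose host, timestamp, and bytes-out per session; the 100-byte ceiling reflects the observation above, while the repeat threshold is an assumption for the example.

    from collections import defaultdict
    from datetime import datetime, timedelta

    MAX_BYTES = 100            # unusually small egress payload (from the sample analysis)
    WINDOW = timedelta(hours=1)
    MIN_REPEATS = 5            # assumed repeat threshold for this sketch

    sessions = defaultdict(list)  # host -> timestamps of qualifying small-egress sessions

    def bytes_out_tracker(host: str, when: datetime, bytes_out: int) -> bool:
        """Flag hosts that repeatedly send tiny HTTP payloads within the past hour."""
        if bytes_out > MAX_BYTES:
            return False
        sessions[host].append(when)
        # keep only events inside the sliding one-hour window
        sessions[host] = [t for t in sessions[host] if when - t <= WINDOW]
        return len(sessions[host]) >= MIN_REPEATS

    base = datetime(2016, 3, 1, 10, 0)
    for i in range(6):
        flagged = bytes_out_tracker("10.0.0.5", base + timedelta(minutes=5 * i), 64)
    print(flagged)  # True once the repeat threshold is crossed inside the window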
However, as we became better at detecting and thwarting threats, adversaries got better at hiding their activity. That’s when we found the need to expand our detection rule sets (in the native inspection / logging tools) to also identify attributes within network transactions that could indicate malicious intent. That’s different from finding malicious behavior. Cause and effect were becoming more difficult to discern, so we had to look for potential. This expansion of data collection by itself creates a whole new flood of “false positives.” Where before we wanted to know the answer to questions like, “was the host served a malicious file,” now we needed to know things like “did the endpoint download a file that contained signs of obfuscation - regardless of the verdict on the code itself,” and “what was the TLD of the domain involved in the web transaction,” or “have any of our hosts ever connected to that domain before?” Questions that were about looking for suspicious or unusual characteristics vs. overtly malicious functions and behaviors. Rules that indicated sessions contained obfuscation became of high interest to some Analysts, and were high fidelity in some cases. Rules that simply identified file types involved in egress uploads also became very interesting and relevant artifacts. So, we added more sensors and more data to the mix. Suddenly, simply knowing things like the application that generated a network connection became a critical piece of evidence that we analyzed in real-time. In a traditional SOC, if you fed that stream of data to an Analyst, they would go crazy. We moved well beyond that reality, and stopped thinking of individual logs or even sources of logs as actionable data; it was all simply data that we needed to consider.
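To illustrate the shift from verdicts to characteristics, here is a hedged sketch of enriching a single web transaction with those kinds of suspicious attributes and turning them into context rather than alerts; the attribute names and weights are entirely illustrative.

    # Hypothetical attributes of one web transaction; none of these alone
    # should page an Analyst, but each is worth recording for later correlation.
    transaction = {
        "host": "10.0.0.5",
        "domain": "example-cdn.top",
        "tld": "top",
        "file_contains_obfuscation": True,   # e.g. packed or encoded script content
        "domain_seen_before": False,         # first connection from the enterprise
        "initiating_process": "winword.exe", # the application that opened the socket
    }

    # Illustrative weights; the point is context-building, not a verdict.
    suspicion = 0
    suspicion += 2 if transaction["file_contains_obfuscation"] else 0
    suspicion += 1 if transaction["tld"] in {"top", "xyz", "click"} else 0
    suspicion += 1 if not transaction["domain_seen_before"] else 0
    suspicion += 2 if transaction["initiating_process"] not in {"chrome.exe", "firefox.exe"} else 0

    # Store the enriched record as meta data; only correlation decides if anyone sees it.
    print(transaction["host"], "suspicion score:", suspicion)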
Over time, we got good at this practice as well, and could spot anomalies within our data, once again by using commodity SIEM (ArcSight), which critics in the industry like to tell you SIEM can’t do. It can, and we did. We also started moving toward industry products, like Fortinet, McAfee NTR, NetWitness, and Palo Alto Networks, that could provide us with as much information about network transactions as possible - we just wanted to know what was happening in general so that we could discern intent.
However, that approach also has limitations, is extremely labor intensive, and is expensive. It has a significant dependency upon the SIEM content management team (the rule writers) being able to define and code for any possible combination of factors that would represent a true threat sequence. It also depended upon an Analyst team that was comfortable with experimentation and that stopped looking at logs or alerts by themselves as key indicators. It also meant we needed to store and analyze all that data. Now, having said that, an argument could be made that if your team can’t pre-define an attack sequence, they won’t know how to identify it or manually discover it should it happen in the absence of correlation logic. We can only do what we know how to do. The SIEM correlation approach will substantially reduce false positives AND it will substantially improve your ability to spot things you want to know about...but it doesn’t necessarily help you find the things you don’t know to look for. That’s where intelligence functions began to really grow, because they became essential elements of operational success and the accuracy of threat detection. However, we sort of lost the plot on that.
This is where industry went off the deep end, chasing the rabbit down the endless data analysis path; more data, bigger storage, faster systems, more analytics, artificial intelligence, machine learning, different visualization, etc. That’s the wrong direction. This is also where boutique services like dark web monitoring, Internet crawling, and related capabilities were born; trying to find the net impact somewhere out there. If you see data for sale on an underground forum, then you know you had a real event, and can move to mitigate the issues that will have the most measurable impact to the business. But again, that’s running down the wrong road. It sounds cool, and it is the home of technology innovation, but it doesn’t solve the problem. It’s an insatiable appetite.
Paradigm Shift
Now that I’ve made the case for SIEMs (not meaning to), I want us to abandon that thought process. My point in telling that story is that I hope you can see that it is a never ending battle of trying to make data as valuable as possible in an operational context that is designed to rapidly detect threats in order to rapidly mitigate them and manage impact (and possibly prevent damage). Trying to elevate the threats you know about, and to find what you don’t know, as a means to manage risk isn’t effective. That’s the wrong way to think about threats these days (and it was wrong back then too). Stop thinking about finding the badness that’s lurking invisible in your network or just beyond your field of view. In the words of the rock band Switchfoot, “there’s a new way to be human, something we’ve never been.” You see, we didn’t solve the core issues with our data analytics approach; we’ve simply been seeking the best bandaids to slow the bleeding. You won’t win the detection game. You can’t. It’s reactive in nature, and most of the threats organizations will face are automated anyway. By the time your multi-million dollar infrastructure finds the highest fidelity thing for your team to respond to and manage, it’s likely that the damage has been done.
You see, it was at the top of my “detection” and analytics game that I was exposed to a different class of Analyst who took a completely different approach. I still remember the event clearly. The conversation wasn’t “how do you best detect x,” or “what products give you the most value,” but rather, “did you know adversary y has updated tool z with x capabilities to go after a, b, and c data sets?” I recall a presentation where an Analyst (by title) presented a timeline of adversary activity that revealed a clear pattern. They weren’t talking about the lower levels of the kill chain, except in passing. Instead they were talking about predicting when and how their adversary would engage them, with what tools and processes, exploiting what vulnerabilities in which applications. They focused not on finding suspicious behaviors and effects within their environments, but on defeating the adversary’s ability to operate against their environment to begin with. They studied their adversary, learning their tools, their behaviors, their objectives, their triggers, their operational rhythms, and so on. They were students, experts of their adversary. As I watched them in practice, I realized how simple this whole thing really is. You see, adversaries operate in the same space we do, using the same technology, with the same constraints, and across the same infrastructure. They have finite capabilities and opportunities. The more you understand how they operate, and the more you can defeat their capabilities and remove their opportunities, the less you have to analyze on the back-end to detect and hopefully mitigate their success. Why not try your best to stop or disrupt them as they are acting? Why wait for them to act, hoping that maybe you can catch them based on the trail of blood they left behind? Yeah, I’m talking about prevention, but more than that, I’m talking about actually managing your threat posture relative to your actual adversary. The beauty is, you probably already have the means to do that, and it will cost you far less than you think.
Let’s be honest for a second now. That means you have to get very real, and very serious, about vulnerability management in your environment. Yep, that’s boring, and many thrill-seeking, people-pleasing, and attention-grabbing CISOs will have an allergic reaction to it. That means you have to get very intrusive, restrictive, and disruptive with regard to what employees can do with the assets and access they have been given. That means you have to get into the business context of your company, the software development and product release lifecycle, the services establishment and publishing process, IT infrastructure and asset management, risk management, etc. You have to take charge of your enterprise security, and you have to do it from the basis of knowing your adversary, knowing yourself, and using that to influence everything that an adversary could touch in your company.
So what does that mean for the SOC Analyst battling the volume of false positives coming from their IDS sensor? It means they need to stop looking at the IDS, and start running reports to discern their risk status and writing tickets that will help improve the security posture of the enterprise. What assets are externally facing with exploitable vulnerabilities and no mitigating controls in place? Those are your new high fidelity alerts that require action. What endpoints are running old software that needs to be upgraded? Ticket for remediation. What endpoints are outside of configuration compliance? Break-fix. What login portals are missing MFA? Break-fix. What user permissions are allowed on your endpoints? What users have access to applications they don’t need? Schedule enterprise-wide change control to remove unnecessary access. What network applications are your employees using that create risk? Time to update policy and submit firewall rule change requests to block access. Who is sending data out of the company, and where? Time to remind employees of corporate policy and data privacy, and to block access to anything but authorized data and file storage locations. What software is installed on endpoints that could be used by an adversary? Time to remove it. Which firewalls have policies that are not compliant with security standards (i.e. missing detection rules, or allowing more than they should)? Time for a configuration audit with remediation requirements. The list goes on and on, but these are the issues the SOC must begin to tackle. The SOC needs to be all about controlling risk in your environment, not passively managing the impact that results from the lack of proper risk management. No need for fancy analytics, no need for machine learning, no need for behavioral analysis of every single employee. No, the real need is to disarm your adversary by closing and locking all your windows and doors, and making yourself as difficult a fortress as possible to break into. Remember, they can’t exploit a vulnerability on an application they cannot reach, or in an application that is not vulnerable.
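As a sketch of what “running reports to discern risk status” can look like when asset, vulnerability, and control inventories are joined together (all field names and records here are hypothetical):

    # Hypothetical inventories; in practice these come from the CMDB,
    # the vulnerability scanner, and control-coverage tooling.
    assets = [
        {"name": "web01", "external": True,  "vulns": ["CVE-2015-XXXX"], "mitigating_control": False},
        {"name": "app02", "external": True,  "vulns": [],                "mitigating_control": True},
        {"name": "hr03",  "external": False, "vulns": ["CVE-2014-YYYY"], "mitigating_control": False},
    ]

    def high_fidelity_findings(inventory: list[dict]) -> list[str]:
        """Externally facing assets with exploitable vulns and no mitigating control."""
        findings = []
        for a in inventory:
            if a["external"] and a["vulns"] and not a["mitigating_control"]:
                findings.append(f"{a['name']}: remediate {', '.join(a['vulns'])}")
        return findings

    for ticket in high_fidelity_findings(assets):
        print("OPEN TICKET:", ticket)   # these findings replace the old "alerts"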
Once you get a handle on that, it’s time to ensure you have ongoing visibility into the status of those controls, to ensure you do not drift from your optimal posture. That means core success is no longer measured by metrics like time to respond; instead, success can be defined by a low percentage of exposed vulnerabilities, uncontrolled entry points, and unauthorized software.
You will also need to instrument capabilities to discover if adversaries are interacting with your environment, based on how you expect them to operate, but this will become a smaller and smaller scope as you mitigate their ability to operate against you in the first place. Why spend millions of dollars on infrastructure, facilities, and Analysts, looking for signs of lateral movement, when you can simply patch the vulnerabilities so the adversary cannot exploit them, control the delivery vectors so the adversary cannot enter, and lock down your endpoints so adversaries cannot use them?
Where do you start? Threat intelligence. I’m not talking about observables or threat indicators or 3rd party feeds (although those are components of an intelligence program), I’m talking about becoming a student of your adversary and truly understanding them. You also need to truly understand your business. Then, look for opportunities to disrupt your adversary; if you can’t stop them, make it as difficult as possible for them. To do that, you are going to need to become very close with your IT department. You may even need to merge your NOC and SOC together, because they will become increasingly dependent upon each other. The tools your NOC needs to manage assets are the tools your SOC needs to measure risk. The actions your Network and IT staff take are the ones the SOC staff need to monitor to ensure they do not add or increase risk. It’s time to get the family back together and on mission - the right mission.
False positives? Meh. I don’t think of them anymore.