Friday, August 4, 2017

Still Managing False Positives?

As an Operations Analyst, there are few things more frustrating (aside from not having the resources you need to do your job) than being inundated with inaccurate information. When inaccurate information comes in the form of an alert generated by threat detection technology, we call it a false positive; a standard term used to describe the incorrect identification of something. In fact, the security industry has placed considerable emphasis on the subject of false positives, baking detection fidelity values into product marketing, and basing consulting efforts and operational best practices on the objective of reducing false positives through technology configurations, rule fidelity measurements, alert “tuning,” and even SIEM rule writing. It is a noble cause, since in the information security world, accuracy of data is paramount.

Within the Security Operations context, I have found that the false positive management discussion usually breaks into two categories: the first being whether or not an individual product is accurately identifying the attribute it claims to (is x really x), and the second being whether or not the alert generated represents an actionable event that Analysts need to respond to (do I need to act upon x). In the traditional security team, the Analyst or Incident Handler exists to manage incidents, which themselves represent events that have had a negative impact upon the enterprise in some measurable way. The point is to find bad things, to stop them, to restore things to their safe, working order, and to prevent them from happening again.

I fully subscribed to the urgency and practices of false positive management on individual detection technology (the first scenario above) in my early years (circa 2000-2004). I approached detection (and prevention) technology from the perspective of configuring it to only tell me about the most accurate and urgent issues it could find, based on what was relevant to the characteristics of the network the technology was protecting. Before turning the IDS/IPS on, I would spend a considerable amount of time understanding the protected environment, then choosing only the IDS signatures and alerts that best matched the needs of that environment, and preventions for only the most accurate signatures and highest risk events. Even then, the technology, by and large, would often inundate Analysts with false positives, and so we would often adjust the signature variables over time to include or exclude certain conditions. This was an effective way to manage the volume of alerts, but if we tuned the IDS too aggressively, we would find the opposite problem: no noise, but also no detection of things that we did want to know about. We also tested tuning approaches based on the priority of signatures or threats. For example, turning off all informational alerts because they by themselves are rarely actionable anyway. I even recall in the early days, disabling entire network protocols from inspection because we didn’t believe we had them present in our environment; a practice that is clearly dangerous in today’s threat climate. Back then, it was all about managing that upside down triangle we depicted in presentations as the “alert funnel.”

The fundamental issue we were struggling with back then was the isolated perspective of the detection system. We needed to know when certain conditions or payloads were present on the network, however the presence of payload that resembles malicious activity a) isn’t always accurate and b) doesn’t tell us the full story of what actually happened. Even the signatures developed from intelligence that we had 100% confidence in still produced operational false positives. Case in point: if a web browser is exposed to exploit code served from a website, and the IDS detects a string of characters we told it represents the exploit code, that’s a true positive detection in an isolated context. The session was indeed malicious and the IDS did detect what we told it to. However, what that doesn't tell an Analyst is whether or not the exploit was successful, and whether or not the endpoint had been affected; namely, compromised. There are many additional factors that may determine or prevent impact, and Analysts, generally speaking, only need to manage impactful events. This placed the burden on them to figure out which of those alerts did represent something they needed to act upon; a monotonous task. There is also the variable of network placement of the sensor relative to other mitigating technology. Thus, in an operations context, we found the need to manually pair detection alerts with other logs that would inform us of the outcome of the transaction, and the impact observed (if any). As we moved in that direction, we found that it was more advantageous (from a single alert perspective) to shift our focus to, and only elevate, those alerts that described negative impact, such as C2 detections that were permitted to leave the network perimeter, which themselves have issues with false positives.
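
To make that pairing idea concrete, here's a minimal sketch (in Python, with hypothetical field names, timestamps, and a made-up window) of elevating an IDS exploit alert only when the same host shows permitted outbound activity shortly afterwards:

```python
# Hypothetical, simplified records; real IDS and proxy schemas will differ.
ids_alerts = [
    {"ts": 1000, "src_ip": "10.0.0.5", "signature": "EXPLOIT Browser Heap Spray"},
]
proxy_logs = [
    {"ts": 1030, "src_ip": "10.0.0.5", "action": "allowed",
     "url": "http://example-c2.test/beacon"},
]

WINDOW = 120  # seconds after the exploit alert in which to look for permitted egress


def elevate(alert, egress_logs, window=WINDOW):
    """Elevate an IDS exploit alert only if the same host shows permitted
    outbound activity shortly afterwards (possible evidence of impact)."""
    for log in egress_logs:
        same_host = log["src_ip"] == alert["src_ip"]
        in_window = 0 <= log["ts"] - alert["ts"] <= window
        permitted = log["action"] == "allowed"
        if same_host and in_window and permitted:
            return True  # supporting evidence of outcome; worth an Analyst's time
    return False  # keep the alert as evidence, but don't page anyone


for alert in ids_alerts:
    if elevate(alert, proxy_logs):
        print("Actionable:", alert["signature"], alert["src_ip"])
```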

The fundamental problem, and the incorrect practice many operators (including us) fell into, is that none of these technologies or alerts can tell the full story, and so we should not attempt to derive high fidelity action from individual alerts from any of them. Even the alerts that try to tell us about impact. Any individual log or alert represents a perspective on an individual portion of an overall event; a possibility, which in the threat context (think kill chain) requires multiple actions or events occurring in sequence over time, involving multiple network and system components, to be realized. The only way to automate the optimal elevation of actionable alerts to Analysts is to be able to observe and validate cause and effect in an automated way. That is usually only discernible by looking toward the endpoint, although symptoms of endpoint actions are visible at the network layer (e.g. C2 as evidence of the effect caused by malware).

I’ll give you an example by using web exploit kits (WEK). In a WEK based attack, a user is usually lured to a particular website (often via a phishing email, redirection, malvertisement, etc.). There are loads of tools or data feeds out there that try to tell you about this activity or the infrastructure used for this purpose. When exposed to the initial page, the WEK will often perform asset identification and profiling so it can determine if it will serve additional content, and what that content or payload will be. This profiling activity is sometimes detectable using network inspection technology. It can even be detected simply based on the presence of URL attributes and session behavior (based on the static structure of some WEKs). If the browsing asset (or application) matches predefined conditions, exploit code will be served (sometimes from a separate site) to the browsing application. This too can often be detected via network inspection. Next, the exploit code runs on the endpoint itself, within the browsing application, often with the intent of initiating the retrieval of additional content or a binary from a web property, followed by an attempt to automatically execute the retrieved object on the endpoint. At this point, you may have an endpoint application that has been exploited, but the endpoint itself has not been compromised (yet), meaning there is nothing for an Analyst to remediate (except the vulnerability in the application that facilitated the exploit). This too presents opportunities for detection via network inspection AND via endpoint monitoring. At this phase, we have exploit code running on the endpoint, we have a new network session retrieving additional content (think User-Agent strings, URI, request parameters etc.), we have a file download (subject to additional code inspection and sandboxing), and we have a binary waiting to be executed on the endpoint. Lots of places to detect badness. If the binary executes successfully, additional behaviors begin on the endpoint which can be visible via endpoint log and process monitoring, and we will likely have follow-up C2 traffic generated from the malware once installed. Lots of opportunities for detection, and even more opportunities to generate “false positives” in their traditional sense.

So, thinking of the WEK detection paradigm outlined above, where do you want to be notified as an Analyst? What do you want to know? That depends, right? Do you want to know that endpoints are being exposed to web based exploits? Probably so. That’s good information to identify risk and root cause. Is that the right time to start action from an investigation and response perspective? Probably not. But what if you don’t have pre-defined content or the ability to detect the other activity that follows the initial exploit action? Should you be alerted at every stage possible just in case something gets missed? You might be inclined to say yes, but that’s where we land in the false positive trap. From the traditional Analyst perspective, the evidence they need to act upon is a sign of compromise, such as when the C2 is generated. That’s the only point at which you know an actionable event that requires remediation has occurred. The other isolated data points simply indicate that something may have happened. Now of course this is a generality that doesn’t always apply, but I think in most cases it does. However, knowing there has been a symptom that reflects possible impact is not by itself always high fidelity and always actionable. At some point during the IR cycle, you’ll need to discern the cause...and to do that, you need evidence collected at the time of the attack...which means you will need some alerts/data about those initial phases of the WEK process after all. However, having the data doesn’t mean you have to present it all to Analysts.

That’s why thinking of alerts or logs by themselves and expecting some form of actionable fidelity from them doesn’t work. You have to view logs, alerts, etc. as metadata. Each is simply an indicator of a possibility.

That’s where we as an industry turned to the world of event correlation and SIEM (circa 2003-2004). We needed to separate the raw sensor and log data from the Analyst perspective, and we needed to control the Analyst experience independent of native alerts. We found that through correlation, we could treat alerts as metadata and actually suppress much of the noise coming from our deployed sensors (at least from the Analyst's perspective), and could elevate only the detections that we could correlate with additional supporting evidence. We could easily suppress alerts while keeping them active for evidence collection. We also found that through correlation, we could monitor for and piece together sequences of events observed from different technologies, and could base our alerting decisions on whether a certain sequence or outcome was reached. It actually does work. We also found that we could fill some of the detection gaps through further correlation and automated analytics. Effectively, we shifted from alert monitoring to behavior monitoring. Correlating disparate data together to discover a sequence of events is, in effect, monitoring for behaviors. This made us hungry for more data, so we turned everything back on (at the log source) and filled our SIEMs with data, our whiteboards with behavioral sequences, and our wikis with endless use cases. We applied certain math functions to the data in conjunction with boolean logic, and shazam, we were actually automating the Analyst analytics process, creating highly actionable alerts that truly represented something of interest.
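
For illustration only, here's a rough Python sketch of that sequence idea; the stage names, the one-hour window, and the data shapes are my assumptions, not real SIEM rule content:

```python
from collections import defaultdict

# Raw alerts are treated as metadata per host; an Analyst-facing event is
# produced only when a predefined sequence completes within a time window.
SEQUENCE = ["exploit_detected", "file_download", "c2_beacon"]
WINDOW = 3600  # seconds within which the full sequence must complete

progress = defaultdict(list)  # host -> timestamps of matched stages, in order


def observe(host, stage, ts):
    """Consume a raw alert; return a correlated event only when the sequence completes."""
    stages = progress[host]
    if stages and ts - stages[0] > WINDOW:
        progress[host] = stages = []  # sequence expired, start over
    if stage == SEQUENCE[len(stages)]:
        stages.append(ts)
        if len(stages) == len(SEQUENCE):
            del progress[host]
            return {"host": host, "event": "correlated_compromise", "ts": ts}
    return None  # suppressed from the Analyst view, retained as evidence


# Usage: only the final observation produces something an Analyst sees.
for host, stage, ts in [("10.0.0.5", "exploit_detected", 0),
                        ("10.0.0.5", "file_download", 40),
                        ("10.0.0.5", "c2_beacon", 300)]:
    hit = observe(host, stage, ts)
    if hit:
        print(hit)
```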

For example, a team I once worked with used correlation to identify and dynamically tag at-risk assets based on threats they were exposed to, then used a separate set of rules that evaluated those assets within the context of separate behaviors and alerts. We used the correlation logic to classify data and raw alerts as different indications of behaviors we wanted to compare. In the above WEK example, we might tag an asset exposed to a web exploit kit as at risk of a malware infection. No alerts, just tagging the asset using metadata in the SIEM (ArcSight) in the form of a new correlated event. We were actually producing metadata before it was an industry buzzword. Then, we would monitor those tagged assets using a different set of correlation rules that looked for suspicious behaviors which we had previously defined as evidence of impact. For example, did a host that was tagged as at-risk also download a file that met our suspicious downloads criteria, and did that host generate web activity that matched C2 behavior, such as repeated HTTP GET requests to different domains where the post-domain URL information was always the same (an example of one behavior we looked for)? Did an at-risk host connect to a domain that had not yet been categorized by our URL filters within 1 hour of being exposed (another suspicious behavior)? Did a host that was observed as at-risk also generate a connection to a known C2 domain within 2 hours of exposure (another behavior)? Logic like that. It actually worked, and worked great. We expanded our imaginations and started searching for net new threat activity that our sensors couldn’t natively identify because they didn’t have the perspective we had, nor the analysis power we used, but we could spot badness based on what behavior those logs represented when joined sequentially and analyzed with automated functions.
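
A rough sketch of that tag-then-watch logic, expressed in Python rather than the ArcSight rules we actually used (function names and inputs are hypothetical; the one- and two-hour windows come from the examples above):

```python
at_risk = {}  # host -> timestamp the host was tagged as exposed to a WEK


def tag_exposed(host, ts):
    """Tag an asset as at risk. No alert is generated; this is just metadata."""
    at_risk[host] = ts


def check_behavior(host, ts, domain_is_categorized, domain_is_known_c2):
    """Evaluate later web activity from tagged hosts against suspicious behaviors."""
    exposed_at = at_risk.get(host)
    if exposed_at is None:
        return None  # not a tagged asset; nothing to correlate
    age = ts - exposed_at
    if domain_is_known_c2 and age <= 2 * 3600:
        return "known C2 contact within 2 hours of exposure"
    if not domain_is_categorized and age <= 3600:
        return "uncategorized domain contact within 1 hour of exposure"
    return None


# Usage sketch: tag at exposure time, then feed subsequent web activity through.
tag_exposed("10.0.0.5", ts=0)
print(check_behavior("10.0.0.5", ts=1800, domain_is_categorized=False,
                     domain_is_known_c2=False))
```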

One of my favorite sets of rules we created was based on over 100 malware samples we analyzed over a period of months. We took 100 unique samples, and from each of them, summarized their delivery and C2 behavioral characteristics. One characteristic that stood out to us was the fact that most samples generated egress web connections that were unusually small in size (smaller than, say, 100 bytes) and repetitive in some way. We created content we called the “Bytes Out Tracker” that selected certain HTTP activity from endpoints (based on the observations from the malware), counted the total bytes sent per session, and looked for repeats of the same behavior from the same endpoint within the past hour. The result? We found malware infections that nothing else alerted us to, and we found new C2 nodes that our intelligence communities were unaware of. Today, we call that behavioral analytics, and industry tells us we need something net new, like big data, machine learning, and AI, to perform that sort of real-time analysis. Maybe so, but we did it nearly a decade ago using the commodity SIEM that our industry loves to hate.
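
If I were to sketch the Bytes Out Tracker logic today, it might look something like this (the 100-byte size and one-hour window come from the description above; the repeat threshold is an assumption for illustration):

```python
from collections import defaultdict, deque

# Flag endpoints that repeatedly generate unusually small egress HTTP sessions.
SMALL_BYTES = 100
WINDOW = 3600
REPEAT_THRESHOLD = 5

recent_small = defaultdict(deque)  # src_ip -> timestamps of recent small sessions


def track(src_ip, bytes_out, ts):
    """Return True when a host has repeated the small-egress behavior enough times."""
    if bytes_out >= SMALL_BYTES:
        return False
    sessions = recent_small[src_ip]
    sessions.append(ts)
    while sessions and ts - sessions[0] > WINDOW:
        sessions.popleft()  # drop sessions older than the look-back window
    return len(sessions) >= REPEAT_THRESHOLD  # repeated tiny sessions -> possible C2
```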

However, as we became better at detecting and thwarting threats, adversaries got better at hiding their activity. That’s when we found the need to expand our detection rule sets (in the native inspection / logging tools) to also identify attributes within network transactions that could indicate malicious intent. That’s different than finding malicious behavior. Cause and effect were becoming more difficult to discern, so we had to look for potential. This expansion of data collection by itself creates a whole new flood of “false positives.” Where before we wanted to know the answer to questions like, “was the host served a malicious file?”, now we needed to know things like “did the endpoint download a file that contained signs of obfuscation - regardless of the verdict of the code itself?”, “what was the TLD of the domain involved in the web transaction?”, or “have any of our hosts ever connected to that domain before?” Questions that were related to looking for suspicious or unusual characteristics vs. overtly malicious functions and behaviors. Rules that indicated sessions contained obfuscation became of high interest to some Analysts, and were high fidelity in some cases. Rules that simply identified file types involved in egress uploads also became very interesting and relevant artifacts. So, we added more sensors and more data to the mix. Suddenly, simply knowing things like the application that generated a network connection became a critical piece of evidence that we analyzed in real-time. In a traditional SOC, if you fed that stream of data to an Analyst, they would go crazy. We moved well beyond that reality, and stopped thinking of individual logs or even sources of logs as actionable data; it was all simply data that we needed to consider.
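
As a hedged sketch of that shift, think of annotating every web transaction with suspicion attributes instead of verdicts, and letting correlation decide later (every field name, threshold, and list below is hypothetical):

```python
RARE_TLDS = (".top", ".xyz", ".click")  # illustrative, not a vetted list


def enrich(transaction, seen_domains, obfuscation_score):
    """Attach suspicion attributes to a web transaction record."""
    domain = transaction["domain"]
    return {
        **transaction,
        "contains_obfuscation": obfuscation_score > 0.5,  # regardless of verdict
        "rare_tld": domain.endswith(RARE_TLDS),
        "first_contact": domain not in seen_domains,      # never connected before
        "unusual_client": transaction.get("process") not in ("chrome.exe", "firefox.exe"),
    }
```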

Over time, we got good at this practice as well, and could spot anomalies within our data, once again by using commodity SIEM (ArcSight), which critics in industry like to tell you SIEM can’t do. It can, and we did. We also started moving toward industry products, like Fortinet, McAfee NTR, NetWitness, and Palo Alto Networks, that could provide us with as much information about network transactions as possible - we just wanted to know what was happening in general so that we could discern intent.

However, that approach also has limitations, is extremely labor intensive, and is extremely expensive. It has a significant dependency upon the SIEM content management team (rule writers) being able to define and code for any possible combination of factors that would represent a true threat sequence. It also depended upon an Analyst team that was comfortable with experimentation, and had stopped looking at logs or alerts by themselves as key indicators. It also meant we needed to store and analyze all that data. Now, having said that, an argument could be made that if your team can’t pre-define an attack sequence, they won’t know how to identify it or manually discover it should it happen in the absence of correlation logic. We can only do what we know how to do. The SIEM correlation approach will substantially reduce false positives AND it will substantially improve your ability to spot things you want to know about...but it doesn’t necessarily help you find the things you don’t know to look for. That’s where intelligence functions began to really grow, because they became essential elements to operational success and the accuracy of threat detection. However, we sort of lost the plot on that.

This is where industry went off the deep end, chasing the rabbit down the endless data analysis path; more data, bigger storage, faster systems, more analytics, artificial intelligence, machine learning, different visualization etc. etc. etc. That’s the wrong direction. This is also where boutique services like dark web monitoring, Internet crawling, and related capabilities were born; trying to find the net impact somewhere out there. If you see data for sale on an underground forum, then you know you had a real event, and can move to mitigate issues that will have the most measurable impact to the business. But again, that’s running down the wrong road. It sounds cool, and it is the home of technology innovation, but it doesn’t solve the problem. It’s an insatiable appetite.

Paradigm Shift

Now that I’ve made the case for SIEMs (not meaning to), I want us to abandon that thought process. My point in telling that story is that I hope you can see that it is a never ending battle of trying to make data as valuable as possible in an operational context that is designed to rapidly detect threats in order to rapidly mitigate them and manage impact (possibly prevent damage). Trying to elevate the threats you know about, and to find what you don’t know, isn’t an effective means to manage risk. That’s the wrong way to think about threats these days (and it was wrong back then too). Stop thinking about finding the badness that’s lurking invisible in your network or just beyond your field of view. In the words of the rock band Switchfoot, “there’s a new way to be human, something we’ve never been.” You see, we didn’t solve the core issues with our data analytics approach; we’ve simply been seeking the best bandaids to slow the bleeding. You won’t win the detection game. You can’t. It’s reactive in nature, and most of the threats organizations will face are automated anyway. By the time your multi-million dollar infrastructure finds the highest fidelity thing for your team to respond to and manage, it’s likely that the damage has been done.

You see, it was at the top of my “detection” and analytics game that I was exposed to a different class of Analyst who took a completely different approach. I still remember the event clearly. The conversation wasn’t “how do you best detect x,” or “what products give you the most value,” but rather, “did you know adversary y has updated tool z with x capabilities to go after a, b, and c data sets?” I recall a presentation where an Analyst (by title) presented a timeline of adversary activity that revealed a clear pattern. They weren’t talking about the lower levels of the kill chain, except in passing. Instead they were talking about predicting when and how their adversary would engage them, and with what tools and processes, exploiting what vulnerabilities in which applications. They focused not on finding suspicious behaviors and effects within their environments, but instead on defeating the adversary’s ability to operate against their environment to begin with. They studied their adversary, learning their tools, their behaviors, their objectives, their triggers, their operational rhythms, etc. etc. They were students, experts of their adversary. As I watched them in practice, I realized how simple this whole thing really is. You see, adversaries operate in the same space we do, using the same technology, with the same constraints, and across the same infrastructure. They have finite capabilities and opportunities. The more you understand how they operate and can defeat their capabilities and remove opportunities, the far less you have to analyze on the back-end to detect and hopefully mitigate their success. Why not try your best to stop or disrupt them as they are acting? Why wait for them to act, hoping that maybe you can catch them based on the trail of blood they left behind? Yeah, I’m talking about prevention, but more than that, I’m talking about actually managing your threat posture relative to your actual adversary. The beauty is, you probably already have the means to do that, and it will cost you far less than you think.

Let’s be honest for a second now. That means you have to get very real, and very serious, about vulnerability management in your environment. Yep, that’s boring, and many thrill-seeking, people-pleasing, and attention-grabbing CISOs will have an allergic reaction to that. That means you have to get very intrusive, restrictive, and disruptive with regard to what employees can do with the assets and access they have been given. That means you have to get into the business context of your company, the software development and product release lifecycle, the services establishment and publishing process, IT infrastructure and asset management, risk management, etc. You have to take charge of your enterprise security, and you have to do it from the basis of knowing your adversary, knowing yourself, and using that to influence everything that an adversary could touch in your company.

So what does that mean for the SOC Analyst battling the volume of false positives coming from their IDS sensor? It means they need to stop looking at the IDS, and start running reports to discern their risk status and writing tickets that will help improve the security posture of the enterprise. What assets are externally facing that have exploitable vulnerabilities without mitigating controls in place? Those are your new high fidelity alerts that require action. What endpoints are running old software that needs to be upgraded? Ticket for remediation. What endpoints are outside of configuration compliance? Break-fix. What login portals are missing MFA? Break-fix. What user permissions are allowed on your endpoints? What users have access to applications they don't need? Schedule enterprise wide change control to remove unnecessary access. What network applications are your employees using that create risk? Time to update policy and submit firewall rule change requests to block access. Who is sending data out of the company and where? Time to remind employees of corporate policy and data privacy, and to block access to anything but authorized data and file storage locations. What software is installed on endpoints that could be used by an adversary? Time to remove it. Which firewalls have policies that are not compliant with security standards (i.e. missing detection rules, or allowing more than they should)? Time for a configuration audit with remediation requirements. The list goes on and on, but these are the issues the SOC must begin to tackle. The SOC needs to be all about controlling risk in your environment; not passively managing the impact that results from the lack of proper risk management. No need for fancy analytics, no need for machine learning, no need for behavioral analysis of every single employee. No, the real need is to disarm your adversary by closing and locking all your windows and doors, and making yourself as difficult a fortress as possible to break into. Remember, they can't exploit a vulnerability on an application they cannot reach, or on an application that is not vulnerable.
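
To make that concrete, here's a minimal sketch of what those new "alerts" might look like as posture queries against an asset inventory (the schema and field names are assumptions for illustration; the output feeds tickets, not an Analyst console):

```python
# Hypothetical asset inventory records for illustration only.
assets = [
    {"name": "web01", "external": True, "exploitable_vulns": ["example-critical-CVE"],
     "mitigations": [], "mfa_on_logins": False},
    {"name": "hr-app", "external": False, "exploitable_vulns": [],
     "mitigations": [], "mfa_on_logins": True},
]


def posture_findings(assets):
    """Yield the conditions that now warrant action, instead of IDS alerts."""
    for a in assets:
        if a["external"] and a["exploitable_vulns"] and not a["mitigations"]:
            yield (a["name"], "exposed exploitable vulnerability", a["exploitable_vulns"])
        if a["external"] and not a["mfa_on_logins"]:
            yield (a["name"], "externally facing login without MFA", None)


for finding in posture_findings(assets):
    print("Remediation ticket:", finding)
```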

Once you get a handle on that, then it’s time to ensure you have ongoing visibility into the status of those controls, to ensure you do not drift from your optimal posture. That means core success is no longer measured by metrics like time to respond; instead, success can be defined by a low percentage of exposed vulnerabilities, uncontrolled entry points, and unauthorized software.

You will also need to instrument capabilities to discover if adversaries are interacting with your environment, based on how you expect them to operate, but this will become a smaller and smaller scope as you mitigate their ability to operate against you in the first place. Why spend millions of dollars on infrastructure, facilities, and Analysts, looking for signs of lateral movement, when you can simply patch the vulnerabilities so the adversary cannot exploit them, control the delivery vectors so the adversary cannot enter, and lock down your endpoints so adversaries cannot use them?

Where do you start? Threat intelligence. I’m not talking about observables or threat indicators or 3rd party feeds (although those are components of an intelligence program), I’m talking about becoming a student of your adversary and truly understanding them. You also need to truly understand your business. Then, look to opportunities to disrupt your adversary; if you can’t stop them, make it as difficult as possible for them. To do that, you are going to need to become very close with your IT department. You may even need to merge your NOC and SOC together, because they will become increasingly dependent upon each other. The tools your NOC needs to manage assets are the tools your SOC needs to measure risk. The actions your Network and IT staff take are the ones that the SOC staff need to monitor to ensure they do not add or increase risk. It's time to get the family back together and on mission - the right mission.

False positives? Meh. I don’t think of them anymore.

Thursday, March 9, 2017

But the Red Team!

It's happened to me a few times now, and if you operate or have had an audit performed by a third party Red Team, I'm sure it's happened to you. Every time I hear the phrase, "but the Red Team..." I want to...well...respond like this:


It goes something like this: during the normal course of your work in information security, you discover countless programmatic and technical deficiencies. You find gaps in patch management, failures to follow defined processes, gaps in access controls and visibility, legacy systems creating risk, assets missing security tools or configured out of compliance, poorly deployed and maintained tools, the list goes on and on. As a practitioner charged with defending this environment and closing the gaps, you have to convey to your leadership a clear definition of the problem, what it will take to solve it, and your progress in doing so. At the same time, you have to be very careful about your message, lest you be viewed as that "sky is falling" security guy who just wants to unplug the Internet. You do your best. Priorities will be selected amongst the sea of problems, some you agree with, some you don't, but you get to work and time passes. Then, incidents happen, priorities shift, staff changes, opinions change, new requests are added etc., further complicating your ability to effectively close the discovered gaps in a measurable and programmatic way. Thus progress is slowed, although you are doing a ton of valuable work - the work that is being asked of you and described as your priority. You are making progress on closing the gaps, but all the complicating factors make progress really slow.

Then it happens. In secret, your leadership team launches a Red Team exercise as part of their own due diligence to obtain a 3rd party or somewhat external perspective on things. You may or may not be aware until the final reports are in, and those results will invariably be that the Red Team was able to quickly and easily find a vulnerability, exploit it, gain a foothold, move around the environment, and capture some bright and shiny trophy all without triggering a single alert. They will present their findings to the security organization, pointing out the ease of effort and short time it took for them to capture their flags. They will map their actions to the kill chain to make it look very adversary-like in approach. Big words will be dropped to mask the simplicity of what was actually done. You'll sit there the entire presentation nodding your head in agreement as the results confirm your own observations. "Yep, knew that was an issue." Leadership however, will be shocked, furious, disappointed. They'll ask questions like "why was this so easy?" "why didn't you detect it?" "why didn't you tell me it was this bad?" "didn't you say you already fixed that?" You'll be more shocked by your leader's response than the report itself...stunned into silence, not knowing what to say.

Then, the Red Team leaves (as far as you know), or begins scheduling the next offensive.

Next comes the new set of priorities, and the new mantra on everyone's tongue and measuring stick for every new initiative will be "well the Red Team found..." or "that's not what the Red Team said," or "but that's what the Red Team did." All of a sudden, your expertise doesn't matter. Your prior observations, recommendations, and progress are tossed out the window because, "the Red Team..." The junior and inexperienced staff in the organization will look upon the Red Team with awe. They will be heralded as some uber l33t hack team that saved the company from being the victim of not knowing just how bad things were, and celebrated as the savior who finally brought clarity to what really should be done. Everyone will chase the Red Team report, scouring it to find what they must do to fix the issues found, or to seek blame for what is broken. They will praise the sophistication and professionalism displayed by the Red Team's approach and results. "Wow," they say, "we have to fix this stuff right now!"

Worse still, it's highly likely that the recommendations from the report and the priorities that follow will be bandaids that cover up, but do not treat the core issues. Objectives will be set to define new detection capabilities so that the specific things the Red Team did this time, can be detected next time. Arbitrary objectives like "improve patch management" for assets within the scope of the findings will become new short-term projects. Strategy will be washed away by the surge of tactical things that must be done. A new mindset will permeate the organization that says "if the Red Team did it, then that's how the bad guys will do it." A timeline will be set, and a new rhythm of meetings and updates will be enabled. It's what happens.

Dose of reality: if the Red Team found it that quickly and was able to exploit it that easily to that end...then you have substantial gaps in your fundamentals that extend well beyond a 3-6 month timeline for resolution. You have a 2+ year journey ahead. If it was that fast and easy, then they didn't take a sophisticated and advanced approach - they simply followed what was readily available using a standard approach and standard set of tools. That means, as you already knew, things are fundamentally broken across the board. You, or those among your organization, probably already know what is broken, and how to fix it. Worse, treating the symptoms of the underlying problems means the underlying problems will persist...to be found once again by another Red Team, or an actual adversary.

Here's my response to the Red Team:

1. No shit.
2. Thank you.
3. Can I get some sympathy? Apathy?
4. Help me...don't just point out what isn't being done. Do something truly constructive.
5. Next time, seek to understand the actual underlying issues creating the gaps, and provide specific advice to address them.

Here's my guidance to the Blue Team:

1. You aren't alone. Drink a pint or two to settle yourself and let it go
2. Try to lift everyone's perspective out of the report to help them see the real systemic issues
3. Map the report's recommendations back to existing initiatives if possible
4. Hold your leadership accountable
5. Bring your own experts if your guidance conflicts with the Red Team

Ok, all kidding aside, Red Team exercises can be really valuable. They can indeed find gaps in your defenses that you thought you had closed. They can bring awareness and perspective to senior leadership if you aren't getting traction or support regarding your security program. They can help you discover things beyond your ability to do so.

Here's my recommendation regarding Red Team exercises.

First of all, wait. If you are a new leader of an organization, and everyone around you is singing the tune that "everything is awesome," then yeah, consider an external assessment by a 3rd party Red Team as a tool to validate and educate. Perhaps your team is simply unaware of what they are supposed to be doing, or lacks the visibility to discover problems that exist. If, however, your team is telling you things are broken, that the program is immature, and if you have a long list of things to fix and your team is making progress, then please, for the sake of your team and your organization, wait. The Red Team exercise will fundamentally provide no value. It will simply add more things to do among your sea of existing initiatives. Your team knows stuff is broken. Wait until their major initiatives are completed, until your team has done the work they think is important and is comfortable with the results. Wait until you and your team have a level of confidence that you have done your best, have filled the gaps, and are at a state of relative comfort in the maturity of your security program. Wait until you have your fundamentals in place, you are measuring their effectiveness, and you are ensuring processes are executed consistently. Then, by all means, test. But, if you test too early, the results will discourage your staff and can send you down the wrong path. Red Teams have their place and value; it's key that you choose the right time and leverage them in the right way.

I've seen it. Too many times.

Friday, March 3, 2017

It's Time to Compete in Information Security

One of the recurring messages I hear from family and friends when I talk to them about information security is, "this is a really big problem, so how do we help people understand, and how do we fix it?" In considering this question, I've also been considering how the general population learns about anything new, and how those new things become trends which later become the standards by which we live. You see, once the consuming public understands something, and demands it, they will tolerate nothing less. I think the answer, then, is in product advertising. I think the answer is that it's time providers made information security a competitive topic and advertised it as a competitive advantage. I think we need to drive consumer understanding by putting the topic directly in front of them in as broad a means as possible. Everyone in our nation consumes, so why not engage everyone in their common practice and force the question upon them: "would you choose the products from provider A or provider B if you knew one had stronger information security practices than the other? Here's what's at risk if you don't choose wisely."

If you are a consumer who assumes information security is baked into every product and service you purchase, you aren't alone in that assumption, and unfortunately you are gravely wrong. In general, product and service providers, heck, nearly every organization out there except the top few (Fortune x lists), consider information security a nuisance; something they must fund, like car insurance, in order to enter the market and provide goods and services. In fact, I've heard the term "insurance" used countless times by organizations as they describe how they view Information Security. The security program is there just in case. Therefore, InfoSec is generally funded or supported in a minimal way to satisfy regulatory controls or external inquiries, but falls far short of actually addressing security challenges. That places you, the consumer, all of us, at considerable risk. You see, when you provide someone your information, whether that be sensitive credit card numbers, or pictures you post online, or information about yourself, you are placing your personal security, health, and wealth in that third party's hands. Criminals are out there, constantly trying to access that information so they can sell it for a profit. Identity theft, stolen credit cards, product fraud and other consumer impacts are simply the tip of the iceberg. Worse, some nations are out there stealing that information to drive strategic advantage over other nations in the international trade debate. As we become increasingly dependent on connected systems and data via the Internet, we are placing our way of life and oftentimes our own safety and security into the hands of the providers who connect us with those services. Yet for the most part, those providers view Information Security and Systems Security as a cost of business that must be tightly controlled, else it negatively cuts into profit.

There is far more at stake here than you realize. What if your smart phone stops working? The phone works fine, but the interconnectedness of all your apps suddenly stops. Think about that for a minute. Think of all the things you do on a daily basis via your smart phone, tablet, or computer. Think about all the things in your home, work, or school locations that use the Internet or networking in some way. Think about email, what if it's gone? Not just consumer email, but enterprise email as well. What about traffic lights? Grocery store registers? Meters and pumps that control water purification and distribution? What if your bank suddenly freezes all your financial assets because someone impersonating you just transferred a bunch of money to a terrorist organization? What if someone posing as you purchased illegal goods or content via the Internet, and the FBI happened to be monitoring? What if your company suddenly finds themselves up against new competition who keeps delivering exactly the same products at much lower costs to your intended customers? What if your personal emails are suddenly leaked onto the public Internet? What about voice conversations? What if you are battling a medical or personal issue that you haven't informed anyone of, and your records, your medication purchase history, suddenly get posted to social media? What if someone else is watching your home security camera system or your baby monitor? What if that message you just texted to your friend about how much you hate your boss suddenly gets forwarded to your boss? What if media outlets stop broadcasting via cable, the radio, or Internet? What if you are in the air and your aircraft's flight control system can no longer connect to GPS information?

We are far more dependent upon technology and interconnectedness than we recognize...and most of the organizations providing the services we are dependent upon view Information Security as a nuisance.

In our world of supply and demand, what consumers demand, providers supply. Providers won't provide what people don't want or aren't willing to pay for. When one provider steps forward and boldly advertises something that catches the attention of the consuming public, it ignites a fire that spreads throughout the industry, sparking competition, and replacing the old with the new. Take the iPhone for example. Say what you want to about Apple, but they created a demand that became a new standard of living in the US and globally. However, as I'm sure you can attest, often times as a consumer, you may not be willing to purchase something until your understanding of it changes. Just because something is new and advertised, doesn't mean it will stick. Sometimes, when understanding takes root, it creates a paradigm shift, and everything changes. I recall back in the early 2000s, ordering books online from Amazon.com, back when that was basically all they sold. I recall friends and family members making fun of me for shopping online rather than in traditional stores. I dared them to try it for themselves. The rest is history. Now Amazon is part of the life of almost every American, and we are no longer satisfied with 10 day shipping times. You see, what you know influences what you do. In fact, I believe that what you know or understand is reflected in what you will do. It's a cause and effect relationship. What seemed totally unreasonable or unnecessary to you before you understood something becomes a requirement, a new expectation, or a new assumption moving forward once you do understand it. Now, thanks to Amazon Prime, my kids are challenged with learning to wait more than 2 days to receive anything new they ordered or was ordered for them. It's a new standard of living, and a new set of expectations. They, and I, won't tolerate 5-10 business day shipping any longer. I believe that if we can establish common understanding of the security topic and use that understanding to drive consumer demand, then we can really address the problem, because consumers will demand it. The first challenge though, is understanding. You may think you understand the risks because you hear about security in the news, or read about the latest big data breach online. However, I challenge you on that. I think you are aware of some of the impact...but you lack understanding.

Consumers have been inundated with a constant stream of bad "cybersecurity" news from the media over the past decade. The escalating trend of breach after breach, growing larger and larger, has effectively elevated the cybersecurity issue to a mainstream discussion within our nation. We are generally aware, however as I recently wrote about in another post, I believe one of the largest challenges facing us in the information security (InfoSec) discussion today is that we have a broad and increasing sense of awareness on this topic, but a general lack of understanding. What I mean by that is people are generally aware that data and information systems protection is a problem, but they don't understand the depth of the problem, nor what it takes to fully address it. Worse, they assume that what they are aware of is all there is. When you don't fully understand something, you will naturally make a lot of assumptions about it. The media and entertainment industries haven't helped. In fact, I believe they have hindered this understanding. Based on my conversations with people in life, I believe this lack of understanding has led to three dangerous assumptions among consumers: first, that those providing the products and services they depend upon are naturally doing whatever it takes to address the security problems we face; second, that there is simply nothing that can be done about the information security problem and we should just accept things as they are; and third, that the bad cybersecurity news is just more background noise from the news media and isn't really a major problem - it's just hype. Regardless of which of these assumptions consumers hold to, this "awareness" causes them to assume that their providers understand the problem, would naturally do whatever it takes to address it, and that whatever they provide is sufficient.

However, this lack of understanding also permeates the provider side (product and service organizations). Organizations and corporations recognize that they must do something with regard to protecting sensitive data, including the data their customers provide them. These organizations are driven by producing something that others will consume. That means they look to their consumers to define the requirements for the products or services they will provide. That means guardians of our sensitive data and the systems we're dependent upon for everyday life, are looking to the consumer to establish the requirements for those products and services. When consumers are ignorant, they won't set the right expectations of their providers. When consumers are ignorant, and have a basic level of awareness about the security topic, they will be satisfied with their providers implementing minimal security capabilities that satisfy this basic consumer awareness. This minimally compliant approach falls short of actually fixing the problem.

I'll give you an example. I recently worked for a software company that developed software for the general public. They were what those in the industry call a "business to consumer," or B to C, company. Due to increasing security issues their consumers were experiencing (they became aware of the real problem), they were directly faced with the question, "what should we do, and how far should we go?" They decided to put the question to their customers and developed various examples and simulations that demonstrated various depths of security features that their customers could benefit from. They found that among the sample customers they engaged, the majority actually provided negative feedback when they interacted with the models that had more security features built in. This translated in business terms to a potential loss of customers if they moved to an approach that adopted more security into the product. However, those customers who had been impacted by the low state of security in the product were willing to tolerate a different, more secure experience, because they now understood the problem. In fact, I recall hearing of customers being surprised by the lack of security in the product, asking, "why didn't you do x already?"

I believe it's time providers started generating demand for improved information security by directly advertising and educating their customers on the topic. I think it's time information security became a competitive advantage. I think it's time organizations stepped forward and fully exposed the security problem, by demonstrating how they have solved it and their competitors haven't.


The SOC - Why We Get It Wrong

The SOC topic is often controversial, with some championing SOCs as the ONLY way to go, while others criticize SOCs as purely show pieces with no real value relative to the mission they purport to execute. I've lived through that controversy all of my professional career. I worked in a SOC for almost 14 years (one of those world-class ones that other organizations try to replicate), and followed that with 2 separate organizations pursuing internal SOC builds for different reasons. During my decade plus in a SOC, I was asked to help "Next-Gen" the SOC a few times; to make significant steps forward in maturity and function. I was also asked to replicate it, both for ourselves and for our customers. I was the guy you talked to when you came to see our SOC or when you engaged our services to either build one yourself, or have us build it for you. I was the guy who you met at our vendor booth at various security conferences, trying to convince you of the values and merits that a SOC (especially an outsourced MSSP) has to offer. I've lived it. I found value in it. I think I have an "expert" opinion on it.

So, when I see articles like the one posted recently on Dark Reading that challenged the value of SOCs by exposing that a mere 15% of organizations with SOCs would call them mature, I'm naturally tempted to opine. Even better, when I see people comment on that article, or about it on LinkedIn, using it to justify why SOCs are a waste of time and money, I'm hooked, and I have to say something.

This may surprise you, but I do not advocate that a SOC is for everyone. In fact, I believe that very few organizations in the world would actually truly benefit from what a SOC has to offer; namely a continuous perspective of that organization's threat status, and an effective way to rapidly mobilize and coordinate response actions should they be warranted. Should every organization with a security team build a SOC? In my opinion, no. Should every large security team organize themselves around a SOC concept? Again, in my opinion, no. But even those questions miss the point. People say a SOC is a waste of money, and they point to failed SOCs as examples, citing as proof the orgs who tried to build a SOC and ended up no more mature than organizations without one. This represents a failure in implementation, and a misunderstanding of what the SOC is.

The part that is missing from the debate is that many look to the form of the SOC to define the security organization's function. It's not supposed to be that way. A security organization's function may be best organized into the form we call a SOC, but starting with the form first fails to recognize the fundamentals necessary to leverage that form.

It's an age old debate: if I want to become a marathon runner, do I start by buying the gear used by the top athletes in the Boston Marathon? No, I use what I have and make incremental progress, maturing my skills until my function necessitates a different form. Dressing like someone doesn't help me fit the part. It might help facilitate a mindset shift, and it might help me with motivation. But fundamentally, dressing like a marathon runner does not equip me to run a marathon. Running does. Training does. Diet does. Passion does. Necessity does.

A SOC is no different. It is one expression of how certain organizations have decided to organize and facilitate security operations work to best align their resources with their specific needs and goals. It is the expression of a type of security program, not the goal of all security organizations. This is fundamentally what so many people get wrong in the debate. You don't build a SOC to make your security program mature; the natural maturity of your security program may lead you to building a SOC. Every security organization should strive for operational maturity, and should perform incremental steps at mastering what they do and need to do. The things that are fundamental to a mature security organization will also be fundamental to a SOC, but the two are not synonymous. In fact, some of the key attributes of a "mature SOC" may have no relevance to your actual organization's needs at all.

So to sit back and declare SOCs are a waste of time because so many organizations with them have not found maturity by embracing them...completely misses the point. The point is, those organizations were not maturing to begin with. They took a Field of Dreams approach; building it, and hoping the maturity would come. It doesn't work that way.

Is a SOC right for you? That's a difficult question to answer, but I would start by exploring the actual tangible benefits that form would provide for your function, and ask yourself if that form would better enable those functions or not. What are the core challenges you face as a security team? Would organizing your efforts around a SOC solve those challenges? Are the benefits you are chasing solutions to your actual problems, or does the SOC represent a solution to someone else's problems?

My bet is, you don't need a SOC. Fundamentals exist with or without a SOC. Get those right first.

Thursday, October 27, 2016

How CISOs Are Really Measured

Modern CISOs have one of the toughest, most stressful jobs in the world. There are far more risks to businesses today than there were ten years ago, and many of these new and evolving risks come from the cyber world. Business risks used to be largely limited to competitors taking over portions of the market, failure to deliver on expectations of customers, or rising operating costs; things a business can control. Today however, there are adversaries who are actively trying to disrupt and break businesses for their personal gain. As we've seen in countless examples, a breach or a successful disruptive attack by a malicious actor or group can cause financial damages to the impacted organization in the range of millions to tens of millions of dollars. There are also some nation states who actively infiltrate organizations, steal intellectual property, and disseminate it to growing businesses within their nation, creating international competition or potentially locking international businesses out of global regions. Additionally, as news of cybersecurity issues has now become mainstream, general awareness among both consumers and providers has grown. Assumptions of data security are being replaced with fears of personal or corporate damages followed by regulatory controls and mandates, which means effective cybersecurity practices among providers have become not only a competitive advantage but also a requirement to do business. It's on everyone's mind, and must be addressed.

A single successful cyber incident can put a corporation out of business, whether that be through loss in customers caused by a loss of their trust and willingness to do continued business, through the inability to operate due to a disruptive attack, through being denied access to industry as a result of non-compliance with regulatory standards, or through financial damages sustained by an organization through the remediation and post-incident activity. This is the weight on the CISO's shoulders; the viability and longevity of the business. Product teams have to produce awesome products. Marketing teams have to reach customers effectively. Human Resource teams have to ensure the right talent is attracted, hired, performs, and is retained. CISOs have to protect the business and enable it to function.

In addition to the pressure of potential damages and negative business impact, the CISO also has to manage the fact that the cybersecurity industry is constantly changing. That means they have to be constantly learning, and refreshing technology, process, and people. Adversaries are constantly refining their tradecraft to find new ways to break into maturing defenses, and entry into the criminal underground is becoming easier and easier. The rise of successful threat activity has attracted more and more criminals and has resulted in the monetization of the development, distribution, and use of tools and processes used to perpetrate cyber intrusions. There are now criminals who no longer hack, but instead make their living developing and renting access to tools that others can use to hack. The better they can make their tools, the more customers they will have. In addition, regulatory controls and customer expectations continue to change, forcing CISOs to continually develop and implement new controls to satisfy these expectations, so that the company can simply do business. The entire CISO world is in a constant state of change, and the information security program must keep up. This is different from almost every other industry, where it's common for problems, materials, costs, processes etc. to remain static for decades at a time. High-tech and health fields are unique in this way.

So not only does the CISO have to keep data safe to protect the business, they also have to actively thwart adversaries that are continuously growing in numbers and sophistication, and they have to continually adapt to the shifting demands of customers and industry. They have to execute well, learn how to execute differently (just after they finished), and manage the transformation from what was effective yesterday, to what is needed to be effective tomorrow. Ready to apply?

Given this, we might assume that the CISO's measure of success is ensuring security incidents do not happen. We might naturally think that the CISO's annual performance goal says "make sure there is no breach," and that at their annual performance review, the CEO looks at the news headlines and, if the company name wasn't listed for a security-related issue, the CISO gets their bonus. We would like to think that the Information Security organization led by a CISO is there as sort of the protector and guardian of data, preventing massive losses that could come from data breaches, brand-damaging events, or disruptions to service delivery. Well, we are, sort of. However, preventing losses isn't really what CISOs and their support organizations are measured by. Yeah, really. Why? Well, first because you can't measure that, and second because that's not a return on investment. Zero breaches simply maintains the status quo, and investors aren't interested in the status quo. A CISO can't prove success simply because bad things didn't happen, nor is it sufficient for a CISO to claim success based on the number of attacks thwarted. Those two indicators of doing security well don't translate to what CEOs and investors care about. What CISOs are measured by is how effectively they contributed to revenue and profit generation. Unfortunately, this usually means CISOs are primarily incentivized to do something other than the things security practitioners are most passionate about. At least not for the reasons they are passionate.

The hard reality is, a CISO will have to say "no" to implementing optimal data protection, if doing so negatively impacts revenue or profit in a measurable way. They have to. Their mission is data protection to drive revenue and profit generation.

If you are a security practitioner, I'm sure you can relate to a time when you defined, selected, or had the opportunity to implement some new security capability or product, only to learn that you can't enable or leverage all the cool features that you know will keep data safe. Right? The CISO was probably the one who brought you that bad news (or it trickled down via your manager). You probably assumed that they just didn't understand the problem or the tool, and that they were simply making a "bad" decision to stretch the limited budget available to them. You probably thought they were being shrewd and that the business leaders just don't understand information security. Well...maybe that is the case. More likely, however, your CISO made a calculated decision to leverage this opportunity to improve their value to the business. CISOs don't usually sit back and say "yeah, I know we could completely mitigate that risk, but I just don't want to." They think, "if I spend those resources there, on that issue, which could potentially have that minor impact if these certain things happen, then I can't use those resources over here for that other thing that could help the x product team unlock that new customer sector." That's more likely what's going through their head.

They aren't saying "no," as much as they are saying "if I do that, then I can't do this, which is more important to the business." Security practitioners like to be purists and claim it's all about the data protection mission. The reality is, it's all about the business, otherwise none of us would have jobs, and the CISO is part of the business leadership (or should be).

You see, the CISO role is revenue AND profit generating in many cases. Getting products into customers' hands generates revenue. Profit comes from the margin between revenue and cost. However, CISOs aren't just about cutting costs to maximize profit, and they aren't just a lever to compress operating costs. This is where many business leaders and investors get it wrong too. Businesses identify opportunities for growth based on complex calculations, and they define their business strategy according to where and how they believe they can win business and generate and grow revenue. As one example, I recently learned that a CEO was evaluating two different classes of potential clients: small and medium businesses, and enterprises. The potential revenue to be captured vs. the cost of winning the potential business caused the CEO to choose one class of customers, and to intentionally exclude the other from the business strategy. However, in order to reach that chosen segment that the business growth strategy depends upon, the company must show compliance with regulatory standards and customer expectations for information security. In order to secure the profit promised to investors who are backing the business, those new customers have to be engaged in the right way and within a certain operating cost or ratio of cost to revenue. That means in order for the business strategy to be effective, the CISO now has to build an information security program that is effective, is compliant, and doesn't exceed operating cost goals. If the CISO fails, revenue and/or profit goals will not be realized. In this sense, they are enabling AND protecting revenue and profit expectations.

You see, the business can't even engage the potential customers until the CISO can ensure the information security program satisfies regulatory compliance. That means revenue is unattainable without the CISO. Further, customers won't sign the deal until they have assurance that the information security program meets their expectations and is effective. That means revenue won't start without the CISO. Additionally, revenue sustainment for that customer or sector is dependent upon the continued performance of the information security program (no breaches, and adaptive to changing customer requirements). That means if the CISO fails, revenue will be lost or growth opportunities will be missed. Finally, if the information security program doesn't remain cost effective, profit goals cannot be reached. The more effective and efficient the information security program, the more revenue is possible, and the wider the profit margin can be.

That means the company's overall revenue and profit goals are dependent upon the CISO, which is a shared responsibility among all the executive staff. The idea that the information security program is simply a cost to do business that must be controlled, represents a mindset that doesn't understand how this works.

Let's say a customer comes to a business and says, "we would buy your product if it did x." Your product team would consider developing that capability a revenue-generating act, because doing so captured that business. Product and sales teams are generally viewed as revenue generating. If the customer also says, "and we won't buy your product unless you can ensure the data we give you is secure," that becomes a requirement that the information security team must deliver along with the product in order to win the deal. It's no different than a customer saying "I want feature x in your product." They want your product to include functionality and security. Perhaps the security they seek isn't within the product itself, but rather within the realm of the customer-to-provider relationship. Either way, it's a customer requirement that must be met in order to capture the revenue that comes from the deal, and it's a requirement that must be sustained to maintain the customer relationship in good standing, just as continuous service delivery is.

I can tell you dozens of stories of business to business or consumer to business relationships that were dependent upon the success and confidence of the information security program. I'll tell you right now that if customers lose trust in a product or company for security reasons, revenue and profit will fall. That means effectively establishing and actively maintaining that trust, causes revenue and profit to rise.

Some may argue that information security is just part of the company operating costs just as IT, HR, Legal, and other internal functions. I disagree. I think it was that way a decade ago, but IT, HR, Legal, and other internal functions are not direct requirements from customers; information security has become so. Customers don't often send specific requirements or validation requests to other internal function teams, and regulatory compliance mandates don't usually call out those functions.

So, when it comes to the senior leadership of a company, including the investors and board of directors, the value of the CISO really comes down to whether or not their actions enabled and contributed to revenue and profit. If not, then the program and the person suffer. If so, then the program and the person are rewarded. What specific questions can executives use to measure CISOs, and what can CISOs use to prove success?

How many customer sales did the CISO directly help capture through personal interaction?

How many customer sales were won as a result of the information security program? Conversely, how many sales or customers were lost due to problems with the information security program?

Did the information security program successfully remove barriers to enter or maintain market and customer engagement?

Did the information security program operate in a cost effective manner so as to not disrupt the expected revenue and profit goals while securing existing and new business?

Did the CISO effectively lead the business through a security issue that resolved without great loss?

If the executive leadership or board can honestly say, "CISO, because of your efforts, we were able to access sector A, and capture customer Z, and you did so while maintaining a cost effective program," then the CISO has won. If the leadership team says, "CISO, you did a good job keeping costs down which helped us meet financial goals," well that's good too, but a CISO is more valuable than that. If the executive team says, "CISO, you kept us out of the news," well...that's good, but it's also bad, because someone could easily argue that the lack of attackers or attacker interest kept the company out of the news.

The true business value of the CISO can be found in how they directly enable, capture, sustain, and protect revenue and profit. Information security is no longer just a cost to do business and if your CISO can't demonstrate otherwise, then they may not be the right one for your company.

Tuesday, October 25, 2016

Should You Build a SOC?


There is a segment of consultants and educators in the cybersecurity industry who proclaim that the litmus test for a mature information security program is the presence and maturity of a dedicated, in-house Security Operations Center. Their message is that if you have a SOC, you have arrived. You are doing it right. You are the mature security organization. That has many others wondering, "is it time for us to move in that direction and obtain that level of program maturity?"

I've been there, done that. I worked in an MSSP for over a decade whose SOC was world-class and served as the model for our customers, partners, and other interested parties. We also provided SOC build and maturity consulting services to help organizations reach what we had attained. We were an early SOC and had the luxury of maturing ahead of the industry to truly lead the way. We went through several significant periods of re-design (in form and function) as the threat landscape and technology scene changed. We also had the opportunity to replicate our work several times, which forced us to review end-to-end what it was we were doing, why, and what should change. I was one of those consultant-educator-practitioners who carried that message of maturity forward into the industry. I have also since been part of two organizations that didn't have a SOC but were considering building one. In fact, one of those two organizations claimed they did have a SOC, called themselves a SOC, and even used email aliases with the term "soc" in them, but when I arrived, it turned out they didn't have any of the fundamentals (except the email address) that represent a SOC. They didn't even have a room. The security organization recognized the value, but didn't know how to actually build or operate one. Hence my mission when I joined. The organization I'm currently at is considering this question as well. Is it the right time in our story to make the investment and step up the CMMI ladder to the next level?

If you are considering the question of building a SOC today, in 2016 or beyond, please keep reading. I may have some surprises in store for you.

The premise behind the modern Security Operations Center, or SOC, is to enable common awareness of the security state of the enterprise, with ample staff ready and trained, supported by carefully instrumented technology and defined processes, to ensure you can pounce on the right security issues with consistency and expediency. SOCs represent the embodiment of a full and mature implementation of the NIST 800-61 standard; a center of excellence purposely built to enable full incident response lifecycle management in an intentional way, tailored to the organization in which it lives. SOC staffing models seek to maintain sufficient staff to handle numerous incidents at the same time, based on the expectation that numerous incidents will happen on a daily basis. Staffing models are also designed to ensure teams can sift through the massive volume of data SOCs consume, to effectively triage and identify the issues that require action. As a security leader with a SOC, you will know that no matter the volume of security issues, nor the specific person on shift to handle them, you have a place to effectively coordinate and manage incident response in a consistent and professional manner. Your people will be aware, accessible, and equipped. You can have the cyber equivalent of NASA's mission control center. You can walk into a room and have immediate situational awareness regarding your organization's cybersecurity posture.

The Operations Center mindset isn't new or unique to cybersecurity, and in theory it does make a lot of sense to have that single point of awareness, visibility, and coordination, especially given the risks facing organizations today. Additionally, more and more organizations expect high standards from their IT teams, and what better way to ensure quality, than with highly defined processes, commonly trained staff, and reinforced physical and logical structure? It works for call centers right? That all sounds great, doesn't it? Realistically though, what does it take to build a SOC, and can you do it yourself?

The reality is, if you are an organization that has decided the fully mature SOC model is something you'd like to implement and operate, then you don't have one today. That means your existing team wasn't able to, or chose not to, operate in that highly structured manner. In my experience, that's most likely because they haven't seen it before, haven't been resourced accordingly, or don't agree it's for them (it doesn't match their own or their company's culture). Regardless of the reason, they aren't operating in that SOC paradigm, and in my experience, they won't be able to build it for you. There are two primary challenges facing them: the day-to-day work that they already have to do, and the practical knowledge they need to guide their actions from their current operating state into that final SOC model you desire. Quite simply, they won't be able to build the roadmap and won't be able to execute against it. They are busy, and won't know how to build the SOC. You will need help. You'll need a dedicated team with the experience of both building and operating within a SOC, because it's a complete mindset shift, it takes a ton of work, and unless you've lived the value, you won't appreciate it, and a lot of what it takes to build a SOC may not make sense up front. The SOC build journey is extremely expensive, and it takes a lot of time. At least one year.

Hire any of the big consulting firms with the mission to build you a mature security operations program, and they will ship you a small army of experienced consultants wrapped in a nicely structured package with a 1 year roadmap to deliver said capability. This build team will run in parallel with your current team, and may engage/partner with your existing staff depending upon the maturity of what you already have. The team of consultants will probably contain one person focused on building out your technology layer, predominantly your log and event analysis capability. Another will be focused on developing use cases for threat detection, and the processes and playbooks that define how to leverage the technology and what to do when an alert is triggered. Another consultant will be focused on the people story, finding you the right talent, building a staffing plan, training plan, retention policy etc. Finally, you'll have a PM as your go-to person who will orchestrate this madness in a very structured manner to build this new function for you. It will be great, and expensive. You see, it takes a dedicated team a long time to build a fully operationalized and mature security operations practice. I know because I've lived it.

However, is that feasible, realistic, or even relevant to the majority of InfoSec programs and their parent organizations today? Having been there and done that, I can tell you with confidence that you probably don't need that highly structured, mature SOC that sounds so appealing. Yes, the message I now share to the industry has changed, because times have changed. Let's take a look at some of the primary selling points of having that fully developed and mature Security Operations Center:

  • Centralized (and physical) orchestration of all things InfoSec, namely communication, monitoring, and incident response coordination
  • A room that facilitates common and continuous awareness of the state of security for the monitored organization, staffed with personnel who are ready and quick to respond to security issues
  • Around the clock, 24x7x365 staff performing active monitoring, ready to detect and manage any issue that should arise
  • Highly structured processes and procedures that enable consistency and efficiency in service delivery to the organization
  • Specifically configured technology that supports visibility, awareness, and execution
  • Accountability and validation that Analysts are doing what they need to
  • Rapid and personal communication among InfoSec, and namely Operations staff to facilitate detection, analysis, and response actions
  • Dedicated facilities built and secured for use by the Security team
  • Controlled access, separating sensitive information from the common employee community or from visitors

In a sentence, the SOC represents a place of focus; a place where a team can assemble with all the right equipment, to perform a function in a consistent and coordinated manner. But do you need a room for that, do you still need the same equipment, and do you need to invest in the level of effort to ensure that highly automated and repeatable experience? How many times will your staff need to coordinate together to deliver that same repeatable experience?

Having staff at the ready, armed with the tools and processes to manage that alert as soon as it pops, sounds great, right? The assumption behind the modern SOC is that you need those resources at the ready because you are under constant attack by sophisticated adversaries who can bypass your controls and will break in. You need the SOC so that you can rapidly detect these problems and act to mitigate them before they become a major issue. In fact, you'll still hear the terms "worm" and "outbreak" used in SOC circles, because that's the old-school world and problem statement SOCs were created to solve: rapidly stopping the expansion of a threat before it could reach catastrophic levels. To do that, the story says you need 24x7x365 coverage and tons of data to correlate into actionable events, plus awesome dashboards that track trends and status, plus flashing lights that "sound" the alarm when something interesting happens, followed by automated orchestration that creates tickets for Analysts, pre-populated with data elements gathered from multiple different sources to enrich the ticket with attributes that will help answer the questions on your Analysts' minds. It's all about speed; you have to out-pace the attacker. If you are really good, you might even have pre-selected and presented playbooks or IR actions ready for your Analysts to use. Then, your team springs into action, performing initial triage, coordinating next steps, performing an initial assessment, carving out action items...and away they go, racing through the incident response lifecycle while their adversary on the other end of the globe races through the kill chain to reach their objectives before they can be cut off.
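
To make that enrichment step concrete, here is a minimal, hypothetical sketch in Python of the ticket pre-population idea: a correlated alert is enriched from a couple of stubbed lookups (asset inventory, threat intel) and turned into a ticket an Analyst can pick up. Every data source, field name, and function here is an illustrative assumption, not any particular SOC product's API.

    # Hypothetical sketch: enrich a correlated alert and pre-populate a ticket.
    from dataclasses import dataclass
    from datetime import datetime, timezone

    @dataclass
    class Alert:
        rule_name: str
        src_ip: str
        dst_ip: str
        severity: str

    # Stubs standing in for an asset inventory and a threat intel feed.
    ASSET_DB = {"10.1.2.3": {"hostname": "fin-ws-042", "owner": "jdoe", "criticality": "high"}}
    INTEL_DB = {"203.0.113.50": {"listed": True, "category": "known C2"}}

    def enrich(alert):
        """Gather the context an Analyst would otherwise collect by hand."""
        return {
            "alert": alert,
            "asset": ASSET_DB.get(alert.src_ip, {"hostname": "unknown"}),
            "intel": INTEL_DB.get(alert.dst_ip, {"listed": False}),
            "enriched_at": datetime.now(timezone.utc).isoformat(),
        }

    def create_ticket(context):
        """Pre-populate a ticket so triage starts with answers, not questions."""
        alert = context["alert"]
        return {
            "title": f"[{alert.severity.upper()}] {alert.rule_name} on {context['asset'].get('hostname')}",
            "details": context,
            "status": "new",
        }

    raw = Alert("Outbound connection to known C2", "10.1.2.3", "203.0.113.50", "high")
    print(create_ticket(enrich(raw)))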

That's the way it works right? We still race attackers and worms through our networks right? This is a live game of cat and mouse isn't it? Well, let's talk about that.

I haven't seen a worm or virus outbreak in years. The closest example might be a phishing campaign or targeted attack that spreads select malware across multiple assets within an environment, but those are few and very far between. It turns out that most malware we face on a daily basis is highly automated and well known in terms of identifiable characteristics. The data points you actually need to look for to validate a malware infection, from a detection and data analysis perspective, are actually few and easy to find. Attacks are also highly automated and usually set into motion without direct supervision by the operators. That's not always the case, but I'm speaking to an 80/20 split (80% of the time vs. the 20% of exceptions to the rule). Pick your malware and delivery methodology. Pick your modern campaign. What your Ops team will encounter on a daily basis is most likely crimeware, delivered through broad and not necessarily targeted phishing campaigns where you are one of many. Your employees may receive emails with malicious attachments or URLs, or they may stumble upon a web exploit kit after having been redirected from their favorite news website that happened to be poisoned with malvertising. Phishing campaigns (by far the most common threat or attack we experience today) are a constant wave. 50-100 malicious emails a day is a likely number, but with modern technology, only the first 1-5 will actually get through. The eventual malware that drops (if the attack is successful) will automatically begin performing its defined functions, which often include local system profiling, immediate data theft, and check-ins with its command and control server for further instructions. Many of these malware infections, after immediately posting the data they were designed to steal, then sit, waiting for future instructions from their master, which may come days, weeks, or months later.

The most popular malware experience of 2016, crypto/ransomware, performs its damage immediately upon infection. There is no race. Once installed, it's game over, and the last time I checked, cryptoware doesn't have worm-like properties. The race is actually one of prevention in the first place, but I'm getting ahead of myself. Let's first look at the infection race.

What is the average time from infection to action by an adversary? Before you can decide on the resources and overall success criteria for your SOC, you need to understand the adversary you are up against and how they operate. If you plan to equip yourself to win the detection and remediation race, you'd better understand how fast you need to be able to go. In a recent investigation I completed, an adversary, having successfully brute-forced their way into a server where they created a local account with admin privileges, left the compromised server untouched for weeks. They attacked, established their foothold, then left. When they came back, they simply did so to validate their access and to install a few preferred user tools (including the Firefox web browser). They went silent again for a period of time, and came back about a month later to install more tools. That race was at minimum weeks, if not months, long, and the actual impact was nothing more than an annoyance.

This follows another incident I investigated about 2 years ago where an adversary compromised a publicly facing web server through a true 0-day, dropped a local web shell, used it to enumerate and understand the files on the target system, then left it alone for 6 months until we found it. Sure, they exploited a server and gained root access, but they were apparently staging themselves for a later action. Again, I'm speaking to the 80/20 rule here.

In another recent example that shows the variations on the attacker race, I helped investigate an incident where an AWS console admin API key was accidentally published to a public GitHub repository. It was there for a while before an adversary noticed it. When they acted on that information, they did so very quickly. They used it to spin up a bitcoin mining operation on unauthorized AWS resources. The damage to the business? None, really.

Your ready-to-go SOC team will likely be able to detect the stages of infection, and will likely be instrumented (due to the general noise and low likelihood of requiring action) to monitor for indicators of compromise. At best, they will be able to respond, validate, and perform some level of mitigation within hours. That means they may be able to prevent that infected system from being used for further outcomes down the road, but it's unlikely that they will be able to prevent the initial data theft (browser information or locally stored data), nor will they be able to prevent cryptomalware from taking effect. In the true 0-day example, where the adversary was active from the moment of installation, by the time they completed their enumeration of local data and potential extraction, a well-defined SOC team would still be in the initial triage phase. You see, the automated attack will always beat the reactive SOC, and the manual attack (on average) likely doesn't require mitigation within a few hours...it can probably wait. However, if the malware or methods your adversary is using are that easy to identify, and if the infection context is so easy to validate that you believe you have a fighting chance, then why didn't you automate prevention in the first place?

Ah, there's the root question, and that question flies in the face of the traditional SOC argument. Does automated threat prevention work, and if so, what does that mean for the modern SOC built to chase IOCs and handle multiple intrusions at the same time? Conventional wisdom in the industry says no, prevention doesn't work. We tried back in 2000. Conventional wisdom says you will be infiltrated and you can't prevent it from happening. That's true of a small number of potential scenarios: true 0-days where you were targeted, or truly crafted infiltrations by a nation state that developed tactics unique to you. Or true for the first wave of phishing emails that come from a new campaign. However, for the vast majority of issues your operations team and organization are likely to face, I say yes, they can be prevented. Relatively easily. At least that has been my recent experience.

This especially proved itself over the past year at my present place of employment, where we embrace the prevention story 100%, where sandboxing is king, and where we actively build what we know back into our products to enable prevention next time, not just detection. In our world, an incident that led to manual remediation efforts becomes not the basis for a new SIEM detection use case, but rather a candidate for research and prevention in our core inspection technology. If we missed it, then we treat that as a bug in our prevention stack, and work to fix it. It's built into everything we do: prevention works, prevention first. If you can define it, you can prevent it. It's actually quite simple when you leverage the right tools.

When I first joined my present company, I didn't believe the prevention story. I thought it was interesting and had potential, but I didn't believe it. I pulled in my extensive list of IOCs that had been experienced by myself or others, pulled all our relevant logs into one massive searchable repository, wrote out my top 50 threat scenarios including the data attributes and analysis logic that supported them, and went to work hunting for all the infections I thought I should find. My prior years of experience led me to assume that we should expect to handle about 5-10 endpoint infections per day. We were seeing about 1 per week. As I started drilling into my logs, I quickly identified what I expected to find: emails containing suspicious attachments and URLs, web browsing sessions that looked strange, funky DNS requests from internal hosts, endpoints making connections to known malicious websites, active content and files being downloaded by employees that were marked as suspicious, outbound connections to IPs on known C2 lists, etc. All of these were indicators of attack, some of potential compromise. Then, as I continued my investigation, looking for attributes from the endpoints that would validate impact, well, I found...nothing. This continued for weeks, and aside from the 1-2 positive threats found per week (which all turned out to be greyware), my assumptions about prevention not working were shattered. That darn sandbox. Worse, I was also hard at work creating processes and playbooks, assuming we needed highly defined structure to ensure repetition and efficiency in IR. I was gearing up for a dozen infections a day. I found, on average, 1-2 per week.
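
As a rough illustration of that hunting loop, here is a hypothetical Python sketch of the pattern: sweep log records against an IOC list, then try to validate actual impact on the endpoint. The sample data, field names, and the validate_on_endpoint() stub are assumptions for illustration only, not my actual tooling.

    # Hypothetical sketch: match log records against an IOC list, then validate impact.
    IOC_DOMAINS = {"evil-update.example", "c2.badhost.example"}

    dns_logs = [
        {"host": "fin-ws-042", "query": "www.news-site.example"},
        {"host": "eng-ws-117", "query": "c2.badhost.example"},
    ]

    def validate_on_endpoint(host):
        """Stub: in practice this would query endpoint telemetry for a matching
        process, persistence artifact, or file write."""
        return False  # what I kept finding: indicators, but no actual impact

    hits = [rec for rec in dns_logs if rec["query"] in IOC_DOMAINS]
    for rec in hits:
        confirmed = validate_on_endpoint(rec["host"])
        print(f"{rec['host']} -> {rec['query']}: "
              f"{'confirmed infection' if confirmed else 'indicator only, no impact found'}")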

You see, we do two fundamental things right. First, every file retrieved or delivered from the Internet, we analyze offline via static and dynamic sandbox technology that also compares characteristics (file identifiers as well as behaviors) with other known threats. When the verdict comes back that the file is malicious, our technology prevents the download or delivery, or prevents the local execution on the endpoint. Second, we don't just focus on the installation phase of the kill chain; we take it up a notch and also identify and prevent signs of exposure to exploit code or sites that behave like web exploit kits. For email, we proactively analyze links delivered to employees and update our prevention tools behind the scenes based on the conclusions of that analysis, so by the time the employee could click, we had dynamically updated our block list and prevented access. We inspect everything that could lead to an infection.
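
A hedged sketch of that verdict-driven flow, assuming a generic sandbox service (the sandbox_verdict() function below is a stand-in, not any vendor's actual API): every inbound file or URL gets a verdict, and the verdict drives an automatic block/allow decision plus a dynamic block-list update.

    # Hypothetical sketch: sandbox verdict drives block/allow and block-list updates.
    from enum import Enum

    class Verdict(Enum):
        BENIGN = "benign"
        MALICIOUS = "malicious"
        UNKNOWN = "unknown"

    BLOCK_LIST = set()

    def sandbox_verdict(artifact):
        """Stand-in for static + dynamic sandbox analysis of a file or URL."""
        return Verdict.MALICIOUS if "dropper" in artifact else Verdict.BENIGN

    def handle_artifact(artifact):
        verdict = sandbox_verdict(artifact)
        if verdict is Verdict.MALICIOUS:
            BLOCK_LIST.add(artifact)       # future requests are blocked outright
            return f"BLOCK {artifact}"     # delivery/execution prevented now
        return f"ALLOW {artifact}"

    for item in ("http://cdn.example/installer.exe", "http://cdn.example/dropper.exe"):
        print(handle_artifact(item))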

It actually works. Better yet, we log a lot of detail along the way, providing Analysts visibility into the sequence of events we detect leading up to the prevention decisions. However, as I was coming to the realization that prevention actually does work, investigating these various triggers and alerts kept me in a logic loop. Here are a few examples.

Let's say an employee browsed to a suspicious website that was allowed by policy for some reason. While there, they were redirected to a website that contained a malicious ad that redirected the browser to a web exploit kit. Great, that's a candidate for a further look from an event analysis perspective. We have plenty of solutions from industry that can identify web exploit kit behavior as it happens. In fact, there are open source solutions and rules that enable you to do that. But better yet, we just sandbox everything anyway. The obvious next question in the web exploit kit investigation process is, "was the endpoint then served a file?" If yes, then we would need to analyze it to determine if it's malicious. If not, then we would need to monitor the endpoint for new activity out of the normal. What do you as an Analyst do to validate that? You try to re-create the exploit experience or try to grab a copy of any files that were transferred to the exposed endpoint, then you'll probably send them through a sandbox or up to VirusTotal etc. for analysis to find out whether they're malicious or not. Here's the deal: our network sandbox technology automatically grabs a copy of every downloaded or served file and runs it for further analysis, automating the validation and prevention process. The very work I as an Analyst was preparing to do, our technology already did. At the same time, our endpoint protection technology monitors every process that attempts to execute locally, also sending it up to the sandbox for further analysis. So...why not just focus my efforts on monitoring the results of the sandbox analysis, since that's what I need in the end - some validation that a malicious process was delivered or is running? Well, because if our sandbox could detect it, it would have, and our control technology would have prevented it based on that sandbox verdict. See my logic loop?

In another example, let's say an employee receives an email that has a malicious URL, and my technology detects that, but still delivers the email because it took some time to analyze the site or because policy allowed it for some other reason. The employee might click, right? Well, maybe, but once the verdict on the URL is decided, the technology automatically implements a block or permit decision. What if the URL uses some sort of cloaking technology to evade sandboxing and the verdict comes back as "benign" or "unknown," so we don't prevent the employee's click? Well, fine; if the website serves a malicious file to the browser as a result of the click, my network stack will grab that file and send it off to the sandbox for analysis. What if that failed? Then I would traditionally look for signs that a file or process was dropped on the endpoint, or that some suspicious new traffic or behavior was observed from the endpoint following the URL exposure. Oh, but my endpoint solution is already monitoring all local processes and sandboxing them as well.

See where this is going? We sandbox potential weaponized items like URLs and files on their way in. Then we sandbox any content that transits our perimeter. Then we sandbox every process that attempts to execute on endpoints. That's delivery, exploit, and installation prevention. Better yet, it's not IOC dependent because we perform unique analysis every time.

In another example, let's say I'm looking for IOCs on the network - specifically at network communications that resemble known threats. If I find some, I'll need to get access to the endpoint, find the offending process, and analyze or validate it in my sandbox, right? Well, it turns out that my endpoint solution continuously monitors all new processes as they start, and performs analysis on the fly, including submitting the process for sandbox analysis. It also looks at the local actions performed by the process to determine how closely those resemble malicious actions, to help influence the prevention decision. So I don't need to look for IOCs on my network, because I'm constantly monitoring every process that tries to run on an endpoint...and if it looks malicious, I'm preventing it from executing.
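
For illustration, here is a hypothetical sketch of that endpoint decision logic: every new process is submitted for a sandbox verdict, local behaviors are scored, and the two signals together drive the prevention decision. The function names, behavior weights, and threshold are assumptions, not a description of any specific endpoint product.

    # Hypothetical sketch: combine a sandbox verdict with local behavior scoring.
    SUSPICIOUS_BEHAVIORS = {
        "writes_to_startup_folder": 3,
        "injects_into_other_process": 4,
        "encrypts_many_files": 5,
        "opens_network_listener": 2,
    }

    def sandbox_says_malicious(image_path):
        """Stub for submitting the process image to the sandbox for a verdict."""
        return "payload" in image_path

    def behavior_score(observed):
        return sum(SUSPICIOUS_BEHAVIORS.get(b, 0) for b in observed)

    def allow_execution(image_path, observed, threshold=5):
        """Block if the sandbox flags the image OR local behaviors look malicious enough."""
        if sandbox_says_malicious(image_path):
            return False
        return behavior_score(observed) < threshold

    print(allow_execution(r"C:\Users\jdoe\AppData\payload.exe", []))        # blocked on verdict
    print(allow_execution("notepad.exe", ["opens_network_listener"]))       # allowed, low score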

Ok, so what about an exposed web server that has a publicly facing form and a data input validation vulnerability that allows a remote attacker to upload a file which is remotely activated through a crafted URL? Well, my network appliance is going to send a copy of that uploaded file to the sandbox for analysis, while the endpoint solution is going to monitor local execution for signs of malicious activity. Done.

See my point? The sandbox and endpoint solutions that are now available have actually changed the game. Quite effectively. If you are preventing the adversary from delivering their weapons and preventing the weapons from running, then you can significantly reduce the number of investigations and incidents in your environment, thus eliminating the race condition your SOC is gearing up to win. You can defeat the adversary before they even have a chance.

So what does that multi-million dollar SOC with all its IOC detections and correlation capabilities and workflow automations do now? It idles.

You might ask, "well what about DDoS, malware-less intrusions, and insider threats?" Yep, those are still a concern...but enough to warrant 24x7x365 SOCs? DDoS is solved for via off-premise solutions like Akamai (Prolexic) and others. Malware-less intrusions and insiders are still a concern, but again, thinking of the actual value in rapid detection and response, does it actually gain you much? Anything? I'm not convinced it does. It takes a lot for me to say that because 6 years ago you would have found me in my employers RSA Conference vendor booth, selling customers on the rapid detection and response story, touting our time to detect and accuracy of detection capabilities. As of today, I'm just not seeing the value anymore.

Granted, there are still several other security scenarios that may come into your experience.

What about employees who accidentally post sensitive information on the Internet? Yeah that's a problem that must be mitigated, but you aren't going to detect that with data feeds and SIEMs (at least you don't need that level of complexity for that detection).

What about employees who bring in their own infected laptops and plug them into the corporate network? Ok, well we still prevent known C2 calls (again, based on our own analysis of malware we've seen plus all the malware samples our product vendor has seen). Even still, what are the odds that the adversary would be attempting to remotely control that device while it's present on your network? Probably not going to happen.

What about stolen property (laptops, servers, tablets, smartphones etc.)? Yeah, that's still an issue, but you don't detect stolen property with your SOC. You might respond to reports of stolen property, in which case you'll simply file a police report, assess potential damages, and try to perform a remote wipe of the device or data if you have an MDM solution.

You see, my point is, I believe the security industry has actually solved the primary problem so effectively that the SOCs of yesterday no longer apply. The SOC concept was designed to enable defenders to detect and respond to intrusions faster than the adversary could operate to leverage them. However, today, we can simply eliminate the intrusions rather than build compensating processes around them. Rather than putting millions of dollars into that SOC gear, put millions of dollars into prevention through solutions like those provided by Palo Alto Networks, CrowdStrike, or others focused on the prevention story end-to-end. My experience is, your Ops team will have few and far between incidents to manage, which nullifies the value statement of that SOC room. You don't need central comms; you can have central ticketing and chat for rapid engagement when needed. You don't need a room to facilitate common awareness, because dashboards are presented by applications, and applications can be securely accessed remotely. You don't need to rapidly respond to issues, because in all likelihood, those actors who are so sophisticated that you couldn't prevent them are going to out-pace and out-wit you anyway. You won't need highly structured and repeatable processes for a team of Analysts to use, because your incident count will be so few and far between that it's not worth the ROI or resources to build all that pre-planning. You can wing it each time with the right seasoned people. Finally, you don't need that structure, because having a physical room to operate in 1) wastes corporate real estate, and 2) limits you to the talent readily available in your immediate area. In this industry, where we have 0% unemployment and virtual connectedness through solutions like Zoom, chat, and others, placing physical boundaries around your security team simply hinders your ability to capture and retain the talent you need to be successful.

Still convinced you need a SOC? I'm not, and again, that's saying a lot given that 13 years of my career was dependent upon selling SOC services.