ChatGPT has made Artificial Intelligence (AI) a widespread topic of conversation in cybersecurity circles, and there is certainly much to be said on the topic. ChatGPT is an application of Natural Language Processing (NLP), a subfield of AI. As NLP capabilities advance, NLP can be used in cybersecurity tools to enhance detection capabilities, reduce manual tasks, and improve visibility of vulnerabilities. This article explores each of these uses in more depth.
To examine NLP’s cybersecurity applications, there are a few key terms to understand at a high level, starting at the very top with AI. AI is a broad concept that refers to a machine simulating human intelligence. Researchers seek to achieve this by training machines to learn the way humans learn. An example of this is machine learning (ML), where the AI algorithm is trained to learn patterns from data. The two primary ways an AI algorithm is taught to do this are 1) supervised learning and 2) unsupervised learning*.
Supervised learning: AI algorithms are trained on labeled data to derive patterns, which can then be used to label new data. An example would be using labeled historical incidents to identify new incidents.
Unsupervised learning: AI algorithms that try to derive patterns from unlabeled data. Examples include categorization (such as clustering) and anomaly detection.
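The difference between the two approaches can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the features (failed logins and outbound megabytes per hour), the sample values, and the choice of a nearest-centroid classifier and a z-score check as stand-ins for supervised and unsupervised learning.

```python
from statistics import mean, pstdev

# Supervised: labeled historical events.
# Hypothetical features: (failed logins per hour, outbound MB per hour).
labeled_events = [
    ((2, 10), "benign"),
    ((1, 12), "benign"),
    ((3, 8), "benign"),
    ((40, 300), "incident"),
    ((55, 250), "incident"),
]

def train_centroids(events):
    """Average the feature vectors of each label (a nearest-centroid model)."""
    grouped = {}
    for features, label in events:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }

def classify(centroids, features):
    """Label new data with the label of the closest learned centroid."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: squared_distance(centroids[label], features))

# Unsupervised: no labels; flag values that sit far from the rest of the data.
def find_anomalies(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

centroids = train_centroids(labeled_events)
print(classify(centroids, (48, 270)))
print(find_anomalies([10, 12, 11, 9, 10, 11, 95]))
```

The supervised half needs someone to have labeled past events first; the unsupervised half needs no labels but can only say that a value is unusual, not why.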
The next term, deep learning (DL), is an ML technique that utilizes neural networks to derive patterns from data*. Neural networks, inspired by the design of the human brain, are organized into layers. Each layer consists of a number of nodes that perform an evaluation function on an input. The output of that function is then passed on to the next layer, which conducts another evaluation. This continues through each remaining layer until a final output is reached.* NLP is a subfield of AI that uses various ML techniques, including DL, to teach a machine to analyze and understand language (semantics and syntax).
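The layer-by-layer evaluation described above can be sketched as a forward pass through a tiny network. The weights, biases, layer sizes, and the sigmoid activation below are hypothetical, hand-picked values for illustration; a real network would learn its weights from training data.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """Each node takes a weighted sum of the inputs plus a bias, then applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Pass the output of each layer to the next until the final layer is reached."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical network: 2 inputs -> 3 hidden nodes -> 1 output node.
layers = [
    ([[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2]], [0.05]),
]
output = network_forward([1.0, 0.0], layers)
print(output)
```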
NLP’S APPLICATIONS IN CYBERSECURITY
Anomaly detection in cybersecurity tools establishes a baseline of what normal traffic looks like at an organization and then alerts on any deviations from it. To do this, it uses an AI algorithm that derives patterns from the data via unsupervised learning. A challenge with this process is that the AI algorithm can generate many false positives, causing alert fatigue among analysts. Applying NLP techniques to anomaly detection could reduce the number of false positives, minimizing the need for manual review. By leveraging NLP to analyze logs, which are partially written in natural language, an intrusion detection system would have additional data points with which to identify a security incident more accurately.
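One way to picture how log text could become an extra data point is the sketch below, which scores a log line by how many of its words never appeared in a baseline of normal logs. This is a deliberately simple stand-in for real NLP techniques; the log lines, function names, and the word-novelty score are all assumptions made for illustration.

```python
import re
from collections import Counter

# Keep runs of letters only, so digits in ports and IP addresses are ignored.
TOKEN_RE = re.compile(r"[a-z]+")

def tokenize(line):
    """Lowercase the line and return its alphabetic words."""
    return TOKEN_RE.findall(line.lower())

def build_baseline(normal_lines):
    """Count every word seen in logs collected during normal operation."""
    counts = Counter()
    for line in normal_lines:
        counts.update(tokenize(line))
    return counts

def novelty_score(line, baseline):
    """Fraction of words in the line never seen in the baseline (0.0 to 1.0)."""
    tokens = tokenize(line)
    if not tokens:
        return 0.0
    unseen = sum(1 for token in tokens if token not in baseline)
    return unseen / len(tokens)

# Hypothetical log lines.
normal_logs = [
    "Accepted password for alice from 10.0.0.5 port 22",
    "Accepted password for bob from 10.0.0.9 port 22",
    "Session opened for user alice",
]
baseline = build_baseline(normal_logs)
print(novelty_score("Accepted password for bob from 10.0.0.7 port 22", baseline))
print(novelty_score("Failed password for invalid user admin from 203.0.113.7", baseline))
```

A score like this would not replace the traffic baseline; it would be combined with it, so that an unusual traffic pattern accompanied by unfamiliar log language ranks higher than one accompanied by routine messages.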
Endpoint detection and response (EDR) tools filter and analyze activity on the endpoint to detect and prevent malicious threats. Their predecessors, anti-virus tools, filtered threats based on a list of known malicious signatures. This resembles supervised learning in that detection relies on threats that have already been labeled as malicious, although signature matching itself is a lookup rather than a learned model. Legacy anti-virus tools could detect known threats but could not detect new types of threats. To address that gap, EDR tools today filter on additional datasets, like anomalous traffic patterns, and conduct dynamic malware analysis, in which the tool runs suspicious code in a controlled environment to observe its behavior. While EDR tools can identify new threats, this analysis can be time-consuming and visibility gaps may persist. NLP can further advance the EDR space by implementing fast-filtering methods, such as analyzing printable strings in the malware code, to detect indications of malware.
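The printable-strings idea can be sketched in a few lines. The example below pulls runs of printable ASCII characters out of raw bytes, similar in spirit to the Unix `strings` utility, and checks them against a short indicator list. The indicator list, the minimum length of four characters, and the sample bytes are hypothetical; an NLP-based filter would feed the extracted strings into a language model rather than a fixed keyword list.

```python
import re

# Runs of 4 or more printable ASCII bytes (space through tilde).
PRINTABLE_RE = re.compile(rb"[\x20-\x7e]{4,}")

# Hypothetical indicators, for illustration only.
SUSPICIOUS_MARKERS = ("powershell", "cmd.exe", "http://", "createremotethread")

def extract_strings(data):
    """Return every printable ASCII run of at least 4 characters found in the bytes."""
    return [match.decode("ascii") for match in PRINTABLE_RE.findall(data)]

def suspicious_strings(data):
    """Return the extracted strings that contain any marker, ignoring case."""
    return [
        text
        for text in extract_strings(data)
        if any(marker in text.lower() for marker in SUSPICIOUS_MARKERS)
    ]

# Hypothetical file contents: binary noise mixed with readable text.
sample = (
    b"\x00\x01MZ\x90\x00powershell -enc AAAA\x00\xff\x10hello world"
    b"\x00http://example.test/payload\x00ab\x01"
)
print(extract_strings(sample))
print(suspicious_strings(sample))
```

Because this only reads the file and never executes it, it is much cheaper than dynamic analysis, which is why it suits a first-pass filter.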
One concern with the emergence of tools like ChatGPT is that phishing emails will be harder to detect, as malicious actors can utilize ChatGPT to write targeted phishing emails with perfect grammar. Email protection tools can adapt to this new threat by utilizing NLP to identify the types of emails malicious actors may send and unusual communication patterns. For example, a malicious email may read differently from how employees at that company would compose an email. Most email protection tools on the market today make use of NLP techniques, and as that realm of AI advances, so will the tools’ ability to accurately identify a threat.
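The idea that a malicious email "reads differently" can be made concrete with a small sketch that compares the vocabulary of an incoming email against a baseline of internal emails using cosine similarity over word counts. The sample emails and function names are hypothetical, and bag-of-words similarity is a simplification chosen for illustration; it is not a description of how any particular email protection product works.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Count lowercase words (letters and apostrophes) in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 means no shared words)."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def style_similarity(email, baseline_emails):
    """Compare one email against the combined vocabulary of the baseline emails."""
    baseline = Counter()
    for text in baseline_emails:
        baseline.update(word_counts(text))
    return cosine_similarity(word_counts(email), baseline)

# Hypothetical internal emails and two incoming messages.
company_emails = [
    "Hi team, the sprint review is moved to Thursday. Thanks, Dana",
    "Hi team, please review the deployment notes before the sprint planning. Thanks",
]
typical = "Hi team, the deployment review is on Thursday. Thanks"
unusual = (
    "Dear valued customer, your account will be suspended "
    "unless you verify your password immediately"
)
print(style_similarity(typical, company_emails))
print(style_similarity(unusual, company_emails))
```

A low similarity score is only one signal; perfect grammar does not hide the fact that a message uses vocabulary and phrasing the organization rarely uses.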
NLP can also be used to identify vulnerabilities. Developers, for example, can use NLP tools like ChatGPT to identify vulnerabilities in their code. Additionally, NLP capabilities can augment existing vulnerability management tools by scanning specification documents. Specification documents provide an overview of the software, system, or product in question, such as its purpose, a list of features, code design, and use requirements. Scanning these documents can provide clues indicating where vulnerabilities may exist. A vulnerability scanning system that utilizes NLP capabilities could provide a more complete picture of vulnerabilities on the network.
While this article covers some of the ways NLP capabilities can enhance an organization’s cybersecurity toolset, there are many more applications. K logix can provide guidance to organizations on how AI is changing cybersecurity and the threat landscape, as well as how organizations can prepare. For more information, please contact one of our experts: email@example.com.
*There are other forms of machine learning that fall outside of these two categories.
*It should be noted that while DL is generally categorized under ML, the distinction between the two is not always clear-cut.
*With DL, the patterns and goals can be abstract, and, thus, it is not always fully known to humans how the machine solves its problem.
Sources:
https://www.youtube.com/watch?v=aircAruvnKk
https://www.youtube.com/watch?v=gQmXAKD4_3o
Li, Dr. Albert Zhichun, and David Barton. “A Brief History of Machine Learning in Cybersecurity.” Security Info Watch. www.securityinfowatch.com/cybersecurity/article/21114214/a-brief-history-of-machine-learning-in-cybersecurity
https://doi.org/10.1007/s10207-021-00553-8
www.proofpoint.com/us/blog/engineering-insights/leveraging-ml-to-detect-ai-generated-phishing-emails
www.insights.sei.cmu.edu/blog/artificial-intelligence-in-practice-securing-your-code-using-natural-language-processing/