Towards Automated and Explainable Cyber Threat Hunting Leveraging Generative AI

Doctoral Candidate Name: Moumita Das Purba
Program: Computing and Information Systems
Abstract: 

As cyber threats continue to grow in both volume and sophistication, automated and effective threat hunting has become essential for proactively detecting and responding to attacks. Unlike traditional defenses, an automated end-to-end threat-hunting approach analyzes vast amounts of unstructured data to identify actionable intelligence for timely detection and mitigation. Generative AI-driven threat hunting offers a more efficient and effective alternative because of its ability to understand complex natural language, enabling faster response times and greatly reducing the human effort required to identify and analyze threats. This dissertation aims to develop an automated end-to-end threat-hunting model that harnesses the power of Large Language Models (LLMs) to enhance threat detection and response. The dissertation has three main objectives: 1) developing an approach to identify threat-related information in large volumes of unstructured text, 2) developing a model to extract actionable intelligence and explain it in order to gain the trust of security analysts, and 3) developing a model to generate search queries for log analysis, allowing security teams to investigate potential threats in a network.
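
A minimal sketch of the first objective, message triage, is given below. It assumes the OpenAI Python SDK; the model name (gpt-4o-mini), prompt wording, and THREAT/BENIGN labels are illustrative placeholders rather than the dissertation's exact configuration.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    TRIAGE_PROMPT = (
        "You are a cyber threat intelligence analyst. "
        "Reply with THREAT if the message describes a cyber threat "
        "(malware, exploit, phishing, C2 infrastructure, IOCs); otherwise reply BENIGN."
    )

    def is_threat_related(message: str) -> bool:
        """Ask the LLM to label a single real-time threat-sharing message."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",   # assumed model; the dissertation only states it uses OpenAI APIs
            temperature=0,         # deterministic labeling
            messages=[
                {"role": "system", "content": TRIAGE_PROMPT},
                {"role": "user", "content": message},
            ],
        )
        label = response.choices[0].message.content.strip().upper()
        return label.startswith("THREAT")

    # Example: a cryptic real-time threat-sharing post
    print(is_threat_related("New Qakbot C2 at 45.61.136[.]x dropping a DLL via ISO lure"))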

Results from every step of the automated end-to-end threat-hunting process demonstrate the effectiveness of the approach. This dissertation achieved 94.93% precision and 88.22% recall in distinguishing threat-related from non-threat-related real-time messages. The extraction step pulled critical threat information, such as Indicators of Compromise (IOCs), the observable technical manifestations of attacks, and Tactics, Techniques, and Procedures (TTPs), from threat-related messages. Additionally, by integrating a knowledge-graph-based validation approach, the system verified the extracted information, reducing the hallucination rate from 34.6% to 1.58% and the error rate from 36.9% to 7.21%. Finally, incorporating relational context into Kibana query generation increased accuracy from 41.03% (without relational context) to 58.97%.

This dissertation makes several major contributions toward automating the end-to-end threat-hunting process, transforming cyber threat intelligence messages into actionable Kibana queries that search logs for evidence of the attacks the intelligence describes. The prototype implementation, built on OpenAI APIs, uses the robust language capabilities of LLMs to identify threat-related messages and extract actionable threat intelligence from even the most cryptic real-time threat-sharing messages. The core idea is an explainable AI approach that lays out the logical reasoning behind each extracted piece of intelligence, addressing a fundamental problem of LLMs: hallucination. The research explains the extracted intelligence in terms of specific MITRE ATT&CK TTPs using a knowledge graph of “is-a” and “part-of” relationships, which are themselves extracted with an OpenAI LLM. The benefits of this explanation-based approach include substantially reducing LLM hallucinations and gaining the trust of security analysts by providing explained results. Finally, the system leverages the explained “is-a” and “part-of” relationships to automate Kibana query generation for log analysis. The dissertation demonstrates that explanation improves an LLM’s accuracy in generating Kibana queries and can be further extended by enriching the knowledge graph with additional relationships.
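
The sketch below illustrates the final step, again assuming the OpenAI Python SDK: an extracted indicator is enriched with its “is-a”/“part-of” relational context from a toy knowledge graph, and the LLM is asked for a Kibana (KQL) query grounded in that context. The triples, prompt wording, and model name are illustrative assumptions, not the dissertation's actual graph or prompts.

    from openai import OpenAI

    client = OpenAI()

    # Toy knowledge graph as (subject, relation, object) triples. The dissertation's
    # graph is built by an OpenAI LLM and tied to MITRE ATT&CK TTPs.
    TRIPLES = [
        ("rundll32.exe", "is-a", "living-off-the-land binary"),
        ("rundll32.exe", "part-of", "T1218.011 Signed Binary Proxy Execution"),
        ("suspicious DLL export call", "part-of", "T1218.011 Signed Binary Proxy Execution"),
    ]

    def relational_context(entity: str) -> str:
        """Collect the is-a / part-of facts that mention the entity."""
        facts = [f"{s} {r} {o}" for s, r, o in TRIPLES if entity in (s, o)]
        return "; ".join(facts)

    def generate_kibana_query(entity: str) -> str:
        """Ask the LLM for a KQL query, grounded in the entity's relational context."""
        prompt = (
            f"Indicator: {entity}\n"
            f"Relational context: {relational_context(entity)}\n"
            "Write a single Kibana KQL query over Windows process logs that would "
            "surface evidence of this activity. Return only the query."
        )
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # assumed model
            temperature=0,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content.strip()

    print(generate_kibana_query("rundll32.exe"))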

Defense Date and Time: Wednesday, November 6, 2024 - 12:00 PM
Defense Location: https://charlotte-edu.zoom.us/j/98944936353
Committee Chair's Name: Dr. Bill Chu
Committee Members: Dr. Depeng Xu, Dr. Mohamed Shehab, Dr. Benjamin J. Radford, Dr. Mark Pizzato