With the widespread use of computers and networks, cybersecurity has become a crucial concern for many businesses as they fend off a growing number of cyber threats that exploit vulnerabilities. Identifying and mitigating zero-day or unpatched vulnerabilities requires intensive defensive measures, which in turn call for a thorough understanding of vulnerability characteristics and threat behavior from several angles. Enterprises are therefore compelled to spend considerable sums to safeguard their infrastructure from cyberattacks, relying on expert input, a process that is costly, ineffective, error-prone, and slow. As a result, many business owners have turned to security automation to counter these threats.
Modern text analytics architectures have been applied in novel ways across a variety of applications, helping cybersecurity professionals develop resilient mechanisms against threats. Such technologies therefore offer a viable approach to processing, understanding, and predicting vulnerabilities, which are typically reported as unstructured text.
This dissertation uses deep learning, natural language processing, and information retrieval to build a series of models that effectively and efficiently parse, assess, analyze, and mitigate vulnerabilities based on their textual descriptions, as reported in the Common Vulnerabilities and Exposures (CVE) format.
At the core of this research is a cybersecurity language model, which is then used both to characterize vulnerabilities and to retrieve the corresponding courses of defense action. As a result of this work, enterprises and cybersecurity researchers will be able to automatically process domain-specific texts, classify vulnerabilities according to cybersecurity standards to obtain high-level knowledge, and retrieve courses of defense action for the underlying threats.