An enterprise's attack surface changes constantly as new technologies introduce inherent vulnerabilities and attackers develop sophisticated methods to exploit them. Assessing risk therefore requires linking each vulnerability to the attack techniques that can exploit it. However, existing repositories that provide such links remain incomplete, which increases the likelihood of underestimating the risk posed by attack techniques whose information is missing.

In addition, these associations often rely on manual interpretation, which is slower than the pace of attacks and therefore ineffective against the ever-growing list of vulnerabilities and attack actions. It is therefore essential to develop methods that automatically and accurately associate vulnerabilities with all relevant attack techniques.

A new study has introduced an AI model that appears to address the problem. Scientists from the Department of Energy’s Pacific Northwest National Laboratory (PNNL), Purdue University, Carnegie Mellon University and Boise State University have linked three large databases of information about computer vulnerabilities, weaknesses and likely attack patterns.

This new framework, which the scientists dubbed “Vulnerabilities and Weakness to Common Attack Pattern Mapping (VWC-MAP),” automatically identifies all relevant attack techniques for a vulnerability, via its associated weakness, based on their text descriptions, using natural language processing (NLP) techniques.

Mahantesh Halappanavar, a chief computer scientist at PNNL who led the overall effort, said: “Cyber defenders are inundated with information and lines of code. What they need is interpretation and support for prioritization. Where are we vulnerable? What actions can we take?”

“If you are a cyber defender, you could be dealing with hundreds of vulnerabilities every day. You need to know how they can be exploited and what you need to do to mitigate these threats. That’s the crucial missing piece. You want to know the implications of a bug, how it can be exploited and how to stop that threat.”

The model is powered by a new two-tier classification approach: the first tier maps vulnerabilities to weaknesses, and the second tier maps weaknesses to attack techniques.
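The two-tier idea can be illustrated with a minimal sketch. This is not the team's model: it swaps in plain bag-of-words cosine similarity where the study uses language models, and the descriptions are short paraphrases written for illustration, not the official database texts. The function names (`rank`, `map_vulnerability`) are hypothetical.

```python
import math
import re
from collections import Counter

# Toy, paraphrased descriptions for illustration only (not official texts).
WEAKNESSES = {
    "CWE-89": "sql command built from user input without neutralizing special elements",
    "CWE-79": "web page generation places user input in html without neutralizing script",
}
ATTACK_PATTERNS = {
    "CAPEC-66": "attacker injects crafted sql command through user input to read the database",
    "CAPEC-63": "attacker injects malicious script into html web page viewed by other users",
}

def bag_of_words(text):
    """Count lowercase word tokens in a description."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def rank(query, candidates):
    """Return candidate IDs ordered from most to least similar to the query text."""
    q = bag_of_words(query)
    scored = [(cosine(q, bag_of_words(desc)), cid) for cid, desc in candidates.items()]
    return [cid for _, cid in sorted(scored, reverse=True)]

def map_vulnerability(description):
    """Tier 1: vulnerability text -> weakness. Tier 2: weakness text -> attack patterns."""
    weakness = rank(description, WEAKNESSES)[0]
    attacks = rank(WEAKNESSES[weakness], ATTACK_PATTERNS)
    return weakness, attacks

weakness, attacks = map_vulnerability(
    "login form allows sql command injection through unsanitized user input"
)
print(weakness, attacks)
```

The point of the sketch is the chaining: the attack patterns are ranked against the weakness chosen in the first tier, not against the vulnerability text directly.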

Halappanavar said: “If we can categorize the vulnerabilities into general categories, and we know exactly how an attack proceeds, we can neutralize threats much more efficiently.”

The new model also extends the work to a third category: attack actions.

The team’s algorithm automatically matches vulnerabilities with the appropriate weaknesses with up to 87 percent accuracy and links weaknesses to the appropriate attack patterns with up to 80 percent accuracy. These results are far better than what current tools can achieve, but the scientists caution that their new techniques still need further extensive testing.

In this study, the scientists also presented two new automated approaches for mapping weaknesses to attack techniques: link prediction with an auto-encoder (BERT) and text-to-text translation with a sequence-to-sequence model (T5). The first approach used a language model to associate vulnerabilities (CVEs, Common Vulnerabilities and Exposures) with weaknesses (CWEs, Common Weakness Enumeration) and then CWEs with attack patterns (CAPECs, Common Attack Pattern Enumeration and Classification) through binary link prediction. The second approach used sequence-to-sequence techniques to translate CWEs into CAPECs, with intuitive prompts for ranking the associations.
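To make the difference between the two formulations concrete, here is a rough sketch of how training examples might be shaped for each. It is an assumption about data layout only, not the team's code: the helper names, the two example links and the prompt prefix `map weakness to attack pattern:` are hypothetical, and no model is trained.

```python
from itertools import product

# Illustrative known associations (assumed here for the example).
KNOWN_LINKS = {("CWE-89", "CAPEC-66"), ("CWE-79", "CAPEC-63")}

def link_prediction_examples(cwes, capecs, known_links):
    """Binary link prediction: every (CWE, CAPEC) pair becomes one example,
    labeled 1 if the pair is a known association and 0 otherwise."""
    return [
        ((cwe, capec), int((cwe, capec) in known_links))
        for cwe, capec in product(cwes, capecs)
    ]

def text_to_text_example(cwe_text, capec_name):
    """Sequence-to-sequence: the weakness text is the source string and the
    attack pattern name is the target string. The prefix is a hypothetical prompt."""
    return {
        "source": f"map weakness to attack pattern: {cwe_text}",
        "target": capec_name,
    }

pairs = link_prediction_examples(
    ["CWE-89", "CWE-79"], ["CAPEC-66", "CAPEC-63"], KNOWN_LINKS
)
for pair, label in pairs:
    print(pair, label)

print(text_to_text_example("Improper neutralization of SQL commands", "SQL Injection"))
```

In the first formulation a classifier scores each candidate pair independently; in the second, the model generates the attack pattern directly from the weakness text.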

The approaches yielded very similar results, which were then validated by the team’s cybersecurity expert.

Note: The work is open source, and part of it is now available on GitHub. The team will release the rest of the code shortly.

Journal reference:

  1. Siddhartha Shankar Das; Ashutosh Dutta; Sumit Purohit et al. Towards automatic mapping of vulnerabilities to attack patterns using large language models. 2022 IEEE International Symposium on Technologies for Homeland Security (HST) (2023). DOI: 10.1109/HST56032.2022.10025459