Identifying Affected Third-Party Java Libraries from Textual Descriptions of Vulnerabilities and Libraries

Tianyu Chen; Lin Li; Bingjie Shan; Guangtai Liang; Ding Li; Qianxiang Wang; Tao Xie

TOSEM 2025 | February 2025, Vol 34: pp. 1-27

To address security vulnerabilities arising from third-party libraries, security researchers maintain databases that monitor and curate vulnerability reports. Application developers can identify the libraries affected by vulnerability reports (in short, affected libraries) by querying these databases directly with the libraries they use. However, the query results are unreliable because vulnerability reports are often incomplete. Current approaches therefore model the identification of affected libraries as a named-entity-recognition (NER) task or an extreme multi-label learning (XML) task. These approaches produce highly inaccurate results when library names are complex and similar to one another, as is the case for Java libraries. To address these limitations, in this article we propose VulLibMiner, the first approach to identify affected libraries from textual descriptions of both vulnerabilities and libraries, together with VulLib, a dataset of Java vulnerabilities and their affected libraries. VulLibMiner consists of a TF-IDF matcher that efficiently screens out a small set of candidate libraries and a BERT-FNN model that effectively identifies affected libraries among these candidates. We evaluate VulLibMiner against four state-of-the-art/practice approaches for identifying affected libraries, on both their dataset, VeraJava, and our VulLib dataset. Our evaluation results show that VulLibMiner can effectively identify affected libraries with an average F1 score of 0.669, while the state-of-the-art/practice approaches achieve only 0.547.
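The candidate-screening step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the smoothed IDF formula, the use of cosine similarity, the `top_k` parameter, and the `com.example` libraries with their descriptions are all assumptions made for illustration. The second stage (the BERT-FNN model that re-ranks the candidates) is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(docs):
    """Smoothed inverse document frequency over tokenized documents."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; tokens unseen in the corpus are ignored."""
    tf = Counter(tokens)
    return {t: count * idf[t] for t, count in tf.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def screen_candidates(vuln_description, library_descriptions, top_k=2):
    """Return the top_k (library, score) pairs by TF-IDF cosine similarity.

    library_descriptions maps a library name to its textual description.
    The name is included in the matched text (an assumption of this sketch).
    """
    names = list(library_descriptions)
    docs = [tokenize(n + " " + library_descriptions[n]) for n in names]
    idf = build_idf(docs)
    query = tfidf_vector(tokenize(vuln_description), idf)
    scored = [(cosine(query, tfidf_vector(d, idf)), n)
              for d, n in zip(docs, names)]
    # Highest score first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored[:top_k]]


if __name__ == "__main__":
    # Hypothetical libraries and descriptions, for illustration only.
    libraries = {
        "com.example:xml-parser": "streaming parser for xml documents and external entities",
        "com.example:json-codec": "fast json encoder and decoder",
        "com.example:http-client": "asynchronous http client with connection pooling",
    }
    report = "XML external entity injection in the parser allows remote attackers to read files"
    for name, score in screen_candidates(report, libraries):
        print(name, round(score, 3))
```

In a two-stage design of this kind, the cheap lexical filter keeps the candidate set small so that a more expensive learned model only has to score a handful of libraries per vulnerability.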