TY - JOUR
T1 - Robust Malware identification via deep temporal convolutional network with symmetric cross entropy learning
AU - Sun, Jiankun
AU - Luo, Xiong
AU - Wang, Weiping
AU - Gao, Yang
AU - Zhao, Wenbing
PY - 2023/8/1
Y1 - 2023/8/1
N2 - Recent developments in the field of Internet of things (IoT) have aroused growing attention to the security of smart devices. Specifically, there is an increasing number of malicious software (Malware) on IoT systems. Nowadays, researchers have made many efforts concerning supervised machine learning methods to identify malicious attacks. High-quality labels are of great importance for supervised machine learning, but noises widely exist due to the non-deterministic production environment. Therefore, learning from noisy labels is significant for machine learning-enabled Malware identification. In this study, motivated by the symmetric cross entropy with satisfactory noise robustness, the authors propose a robust Malware identification method using temporal convolutional network (TCN). Moreover, word embedding techniques are generally utilised to understand the contextual relationship between the input operation code (opcode) and application programming interface function names. Here, considering the numerous unlabelled samples in real-world intelligent environments, the authors pre-train the TCN model on an unlabelled set using a word embedding method, that is, Word2Vec. In the experiments, the proposed method is compared with several traditional statistical methods and more recent neural networks on a synthetic Malware dataset and a real-world dataset. The performance comparisons demonstrate the better performance and noise robustness of their proposed method, especially that the proposed method can yield the best identification accuracy of 98.75% in real-world scenarios.
AB - Recent developments in the field of Internet of things (IoT) have aroused growing attention to the security of smart devices. Specifically, there is an increasing number of malicious software (Malware) on IoT systems. Nowadays, researchers have made many efforts concerning supervised machine learning methods to identify malicious attacks. High-quality labels are of great importance for supervised machine learning, but noises widely exist due to the non-deterministic production environment. Therefore, learning from noisy labels is significant for machine learning-enabled Malware identification. In this study, motivated by the symmetric cross entropy with satisfactory noise robustness, the authors propose a robust Malware identification method using temporal convolutional network (TCN). Moreover, word embedding techniques are generally utilised to understand the contextual relationship between the input operation code (opcode) and application programming interface function names. Here, considering the numerous unlabelled samples in real-world intelligent environments, the authors pre-train the TCN model on an unlabelled set using a word embedding method, that is, Word2Vec. In the experiments, the proposed method is compared with several traditional statistical methods and more recent neural networks on a synthetic Malware dataset and a real-world dataset. The performance comparisons demonstrate the better performance and noise robustness of their proposed method, especially that the proposed method can yield the best identification accuracy of 98.75% in real-world scenarios.
KW - learning (artificial intelligence)
KW - security of data
KW - systems analysis
UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85164593146&origin=inward
UR - https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=85164593146&origin=inward
U2 - 10.1049/sfw2.12137
DO - 10.1049/sfw2.12137
M3 - Article
SN - 1751-8806
VL - 17
SP - 392
EP - 404
JO - IET Software
JF - IET Software
IS - 4
ER -