Abstract
Malware (malicious software) continues to evolve and raises growing concern, because it now targets not only computers but also other devices such as smartphones. It is no longer only monomorphic; polymorphic, metamorphic, and oligomorphic forms have emerged. With this rapid development, conventional antivirus software is becoming less effective, because such malware can propagate itself while changing its fingerprint and behavioral patterns. An intelligent, machine learning-based antivirus is therefore needed, one that detects malware by its behavior rather than by its fingerprint. This research implements a machine learning model for malware detection that combines an ensemble algorithm with feature selection to achieve optimal performance. The ensemble algorithm is Random Forest, which is evaluated and compared with k-Nearest Neighbor and Decision Tree as state-of-the-art methods. To improve processing speed, Information Gain is applied as the feature selection method, retaining 22 features. The best results were obtained with Random Forest combined with Information Gain feature selection, reaching 99.0% for both accuracy and F1-score. By reducing the number of features, processing speed can be increased almost fivefold.
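To make the feature selection step concrete, the following is a minimal sketch of ranking features by Information Gain, IG(Y; X) = H(Y) - H(Y | X), and keeping the top k. It is not the authors' implementation: the abstract does not describe the dataset or tooling, so the feature names, the toy rows, and the helper functions `entropy`, `information_gain`, and `select_top_k` are all hypothetical, and k is set to 2 here only because the toy data has three features (the study retains 22).

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature X."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_k(rows, labels, feature_names, k):
    """Score every feature by information gain and return the k best names."""
    scores = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(feature_names)
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy data: hypothetical binary behavioral flags, not the study's dataset.
FEATURES = ["writes_registry", "opens_socket", "has_icon"]
ROWS = [
    (1, 1, 0),
    (1, 0, 1),
    (1, 1, 1),
    (0, 0, 0),
    (0, 1, 1),
    (0, 0, 1),
]
LABELS = ["malware", "malware", "malware", "benign", "benign", "benign"]

top_features, ig_scores = select_top_k(ROWS, LABELS, FEATURES, k=2)
print(top_features)
```

In this toy example the first flag separates the two classes perfectly, so it receives the highest score, while the last flag carries no information about the label and is dropped. A classifier such as Random Forest would then be trained only on the retained columns, which is where the reported speed-up from a smaller feature set would come from.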