Abstract

Text-based communication has become a key means of interaction across many sectors. Previous studies have applied supervised learning algorithms to emotion classification in text. These studies used different datasets, and this diversity also introduced a risk of overfitting in text-based emotion classification models. Cross-validation and hyperparameter optimization are therefore needed to ensure that the models generalize. The aim of this research is to compare the performance of two supervised learning algorithms, Decision Tree (DT) and Support Vector Machine (SVM), for emotion classification on an English-language text dataset from Kaggle containing 16,000 entries labeled with six emotions (anger, fear, joy, love, sadness, and surprise). The dataset is cleaned, tokenized, stripped of stopwords, and lemmatized, after which features are extracted using TF-IDF. Both algorithms are evaluated with K-Fold and Stratified K-Fold cross-validation, and their performance is measured by accuracy, precision, recall, and F1-score. The hyperparameter-tuned DT achieved an average accuracy of 88%, while the hyperparameter-tuned SVM achieved 89%. Stratified K-Fold cross-validation yielded an accuracy variance of only 0.02% for DT and 0.15% for SVM. It can therefore be concluded that Stratified K-Fold performs better than standard K-Fold on imbalanced datasets, and that the hyperparameter-tuned SVM outperforms the hyperparameter-tuned DT.
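The abstract's methodological conclusion rests on how the two cross-validation schemes form their folds. The sketch below, in plain Python with no third-party dependencies, contrasts the two splitting rules on a hypothetical, deliberately imbalanced toy label list. The function names (`kfold_indices`, `stratified_kfold_indices`, `class_counts`) are illustrative assumptions, and the round-robin dealing is a simplified stratification scheme, not necessarily the exact procedure used by the authors or by scikit-learn's `StratifiedKFold`. It omits the shuffling normally applied before splitting and illustrates only the mechanism; it does not reproduce the reported accuracies or variances.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def kfold_indices(n: int, k: int) -> List[List[int]]:
    """Plain K-Fold: split indices 0..n-1 into k contiguous folds (no shuffling)."""
    base, extra = divmod(n, k)
    folds: List[List[int]] = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more item
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def stratified_kfold_indices(labels: Sequence[str], k: int) -> List[List[int]]:
    """Simplified Stratified K-Fold: deal each class's indices round-robin across
    the k folds so every fold keeps roughly the class proportions of the full set."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        for j, idx in enumerate(by_class[label]):
            folds[(offset + j) % k].append(idx)
        # Continue the rotation where the previous class stopped, keeping fold sizes even.
        offset += len(by_class[label])
    return [sorted(fold) for fold in folds]


def class_counts(labels: Sequence[str], fold: Sequence[int]) -> Dict[str, int]:
    """Count how many entries of each class fall into one fold."""
    counts: Dict[str, int] = defaultdict(int)
    for i in fold:
        counts[labels[i]] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical imbalanced toy labels, stored in class order as a raw file might be.
    labels = ["joy"] * 6 + ["sadness"] * 3 + ["surprise"] * 3
    for name, folds in [
        ("K-Fold", kfold_indices(len(labels), 3)),
        ("Stratified K-Fold", stratified_kfold_indices(labels, 3)),
    ]:
        print(name, [class_counts(labels, fold) for fold in folds])
```

With these toy labels, plain K-Fold puts only `joy` entries in the first fold and no `joy` entries in the third, so each validation fold sees a very different class mix. The stratified split gives every fold two `joy`, one `sadness`, and one `surprise` entry, which is why per-fold scores tend to vary less when classes are imbalanced.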

Keywords

Decision Tree; Emotion Classification; Support Vector Machine; TF-IDF
