Hate Speech Detection Using Expansion Feature Glove with CNN and Bi-LSTM on Twitter

Muchammad Alfi Karom; Erwin Budi Setiawan

Submitted: August 10, 2024

Published: May 30, 2025


Abstract

Twitter is one of the largest social media platforms in Indonesia. It serves as a medium for readers worldwide and as a means of disseminating information to everyone. However, some users misuse it to spread hate speech against particular groups or communities. Because hate speech is widespread, a system for detecting it is needed. Detecting hate speech on Twitter can be very difficult because tweets often lack context, and adding features can make detection easier. In this study, GloVe is used as a feature expansion method, combined with feature extraction using N-grams and Term Frequency-Inverse Document Frequency (TF-IDF). The resulting data are processed with a hybrid deep learning model that combines Convolutional Neural Networks (CNN) and Bidirectional Long Short-Term Memory (Bi-LSTM). The authors obtained 69,484 data items related to hate speech. The results show that combining feature extraction with feature expansion has an impact on classification performance. The best accuracy across all methods is achieved by the CNN+Bi-LSTM hybrid, with 91.69% accuracy on top10. The best result for the Bi-LSTM+CNN hybrid is 91.33% accuracy on top20.
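The abstract does not spell out how feature expansion works, so the following is only a minimal sketch of one common reading: when a vocabulary feature is absent from a tweet, it is still switched on if one of its top-k most similar words (by embedding similarity) appears in the tweet. The toy two-dimensional vectors, the binary bag-of-words representation (instead of TF-IDF weights), and the names `EMBEDDINGS`, `top_k_similar`, and `expand_features` are all illustrative assumptions, not the authors' implementation. Reading "top10" and "top20" as the value of k is likewise an assumption.

```python
import math

# Toy 2-d vectors standing in for pretrained GloVe embeddings (hypothetical values).
EMBEDDINGS = {
    "hate":    (0.90, 0.10),
    "despise": (0.80, 0.20),
    "loathe":  (0.85, 0.15),
    "love":    (0.10, 0.90),
    "adore":   (0.20, 0.80),
    "like":    (0.15, 0.85),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_similar(word, k):
    """Return the k words in EMBEDDINGS closest to `word`, most similar first."""
    others = [w for w in EMBEDDINGS if w != word]
    others.sort(key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]), reverse=True)
    return others[:k]


def expand_features(tokens, vocabulary, k):
    """Build a binary feature vector over `vocabulary`.

    A feature that is absent from the tweet is still set to 1 when one of its
    top-k similar words occurs in the tweet (the assumed expansion rule).
    """
    present = set(tokens)
    vector = []
    for feature in vocabulary:
        if feature in present:
            vector.append(1)
        elif feature in EMBEDDINGS and present & set(top_k_similar(feature, k)):
            vector.append(1)
        else:
            vector.append(0)
    return vector


vocabulary = ["hate", "love"]
tweet = ["i", "despise", "mondays"]

plain = [1 if f in tweet else 0 for f in vocabulary]   # no expansion
expanded = expand_features(tweet, vocabulary, k=2)     # with top-2 expansion
print(plain, expanded)
```

In this toy case the tweet never contains the word "hate", so the plain vector carries no signal, while the expanded vector activates the "hate" feature through its neighbour "despise". A larger k expands more aggressively, which may explain why different model orderings peak at different top-k settings.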

Keywords

feature expansion; GloVe; hate speech; hybrid deep learning; Twitter

