Main Article Content
Abstract
Twitter is one of the biggest social media platforms in Indonesia. It serves as a medium for readers worldwide and as a means of disseminating information for everyone. However, some people misuse social media to spread hate speech against certain groups or communities. Because hate speech occurs everywhere, a system to detect it is needed. Detecting hate speech on Twitter can be very difficult because tweets often lack context. Adding features to address this problem can make hate speech detection easier. This study uses GloVe as the feature expansion method, combined with feature extraction using N-grams and Term Frequency-Inverse Document Frequency (TF-IDF). The resulting data are then processed using a hybrid deep learning model that combines Convolutional Neural Networks (CNN) and Bidirectional Long Short-Term Memory (Bi-LSTM). In this study, the author obtained 69,484 data records related to hate speech. The results show that combining the feature extraction and feature expansion methods has an impact on detection performance. The best accuracy across all methods, 91.69%, was achieved by the CNN+Bi-LSTM hybrid method at top 10. The best result for the Bi-LSTM+CNN hybrid method was 91.33% accuracy at top 20.
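The abstract does not spell out how the feature extraction and feature expansion steps interact, so the following is only a minimal sketch of one plausible reading, not the author's implementation. It computes TF-IDF weights over word N-grams and then fills zero-valued features by borrowing from their top-k most similar words. The `similar` table is a hypothetical stand-in for a similarity ranking that would in practice be derived from GloVe embeddings, and the function names, the toy tweets, and the interpretation of "top 10" and "top 20" as the number of similar words consulted are all assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all word n-grams of length 1..n_max, joined by spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf(docs, n_max=2):
    """Compute TF-IDF weights per document: (relative frequency) * log(N / df)."""
    counts = [Counter(ngrams(d.lower().split(), n_max)) for d in docs]
    df = Counter(g for c in counts for g in c)  # document frequency per n-gram
    n_docs = len(docs)
    return [{g: (tf / sum(c.values())) * math.log(n_docs / df[g])
             for g, tf in c.items()}
            for c in counts]


def expand(weights, vocabulary, similar, top_k):
    """Fill zero-weight vocabulary features from their top_k similar words."""
    expanded = {}
    for feature in vocabulary:
        value = weights.get(feature, 0.0)
        if value == 0.0:
            # Borrow the largest weight among similar words present in the document.
            neighbours = similar.get(feature, [])[:top_k]
            value = max((weights.get(w, 0.0) for w in neighbours), default=0.0)
        expanded[feature] = value
    return expanded


# Toy data (hypothetical, for illustration only).
docs = ["you are stupid", "you are kind", "what an idiot"]
vocabulary = ["stupid", "idiot", "kind"]
# Hypothetical ranked similarity lists; a real table would come from GloVe vectors.
similar = {"idiot": ["dumb", "stupid"], "stupid": ["idiot", "dumb"]}

weights = tfidf(docs)
expanded_top2 = expand(weights[0], vocabulary, similar, top_k=2)
expanded_top1 = expand(weights[0], vocabulary, similar, top_k=1)
```

In this sketch the first tweet never mentions "idiot", yet with `top_k=2` that feature inherits the weight of "stupid", while with `top_k=1` it stays at zero because only "dumb" is consulted. This illustrates why the number of similar words considered could change the vectors passed to a downstream classifier such as a CNN and Bi-LSTM hybrid.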