Comparison of Support Vector Machine and Decision Tree Methods for Sentiment Analysis of Carousell App Reviews on Google Play
In today's digital era, smartphones have become an essential tool in everyday life and give people almost constant access to the internet. This has driven rapid growth in e-commerce and online marketplaces, and Carousell is one of the most popular of these platforms in Indonesia. Founded in May 2012, Carousell is a marketplace for buying and selling a wide range of products, both new and used. As of October 2023, it had been downloaded by more than 10 million users and held a rating of 4.7 from 337 thousand reviews on the Google Play Store. This study performs sentiment analysis on these reviews and compares two methods, Support Vector Machine (SVM) and Decision Tree, which were chosen for their strengths in classification tasks in data mining. The goal is to classify each review as either positive or negative. The data consist of 10,000 Carousell reviews scraped from the Google Play Store, which were processed and labeled before being used to train the machine learning models. Labels are assigned from the review score: scores of 4 and 5 are labeled positive, while scores of 1, 2, and 3 are labeled negative. The study also applies information gain for feature selection and compares two data-splitting schemes, a single train-test split and k-fold cross-validation. The results show that SVM performs better than Decision Tree and that the train-test split is more effective than k-fold cross-validation. Combining information gain feature selection with SVM does not improve accuracy.
In contrast, combining information gain feature selection with Decision Tree does improve accuracy. SVM without information gain, evaluated with the train-test split, outperforms Decision Tree, achieving 93.1% accuracy, 93.42% precision, 93.1% recall, and a 93.22% F1-score. Decision Tree combined with information gain achieves 91% accuracy, 92.39% precision, 91% recall, and a 91.49% F1-score.
Keywords – Sentiment analysis, Support Vector Machine, Decision Tree, information gain, train-test split, k-fold cross-validation.
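The labeling rule, information gain feature selection, the two data-splitting schemes, and the reported metrics can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the study's implementation. Reviews are represented as sets of tokens, and information gain is computed for binary term-presence features. The 80/20 split ratio, the 10 folds, the fixed seed, plain (unstratified) k-fold, and support-weighted averaging of precision, recall, and F1 are all assumptions, since the abstract does not report them. All function names and the toy data are hypothetical, and training the SVM and Decision Tree themselves is left out.

```python
import math
import random
from collections import Counter


def label_from_score(score: int) -> str:
    """Labeling rule from the abstract: scores 4-5 -> positive, 1-3 -> negative."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f"score must be an integer from 1 to 5, got {score!r}")
    return "positive" if score >= 4 else "negative"


def entropy(labels) -> float:
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term) -> float:
    """IG(C, t) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)] for term presence."""
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_t) / n) * entropy(with_t) + (len(without_t) / n) * entropy(without_t)
    return entropy(labels) - conditional


def select_top_k_terms(docs, labels, k: int):
    """Keep the k terms with the highest information gain (exact ties keep alphabetical order)."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)
    return ranked[:k]


def train_test_split_indices(n: int, test_ratio: float = 0.2, seed: int = 42):
    """Single shuffled hold-out split; returns (train, test). The 80/20 ratio is an assumption."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_ratio)
    return idx[n_test:], idx[:n_test]


def k_fold_indices(n: int, k: int = 10, seed: int = 42):
    """Yield (train, test) index lists for plain k-fold CV. k = 10 is an assumption."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def weighted_metrics(y_true, y_pred) -> dict:
    """Accuracy plus support-weighted precision, recall and F1 (averaging is an assumption)."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    result = {"accuracy": sum(t == p for t, p in pairs) / n,
              "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in pairs)
        predicted = sum(p == c for _, p in pairs)
        actual = sum(t == c for t, _ in pairs)
        prec = tp / predicted if predicted else 0.0
        rec = tp / actual
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = actual / n  # class support as a fraction of all reviews
        result["precision"] += weight * prec
        result["recall"] += weight * rec
        result["f1"] += weight * f1
    return result


if __name__ == "__main__":
    # Hypothetical toy reviews, not data from the study.
    scores = [5, 4, 3, 1, 2, 5]
    labels = [label_from_score(s) for s in scores]
    docs = [{"good", "fast"}, {"good"}, {"slow"}, {"scam", "slow"}, {"error"}, {"good", "safe"}]
    print(select_top_k_terms(docs, labels, 2))  # ['good', 'slow']
    train, test = train_test_split_indices(len(docs), test_ratio=0.5)
    print(sorted(train), sorted(test))
    print(weighted_metrics(["positive", "positive", "negative", "negative"],
                           ["positive", "negative", "negative", "negative"]))
```

In a full pipeline, information gain would typically be computed on the training portion of each split only, so that test reviews do not influence which terms are kept. The selected terms would then be passed to an SVM and a Decision Tree, for example scikit-learn's `SVC` and `DecisionTreeClassifier`. That library choice is also an assumption, since the abstract does not name one. With support-weighted averaging, recall always equals accuracy. This is consistent with the identical accuracy and recall figures reported for both models.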