Exploration of Techniques Based on eXtreme Gradient Boosting (XGBoost) for Credit Default Risk Prediction
Consumerist behavior in society drives an increase in credit activity, and this increase in turn raises the risk of credit default, that is, failure to repay. This is a critical problem that needs to be addressed. Machine learning, meanwhile, can be applied to many real-world problems, including economic ones. This research therefore focuses on solving the credit default problem with eXtreme Gradient Boosting (XGBoost). It also seeks to improve the performance of the credit default risk prediction model by exploring techniques based on XGBoost. The dataset used is "Credit Card Default Risk", which contains 45,528 samples with 18 features as training data and 11,383 samples as test data. The best-performing model was built with RandomUnderSampler, feature selection based on correlation with a threshold of 0.001, and a train-test split with a test size of 0.2 for evaluation. In the training-stage evaluation, the XGBoost model achieved AUC 0.9730, F1 Score 0.7807, Recall 0.9946, Precision 0.6426, and Accuracy 0.9550. After hyperparameter tuning with the Optuna library, the scores rose to AUC 0.9753, F1 Score 0.7844, Recall 0.9986, Precision 0.6458, and Accuracy 0.9557. Ensemble learning improved performance further. The combination Voted(Voted(XGBoost+CatBoost)+LGBM), using Optuna, RandomUnderSampler, correlation-based feature selection with a threshold of 0.001, and test_size 0.2, achieved AUC 0.9763, F1 Score 0.7911, Recall 0.9986, Precision 0.6550, and Accuracy 0.9575. After re-training on the test dataset, it obtained AUC 0.9933, F1 Score 0.9816, Recall 1.0000, Precision 0.9639, and Accuracy 0.9902. These results indicate that both the choice of XGBoost-based techniques and the distribution of the dataset affect the performance of the prediction model.
Keywords: Prediction, Credit Default Risk, XGBoost
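The preprocessing named in the abstract consists of random undersampling, correlation-threshold feature selection, and a 0.2 train-test split. It can be illustrated with the dependency-free sketch below, which is not the authors' code and rests on three assumptions. First, RandomUnderSampler behaves like imbalanced-learn's default, randomly dropping majority-class rows until every class matches the minority count. Second, "feature selection based on correlation with a threshold of 0.001" keeps features whose absolute Pearson correlation with the target exceeds 0.001. Third, the split mirrors scikit-learn's train_test_split with test_size=0.2, where the test count is rounded up. The helper names random_undersample, select_by_correlation, and split_train_test are hypothetical.

```python
import math
import random
from collections import defaultdict


def random_undersample(X, y, seed=0):
    """Randomly drop rows from larger classes until every class has as many
    rows as the smallest one (assumed to match RandomUnderSampler's default)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):  # sample without replacement
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def select_by_correlation(X, y, threshold=0.001):
    """Indices of features whose |corr(feature, target)| exceeds threshold
    (a hypothetical reading of the paper's correlation-based selection)."""
    keep = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        if abs(pearson(column, y)) > threshold:
            keep.append(j)
    return keep


def split_train_test(X, y, test_size=0.2, seed=0):
    """Shuffle row indices and hold out ceil(test_size * n) rows for evaluation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = math.ceil(test_size * len(X))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Tiny made-up example: column 1 is constant and column 2 is uncorrelated
# with the target, so only column 0 survives the 0.001 threshold.
X = [[1.0, 5.0, 0.0], [2.0, 5.0, 1.0], [3.0, 5.0, 0.0],
     [4.0, 5.0, 1.0], [5.0, 5.0, 0.0], [6.0, 5.0, 1.0]]
y = [0, 0, 0, 0, 1, 1]
X_bal, y_bal = random_undersample(X, y)  # two rows of each class
kept = select_by_correlation(X, y, threshold=0.001)
X_train, X_test, y_train, y_test = split_train_test(X, y, test_size=0.2)
```

The abstract does not say whether undersampling is applied before or after the split. Applying it only to the training portion keeps the evaluation set at its original class distribution.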
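The five metrics reported in the abstract (AUC, F1 Score, Recall, Precision, Accuracy) can be recomputed from true labels and predicted positive-class probabilities with the stdlib sketch below. Two things are assumptions. The 0.5 decision threshold is not stated in the abstract. The function names are hypothetical. AUC is computed in its Mann-Whitney form, as the share of positive/negative pairs ranked correctly, with ties counting as half. That pairwise loop is quadratic, so it suits small illustrations rather than the full 45,528-row dataset.

```python
def f1_from(precision, recall):
    """F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Probability that a random positive scores above a random negative
    (ties count as half), i.e. the Mann-Whitney form of ROC AUC."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Compute the five metrics, turning scores into labels at `threshold`."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "AUC": roc_auc(y_true, scores),
        "F1": f1_from(precision, recall),
        "Recall": recall,
        "Precision": precision,
        "Accuracy": (tp + tn) / len(y_true),
    }


# Consistency check on the (precision, recall, F1) triples reported in the abstract.
reported = [
    (0.6426, 0.9946, 0.7807),  # XGBoost
    (0.6458, 0.9986, 0.7844),  # XGBoost + Optuna
    (0.6550, 0.9986, 0.7911),  # Voted(Voted(XGBoost+CatBoost)+LGBM)
    (0.9639, 1.0000, 0.9816),  # after re-training on the test dataset
]
for p, r, f in reported:
    print(f"P={p:.4f} R={r:.4f} reported F1={f:.4f} recomputed={f1_from(p, r):.5f}")
```

The recomputed F1 values agree with the reported ones to within 1e-4. The small residual comes from precision and recall being reported at four decimals.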
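The abstract does not say whether the Voted(Voted(XGBoost+CatBoost)+LGBM) ensemble uses hard voting (majority of labels) or soft voting (averaged probabilities), nor how each level is weighted. The sketch below assumes soft voting with equal weights at both levels. This is how scikit-learn's VotingClassifier behaves with voting="soft" and no weights. The probabilities are made-up stand-ins for the outputs of trained XGBoost, CatBoost, and LightGBM models, and soft_vote and nested_vote are hypothetical names.

```python
def soft_vote(*prob_lists, weights=None):
    """Weighted average of per-sample positive-class probabilities from several models."""
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(len(prob_lists[0]))
    ]


def nested_vote(p_xgb, p_cat, p_lgbm):
    """Voted(Voted(XGBoost + CatBoost) + LGBM), equal weights at each level (assumed)."""
    inner = soft_vote(p_xgb, p_cat)  # first vote: XGBoost with CatBoost
    return soft_vote(inner, p_lgbm)  # second vote: that result with LightGBM


# Hypothetical positive-class probabilities for three samples.
p_xgb = [0.9, 0.2, 0.6]
p_cat = [0.7, 0.4, 0.2]
p_lgbm = [0.8, 0.1, 0.3]
combined = nested_vote(p_xgb, p_cat, p_lgbm)
labels = [1 if p >= 0.5 else 0 for p in combined]
print(combined, labels)
```

Under this assumption, nesting is equivalent to a flat weighted vote. LightGBM carries half the weight, and XGBoost and CatBoost carry a quarter each, so the result differs from a plain three-model average.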