Optimization of K-Means and K-DBA in Time Series Based Clustering on Wind Speed Data in Indonesia
Time series data are observations ordered in time and are used to monitor how a phenomenon changes over time. Clustering is an advanced exploratory data analysis approach that can be applied to such data. K-Means, one of the most popular non-hierarchical clustering methods, is widely used because of its simplicity and its effectiveness for large-scale data analysis, although it has limitations in centroid initialization. This study uses two developments of K-Means: K-Means++, which optimizes centroid initialization, and K-DBA, which determines new centroids with a global averaging method called DTW Barycenter Averaging (DBA). In addition, three distance measures are applied in K-Means: Dynamic Time Warping (DTW), Derivative Dynamic Time Warping (DDTW), and Weighted Dynamic Time Warping (WDTW). The data are wind speed time series from 34 provinces in Indonesia. Wind speed tends to have a stable temporal pattern with only mild fluctuations, so the clustering results can be more representative of the conditions in each region. The analysis formed 6 clusters, and K-Means with the WDTW distance gave the best clustering result, with a silhouette coefficient of 0.73145.
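To make the three distance measures concrete, the following is a minimal Python sketch, not the implementation used in this study. It assumes a squared pointwise cost, the commonly used three-point derivative estimate for DDTW, and a logistic weight on the index difference for WDTW. The function names and the parameters `g` and `w_max` are illustrative assumptions, and their default values are not taken from the study.

```python
import math


def dtw(a, b, weight=None):
    """DTW distance between two sequences using a squared pointwise cost.

    `weight`, if given, is a function of the index difference |i - j| that
    scales the pointwise cost (used by the WDTW sketch below).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            if weight is not None:
                d *= weight(abs(i - j))
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return math.sqrt(cost[n][m])


def derivative(x):
    """Three-point derivative estimate; endpoints copy their neighbours."""
    if len(x) < 3:
        raise ValueError("need at least 3 points to estimate derivatives")
    inner = [((x[i] - x[i - 1]) + (x[i + 1] - x[i - 1]) / 2.0) / 2.0
             for i in range(1, len(x) - 1)]
    return [inner[0]] + inner + [inner[-1]]


def ddtw(a, b):
    """Derivative DTW: plain DTW applied to the estimated derivatives."""
    return dtw(derivative(a), derivative(b))


def wdtw(a, b, g=0.05, w_max=1.0):
    """Weighted DTW: larger index differences |i - j| are penalised more.

    The logistic weight and the values of g and w_max are assumptions
    made for this sketch.
    """
    mid = max(len(a), len(b)) / 2.0

    def weight(k):
        return w_max / (1.0 + math.exp(-g * (k - mid)))

    return dtw(a, b, weight)
```

For example, `dtw([0, 0, 1], [0, 1, 1])` is zero because warping aligns the step change, and `ddtw([0, 1, 2, 3], [5, 6, 7, 8])` is zero because DDTW compares slopes rather than levels. In a clustering workflow, pairwise distances of this kind would be passed to the assignment step of K-Means and to the silhouette computation.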