Analysis of Daily Exam Test Items Developed by a Teacher on the Human Motion System Material for Class XI Natural Science
Item analysis is the process of examining the quality of test items to determine whether they are fit for use. This study aimed to analyze the quality of teacher-developed test items on the Human Motion System material for Class XI Natural Science in terms of theoretical and empirical validity. Theoretical validity was determined from expert validation of the cognitive level, material, construction, and language aspects. Empirical validity was determined from item validity, reliability, difficulty level, and discriminating power. The study was descriptive, using qualitative and quantitative analysis. The results show that the items were valid on the material and language aspects, whereas many items were not valid on the construction aspect, and the items were dominated by cognitive levels C1, C2, and C3. Empirically, the multiple-choice items were valid (25%) and not valid (75%), had low reliability, and were categorized as very easy (20%), easy (15%), moderate (50%), and difficult (15%), with good (35%), fair (35%), and poor (30%) discriminating power. The essay items were valid (60%) and not valid (40%), had low reliability, and were categorized as very easy (20%), easy (40%), and moderate (40%), with very good (20%), good (20%), and fair (40%) discriminating power. It can be concluded that, in terms of theoretical validity, the items were valid but dominated by cognitive levels C1, C2, and C3, and that they were not valid empirically.
Keywords: Item analysis, theoretical validity, empirical validity, ANATES V4 application.
Test item analysis is the process of assessing the quality of test items to determine whether they are fit for use. The purpose of this study was to analyze the quality of test items developed by a high school biology teacher on the Human Motion System material for Class XI Natural Science, based on theoretical and empirical validity. Theoretical validity was determined from the results of expert validation covering the cognitive level, material, construction, and language aspects. Empirical validity was determined from item validity, reliability, difficulty level, and discrimination index. This was descriptive research with qualitative and quantitative analysis. The results showed that the test items were valid on the material and language aspects, whereas many test items were not valid on the construction aspect, and the items were dominated by cognitive levels C1, C2, and C3. Empirically, the multiple-choice items were valid (25%) and not valid (75%), had low reliability, and were categorized as very easy (20%), easy (15%), moderately difficult (50%), and difficult (15%). On the discrimination index, these items fell into the good (35%), fair (35%), and poor (30%) categories. The essay items were valid (60%) and not valid (40%), had low reliability, and were categorized as very easy (20%), easy (40%), and moderately difficult (40%). On the discrimination index, these items fell into the very good (20%), good (20%), and fair (40%) categories. It can be concluded that, based on theoretical validity, the test items were valid but dominated by the low cognitive levels (C1, C2, and C3). In addition, they were not valid empirically.
Keywords: Item analysis, theoretical validity, empirical validity, ANATES V4 application.
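For readers unfamiliar with the two item-level indices named above, the following is a minimal sketch of how a difficulty index and a discrimination index are commonly computed in classical test theory for dichotomously scored (right/wrong) items. It is an illustration only, not a description of the ANATES V4 procedure used in the study: the function name `item_statistics`, the 27% upper/lower group fraction, and the sample data are all assumptions introduced here.

```python
from typing import List, Tuple


def item_statistics(
    responses: List[List[int]], group_fraction: float = 0.27
) -> List[Tuple[float, float]]:
    """Return (difficulty, discrimination) for each item.

    responses[s][i] is 1 if student s answered item i correctly, else 0.
    group_fraction is the share of students placed in the upper and lower
    groups (0.27 is a common convention, assumed here).
    """
    n_students = len(responses)
    n_items = len(responses[0])

    # Rank students by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(n_students * group_fraction))
    upper, lower = ranked[:k], ranked[-k:]

    stats = []
    for i in range(n_items):
        # Difficulty index: proportion of all students answering correctly.
        difficulty = sum(r[i] for r in responses) / n_students
        # Discrimination index: proportion correct in the upper group
        # minus proportion correct in the lower group.
        discrimination = (
            sum(r[i] for r in upper) - sum(r[i] for r in lower)
        ) / k
        stats.append((difficulty, discrimination))
    return stats


# Hypothetical data: four students, two items.
sample = [[1, 1], [1, 0], [0, 1], [0, 0]]
for number, (p, d) in enumerate(item_statistics(sample, 0.5), start=1):
    print(f"item {number}: difficulty={p:.2f}, discrimination={d:.2f}")
```

A higher difficulty index means an easier item, and a discrimination index near zero means the item does not separate high scorers from low scorers. The cut-offs that map these values to labels such as "easy" or "fair" vary between references and are not specified in the abstract, so none are encoded here.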