Optimization of Data Mining Classification Algorithm Models Using Feature Selection Methods for Stroke Disease Prediction

Mitha Astriani Salwa; Malabay

doi:10.37817/tekinfo.v26i1.4677

Mitha Astriani Salwa, Universitas Esa Unggul
Malabay, Universitas Esa Unggul

Abstract

The increasing prevalence of degenerative diseases has encouraged the use of data mining
technology to address problems in the health sector. This research aims to predict stroke
accurately by developing a classification algorithm model, namely K-Nearest Neighbors, optimized
through two feature selection methods: Forward Selection and Backward Elimination. The dataset
comes from Kaggle and consists of 5,110 records with 12 attributes: id, gender, age,
hypertension, heart_disease, ever_married, work_type, Residence_type, avg_glucose_level, bmi,
smoking_status, and stroke. The analysis followed the CRISP-DM framework and was implemented in
Python using Jupyter Notebook within Visual Studio Code. The findings show that neither Forward
Selection nor Backward Elimination led to a notable improvement in model accuracy compared with
the algorithm without feature selection, although Forward Selection performed better than
Backward Elimination. Accuracy exceeded 90% in every scenario, which can be categorized as good
classification. This study contributes to the application of classification algorithms in
medical prediction problems and demonstrates that the choice of feature selection method can
influence the performance of classification models.
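
The wrapper-style procedure named in the abstract (K-Nearest Neighbors evaluated with and without Forward Selection and Backward Elimination) can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic rather than the Kaggle stroke dataset, and the value k = 3, the 120/80 holdout split, the stopping rules, and all function names are assumptions chosen for illustration. Only the Python standard library is used.

```python
import random

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    neighbors = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )[:k]
    votes = [label for _, label in neighbors]
    return max(set(votes), key=votes.count)

def accuracy(X, y, features, split, k=3):
    """Holdout accuracy of KNN restricted to the given feature indices."""
    if not features:
        return 0.0
    proj = [[row[j] for j in features] for row in X]
    train_X, test_X = proj[:split], proj[split:]
    train_y, test_y = y[:split], y[split:]
    hits = sum(knn_predict(train_X, train_y, x, k) == t
               for x, t in zip(test_X, test_y))
    return hits / len(test_y)

def forward_selection(X, y, split):
    """Start empty; add the best remaining feature while accuracy strictly improves."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        score, j = max((accuracy(X, y, selected + [j], split), j) for j in remaining)
        if score <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return sorted(selected), best

def backward_elimination(X, y, split):
    """Start with all features; drop one at a time while accuracy does not fall."""
    selected = list(range(len(X[0])))
    best = accuracy(X, y, selected, split)
    while len(selected) > 1:
        score, j = max(
            (accuracy(X, y, [f for f in selected if f != j], split), j)
            for j in selected
        )
        if score < best:
            break
        selected.remove(j)
        best = score
    return sorted(selected), best

# Synthetic stand-in data (assumption): feature 0 is informative, features 1-3 are noise.
random.seed(0)
X, y = [], []
for _ in range(200):
    label = random.randint(0, 1)
    X.append([label * 2.0 + random.gauss(0, 0.5)] + [random.gauss(0, 3.0) for _ in range(3)])
    y.append(label)

split = 120  # first 120 rows for training, last 80 for testing (assumption)
baseline = accuracy(X, y, list(range(4)), split)
fwd_features, fwd_acc = forward_selection(X, y, split)
bwd_features, bwd_acc = backward_elimination(X, y, split)

print("all features        :", baseline)
print("forward selection   :", fwd_features, fwd_acc)
print("backward elimination:", bwd_features, bwd_acc)
```

Both searches are greedy, so they can settle on different subsets, which is one way the two methods may end up with different accuracies on the same data. In practice a library routine such as scikit-learn's `SequentialFeatureSelector` (with `direction="forward"` or `direction="backward"`) combined with cross-validation would be a more typical choice than this hand-rolled holdout loop.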