Record Detail
IMBALANCED HANDLING FOR STROKE PREDICTION USING OVERSAMPLING AND COST-SENSITIVE LEARNING
Abstract: Predictive analysis of stroke using machine learning (ML) is a promising approach for early detection and for reducing the number of stroke patients. However, the inherent class imbalance in medical datasets poses a significant challenge and often causes models to miss minority-class cases such as stroke. This study evaluates and compares two popular techniques for addressing class imbalance in the context of stroke prediction: oversampling with the Synthetic Minority Oversampling Technique (SMOTE), and cost-sensitive learning. Using the public Kaggle stroke dataset, three ML algorithms (Random Forest, Support Vector Machine, and XGBoost) were trained and tested in three scenarios: baseline (without balancing), SMOTE, and cost-sensitive learning. The results show that both balancing techniques significantly improve recall for the minority class, particularly for the SVM model, but at the cost of reduced precision and accuracy across the models. Feature importance analysis using SHAP identified age and hypertension as the most influential predictors of stroke, consistent with previous research findings. These improvements come with a trade-off between sensitivity (recall) and precision that must be considered before the models are applied in medical decision support systems. Future research should explore hybrid approaches and validate the results on larger and more diverse datasets.
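The SMOTE oversampling step described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation used in the thesis: the function name `smote`, its parameters, and the toy data are hypothetical, and features are assumed to be numeric and already scaled (tables that mix in categorical columns usually call for a variant such as SMOTE-NC). Each synthetic point lies on the segment between a minority sample and one of its k nearest minority-class neighbours.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Hypothetical minimal SMOTE: create n_new synthetic minority rows.

    Each new row is base + gap * (neighbour - base), where base is a random
    minority sample, neighbour is one of its k nearest minority neighbours
    (Euclidean distance), and gap is drawn uniformly from [0, 1).
    Assumes numeric, already-scaled features.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise squared distances between minority samples only.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # For each synthetic row: pick a base sample, then one of its neighbours.
    base = rng.integers(0, n, size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor per synthetic row
    return X_min[base] + gap * (X_min[nb] - X_min[base])


# Hypothetical toy data: 95 majority rows and 5 minority rows, 2 features.
data_rng = np.random.default_rng(0)
X_maj = data_rng.normal(0.0, 1.0, size=(95, 2))
X_min = data_rng.normal(3.0, 0.5, size=(5, 2))

# Oversample the minority class up to the majority count.
X_syn = smote(X_min, n_new=len(X_maj) - len(X_min), k=3, rng=1)
X_bal = np.vstack([X_maj, X_min, X_syn])
y_bal = np.array([0] * len(X_maj) + [1] * (len(X_min) + len(X_syn)))
print(X_syn.shape, np.bincount(y_bal))  # (90, 2) [95 95]
```

In practice oversampling is fitted on the training split only, so that no synthetic points leak into the data used to measure recall and precision.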
Keywords: Stroke prediction, class imbalance, SMOTE, cost-sensitive learning, machine learning, feature importance, SHAP
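The abstract does not say how misclassification costs were introduced; a common route is class weights in the training loss (for example scikit-learn's `class_weight="balanced"` or XGBoost's `scale_pos_weight`). The sketch below, under stated assumptions, uses a closely related and easier-to-inspect form of cost-sensitive learning instead: it turns the "balanced" class weights into misclassification costs and moves the decision threshold accordingly. The labels, scores, and helper names are hypothetical toy values, not data or results from the thesis.

```python
import numpy as np


def balanced_class_weights(y):
    """Per-class weight n_samples / (n_classes * class_count)."""
    classes, counts = np.unique(np.asarray(y), return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


def cost_threshold(cost_fp, cost_fn):
    """Predict the positive class when p * cost_fn >= (1 - p) * cost_fp,
    i.e. when p >= cost_fp / (cost_fp + cost_fn). Assumes calibrated
    probabilities and zero cost for correct predictions."""
    return cost_fp / (cost_fp + cost_fn)


def evaluate(y_true, scores, threshold):
    """Precision, recall and accuracy for the positive class (1)."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(scores) >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = float(np.mean(y_pred == y_true))
    return precision, recall, accuracy


# Hypothetical toy labels (1 = stroke) and model scores.
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.25, 0.30, 0.55, 0.08, 0.12, 0.35, 0.60]

w = balanced_class_weights(y)                   # {0: 0.625, 1: 2.5}
t = cost_threshold(cost_fp=w[0], cost_fn=w[1])  # 0.2

for name, thr in [("baseline", 0.5), ("cost-sensitive", t)]:
    p, r, a = evaluate(y, scores, thr)
    print(f"{name}: precision={p:.2f} recall={r:.2f} accuracy={a:.2f}")
# baseline: precision=0.50 recall=0.50 accuracy=0.80
# cost-sensitive: precision=0.40 recall=1.00 accuracy=0.70
```

With balanced weights used as costs, the threshold equals the positive-class prevalence (here 2/10 = 0.2). The lower threshold catches every positive case but admits more false positives, which mirrors the recall-versus-precision trade-off the abstract reports.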
017 AGU I R.1
NONE
Text, undergraduate thesis (skripsi)
Indonesia
Universitas Teknologi Digital Indonesia (UTDI), Yogyakarta, 2025