LIGHTGBM OPTIMIZATION BASED ON IMBALANCED LEARNING FOR ADOLESCENT DEPRESSION RISK PREDICTION
Abstract
Class imbalance is a major challenge in adolescent mental-health risk prediction because the rare positive class is usually the class of greatest interest. This study optimizes LightGBM with an imbalanced-learning approach to predict adolescent depression risk from the public Kaggle dataset Social Media Impact on Teen Mental Health. The dataset contains 1,200 adolescent records with demographic, social-media, sleep, academic, physical-activity, stress, anxiety, and addiction variables, together with a depression label. Only 31 records (2.58%) belong to the positive depression-risk class; therefore, stratified validation, class weighting, and minority-sensitive metrics were used. Six algorithms were compared: Logistic Regression with SMOTE, Balanced Random Forest, HistGradientBoosting, XGBoost, CatBoost, and LightGBM. LightGBM achieved the highest average PR-AUC (0.995) and ROC-AUC (1.000), while HistGradientBoosting produced the highest average F1-score (0.969). Permutation importance indicated that stress level, sleep duration, anxiety level, and daily social-media duration were the most influential variables. The proposed model is promising as an analytical screening prototype, but external validation is required before clinical or educational deployment.
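To make the degree of imbalance concrete, the following minimal sketch works through the arithmetic implied by the reported counts (1,200 records, 31 positive). The number of folds (`N_FOLDS = 5`), the "balanced" weighting heuristic (n_samples / (n_classes * class_count)), and all function names are assumptions chosen for illustration; the abstract does not state which fold count or weighting scheme the study used.

```python
# Illustrative arithmetic only. The fold count and the weighting formula are
# assumptions, not settings reported in the abstract.

N_TOTAL = 1200      # records in the dataset (from the abstract)
N_POSITIVE = 31     # positive depression-risk records (from the abstract)
N_FOLDS = 5         # assumed number of stratified folds (hypothetical)


def class_summary(n_total, n_positive):
    """Return prevalence, negative/positive ratio, and 'balanced' class weights."""
    n_negative = n_total - n_positive
    prevalence = n_positive / n_total
    # Ratio of negatives to positives, often used as a positive-class weight
    # in gradient-boosting libraries.
    neg_pos_ratio = n_negative / n_positive
    # 'Balanced' heuristic: n_samples / (n_classes * class_count).
    weights = {
        0: n_total / (2 * n_negative),
        1: n_total / (2 * n_positive),
    }
    return prevalence, neg_pos_ratio, weights


def stratified_fold_counts(n_class, n_folds):
    """Spread one class's records over the folds as evenly as possible."""
    base, extra = divmod(n_class, n_folds)
    return [base + 1 if i < extra else base for i in range(n_folds)]


prevalence, neg_pos_ratio, weights = class_summary(N_TOTAL, N_POSITIVE)
positives_per_fold = stratified_fold_counts(N_POSITIVE, N_FOLDS)

print(f"prevalence: {prevalence:.4f}")
print(f"negative/positive ratio: {neg_pos_ratio:.2f}")
print(f"balanced weights: {weights[0]:.3f} (negative), {weights[1]:.3f} (positive)")
print(f"positives per validation fold: {positives_per_fold}")
```

Under these assumptions, each validation fold would hold only six or seven positive records, so fold-level minority metrics rest on very few cases. The prevalence of about 0.026 is also the expected precision of a classifier that ranks records at random, which gives a rough reference point for reading PR-AUC values.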
References
[2] M. Shahzad, "Social Media Impact on Teen Mental Health," Kaggle Dataset, n.d.
[3] P. M. Valkenburg, A. Meier, and I. Beyens, "Social media use and its impact on adolescent mental health: An umbrella review of the evidence," Current Opinion in Psychology, vol. 44, pp. 58-68, 2022, doi: 10.1016/j.copsyc.2021.08.017.
[4] A. Orben, A. K. Przybylski, S.-J. Blakemore, and R. A. Kievit, "Windows of developmental sensitivity to social media," Nature Communications, vol. 13, no. 1, art. 1649, 2022, doi: 10.1038/s41467-022-29296-3.
[5] M. Liu, K. E. Kamper-DeMarco, J. Zhang, J. Xiao, D. Dong, and P. Xue, "Time spent on social media and risk of depression in adolescents: A dose-response meta-analysis," International Journal of Environmental Research and Public Health, vol. 19, no. 9, art. 5164, 2022, doi: 10.3390/ijerph19095164.
[6] H. Shannon, K. Bush, P. J. Villeneuve, K. G. C. Hellemans, and S. Guimond, "Problematic social media use in adolescents and young adults: Systematic review and meta-analysis," JMIR Mental Health, vol. 9, no. 4, e33450, 2022, doi: 10.2196/33450.
[7] R. Plackett, J. Sheringham, and J. Dykxhoorn, "The longitudinal impact of social media use on UK adolescents' mental health: Longitudinal observational study," Journal of Medical Internet Research, vol. 25, e43213, 2023, doi: 10.2196/43213.
[8] S. Ghai, L. Fassi, F. Awadh, and A. Orben, "Lack of sample diversity in research on adolescent depression and social media use: A scoping review and meta-analysis," Clinical Psychological Science, vol. 11, no. 5, pp. 759-772, 2023, doi: 10.1177/21677026221114859.
[9] L. Fassi, K. Thomas, D. A. Parry, A. Leyland-Craggs, T. J. Ford, and A. Orben, "Social media use and internalizing symptoms in clinical and community adolescent samples: A systematic review and meta-analysis," JAMA Pediatrics, vol. 178, no. 8, pp. 814-822, 2024, doi: 10.1001/jamapediatrics.2024.2078.
[10] L. Fassi, A. M. Ferguson, A. K. Przybylski, T. J. Ford, and A. Orben, "Social media use in adolescents with and without mental health conditions," Nature Human Behaviour, vol. 9, pp. 1283-1299, 2025, doi: 10.1038/s41562-025-02134-4.
[11] J. M. Nagata, C. D. Otmar, J. Shim, P. Balasubramanian, C. M. Cheng, E. J. Li, et al., "Social media use and depressive symptoms during early adolescence," JAMA Network Open, vol. 8, no. 5, e2511704, 2025, doi: 10.1001/jamanetworkopen.2025.11704.
[12] J. Chhabra, V. Pilkington, R. Benakovic, M. J. Wilson, L. La Sala, and Z. Seidler, "Social media and youth mental health: Scoping review of platform and policy recommendations," Journal of Medical Internet Research, vol. 27, e72061, 2025, doi: 10.2196/72061.
[13] N. Agyapong-Opoku, F. Agyapong-Opoku, and A. J. Greenshaw, "Effects of social media use on youth and adolescent mental health: A scoping review of reviews," Behavioral Sciences, vol. 15, no. 5, art. 574, 2025, doi: 10.3390/bs15050574.
[14] W. Chen, K. Yang, Z. Yu, Y. Shi, and C. L. P. Chen, "A survey on imbalanced learning: Latest research, applications and future directions," Artificial Intelligence Review, vol. 57, art. 137, 2024, doi: 10.1007/s10462-024-10759-6.
[15] T. Wongvorachan, S. He, and O. Bulut, "A comparison of undersampling, oversampling, and SMOTE methods for dealing with imbalanced classification in educational data mining," Information, vol. 14, no. 1, art. 54, 2023, doi: 10.3390/info14010054.
[16] P. Thölke, G. Mantilla-Ramos, A. Abdelhedi, Y. Maschke, A. Dehgan, Y. Harel, et al., "Class imbalance should not throw you off balance: Choosing the right classifiers and performance metrics for brain decoding with imbalanced data," NeuroImage, vol. 277, art. 120253, 2023, doi: 10.1016/j.neuroimage.2023.120253.
[17] L. Dube and T. Verster, "Enhancing classification performance in imbalanced datasets: A comparative analysis of machine learning models," Data Science in Finance and Economics, vol. 3, no. 4, pp. 354-379, 2023, doi: 10.3934/DSFE.2023021.
[18] Y. Li, Y. Yang, P. Song, L. Duan, and R. Ren, "An improved SMOTE algorithm for enhanced imbalanced data classification by expanding sample generation space," Scientific Reports, vol. 15, art. 23521, 2025, doi: 10.1038/s41598-025-09506-w.
[19] S. N. Almuayqil, M. Humayun, N. Z. Jhanjhi, M. F. Almufareh, and D. Javed, "Framework for improved sentiment analysis via random minority oversampling for user tweet review classification," Electronics, vol. 11, no. 19, art. 3058, 2022, doi: 10.3390/electronics11193058.
[20] C. Suhaeni and H.-S. Yong, "Mitigating class imbalance in sentiment analysis through GPT-3-generated synthetic sentences," Applied Sciences, vol. 13, no. 17, art. 9766, 2023, doi: 10.3390/app13179766.
[21] E. Erlin, Y. Desnelita, N. Nasution, L. Suryati, and F. Zoromi, "Dampak SMOTE terhadap kinerja Random Forest classifier berdasarkan data tidak seimbang," MATRIK: Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer, vol. 21, no. 3, pp. 677-690, 2022, doi: 10.30812/matrik.v21i3.1726.
[22] V. Borisov, T. Leemann, K. Seßler, J. Haug, M. Pawelczyk, and G. Kasneci, "Deep neural networks and tabular data: A survey," IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 6, pp. 7499-7519, 2024, doi: 10.1109/TNNLS.2022.3229161.
[23] N. Hollmann, S. Müller, K. Eggensperger, F. Hutter, et al., "Accurate predictions on small data with a tabular foundation model," Nature, vol. 637, no. 8045, pp. 319-326, 2025, doi: 10.1038/s41586-024-08328-6.





