This study predicts osteoporosis by applying machine learning models to data collected from 1958 individuals. The data were first preprocessed, and several machine learning models were then trained to predict osteoporosis. The random forest model performed best, with an accuracy of 83.7%. A correlation matrix between features was also calculated and analyzed to identify key factors associated with osteoporosis. Age showed the strongest association with osteoporosis, and previous fractures showed a positive but relatively low correlation.
Osteoporosis is a metabolic bone disease characterized by decreased bone density and an increased risk of fractures. Identifying risk factors and key features that influence osteoporosis can be instrumental in preventing and managing this disease. This study aims to predict osteoporosis and identify the key influencing features using machine learning models.
The dataset comprises information from 1958 individuals with features such as age, gender, hormonal changes, family history, race/ethnicity, body weight, calcium and vitamin D intake, physical activity, smoking, alcohol consumption, medical conditions, medications, prior fractures, and osteoporosis status.
The data were loaded, cleaned, and preprocessed. Categorical features were converted to numerical values, and numerical data were standardized. The data were then split into training and test sets.
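The preprocessing described above can be sketched as follows. This is a minimal illustration, not the study's actual code: the column names, the synthetic data, the 80/20 split ratio, and the use of pandas and scikit-learn are all assumptions. The sketch also splits before standardizing so that test-set statistics do not influence the scaler, which is a common precaution and may differ from the order used in the study.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the study data; column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(18, 90, size=n).astype(float),
    "Gender": rng.choice(["Female", "Male"], size=n),
    "Prior Fractures": rng.choice(["Yes", "No"], size=n),
    "Osteoporosis": rng.integers(0, 2, size=n),
})

X = df.drop(columns="Osteoporosis")
y = df["Osteoporosis"]

# Convert categorical features to numerical indicator columns.
X = pd.get_dummies(
    X, columns=["Gender", "Prior Fractures"], drop_first=True, dtype=float
)

# Hold out 20% of the rows for testing (assumed ratio), keeping class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
X_train, X_test = X_train.copy(), X_test.copy()

# Standardize the numerical column using training-set statistics only.
scaler = StandardScaler()
X_train["Age"] = scaler.fit_transform(X_train[["Age"]]).ravel()
X_test["Age"] = scaler.transform(X_test[["Age"]]).ravel()
```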
Various models, including logistic regression, random forest, and support vector machine (SVM), were used to predict osteoporosis. The models were trained using the training data and evaluated using the test data. The random forest model showed the best performance with an accuracy of 83.7%.
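A comparison of the three model families named above might look like the sketch below. It runs on synthetic data from scikit-learn's `make_classification`, so the accuracies it prints are unrelated to the 83.7% reported in the study. The hyperparameters shown are illustrative assumptions, since the source does not state the settings used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary classification data standing in for the study dataset.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Hyperparameters here are assumptions, not the study's settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Machine (SVM)": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Train each model on the training set and score it on the held-out test set.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
```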
Logistic Regression:
Support Vector Machine (SVM):
Random Forest:
The correlation matrix between features was calculated and analyzed. Key findings include:
The study found that age is the feature most strongly associated with osteoporosis. Prior fractures and medical conditions also show a positive correlation with osteoporosis, although the correlation for prior fractures is relatively low. Adequate calcium intake was associated with a lower risk of osteoporosis. Because these results come from a correlation analysis, they indicate association rather than causation, but they can still be valuable for better prevention and management of osteoporosis.
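The correlation analysis can be illustrated with a short sketch. The data below are synthetic and deliberately constructed so that the label depends mostly on age. The column names and the choice of the Pearson coefficient (the pandas default) are assumptions, as the source does not specify which coefficient was used.

```python
import numpy as np
import pandas as pd

# Synthetic data: the label is built to depend mainly on age and only
# weakly on prior fractures, purely for illustration.
rng = np.random.default_rng(0)
n = 500
age = rng.integers(18, 90, size=n).astype(float)
prior_fractures = rng.integers(0, 2, size=n).astype(float)
risk = 0.05 * (age - 50) + 0.3 * prior_fractures + rng.normal(0, 1, size=n)

df = pd.DataFrame({
    "Age": age,
    "Prior Fractures": prior_fractures,
    "Osteoporosis": (risk > 0).astype(int),
})

# Pairwise Pearson correlation between all columns.
corr = df.corr()

# Rank features by the absolute value of their correlation with the target.
target_corr = (
    corr["Osteoporosis"]
    .drop("Osteoporosis")
    .sort_values(key=abs, ascending=False)
)
print(target_corr)
```

Sorting by absolute value keeps strongly negative correlations near the top of the ranking alongside strongly positive ones.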
The random forest model, with an accuracy of 83.7%, outperformed the logistic regression and SVM models. This model achieved a good balance between accuracy and efficiency, identifying key features such as age, prior fractures, and medical conditions.
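The source does not say how the random forest was used to identify key features. One common approach, shown here as an assumption, is to read the impurity-based `feature_importances_` attribute of a fitted scikit-learn `RandomForestClassifier`. The data and column names below are synthetic and hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: the label depends strongly on age, weakly on prior
# fractures, and not at all on the noise column.
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "Age": rng.integers(18, 90, size=n).astype(float),
    "Prior Fractures": rng.integers(0, 2, size=n).astype(float),
    "Noise": rng.normal(size=n),
})
score = X["Age"] + 10 * X["Prior Fractures"] + rng.normal(0, 5, size=n)
y = (score > 60).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=42)
forest.fit(X, y)

# Impurity-based importances sum to 1 across features.
importances = pd.Series(
    forest.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances can overstate continuous or high-cardinality features, so permutation importance on held-out data is a reasonable cross-check.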
Conclusion
This study employed data analysis and machine learning models to identify and analyze the correlation between various features and osteoporosis. The results indicated that age, prior fractures, and medical conditions are the most significant factors influencing osteoporosis. These findings can aid in developing strategies for better prevention and management of the disease.
References
Osteoporosis Risk Prediction.ipynb