Keywords:
African Americans
Computer science
Machine learning
Osteoporosis
SNPs
Abstract:
Osteoporosis is a debilitating disease in which an individual's bones weaken, becoming fragile and more susceptible to fracture. Previous studies have found it most commonly among postmenopausal Caucasian and Asian women, while people of African descent (African American/Black) have largely been overlooked in osteoporosis research, especially in Genome-Wide Association Studies (GWAS). GWA studies identify single nucleotide polymorphisms (SNPs) that may contribute to certain illnesses, such as osteoporosis. Because low Bone Mineral Density (BMD) is one of the primary markers of potential osteoporosis, research focused on the African American population is needed to help this population avoid or mitigate the worst symptoms and complications of the disease. In this thesis, we implemented and applied machine learning algorithms to the genetic data of African American women to predict BMD from SNPs. The machine learning techniques we used for this regression task are regularized linear regression (Ridge, Lasso, and ElasticNet), gradient boosted trees (XGBoost and LightGBM), and artificial neural networks, evaluated with the Coefficient of Determination (R²) and Mean Squared Error (MSE). We applied these models to three datasets comprising 12,600, 69,476, and 158,444 variants, respectively. The first dataset, SNP-1, achieved its highest overall test R² score of 0.227 with Lasso Regression. The second dataset, SNP-5, achieved its highest overall test R² score of 0.437 with LightGBM. Lastly, the third dataset, SNP-10, achieved its highest overall test R² score of 0.574 with Ridge Regression.
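As a minimal illustration of the regression setup and evaluation metrics described above, the sketch below fits a ridge-penalized slope for a single SNP coded as 0/1/2 minor-allele counts, then scores the predictions with MSE and R². The function names (`mse`, `r2_score`, `ridge_single_snp`), the single-SNP simplification, and the toy genotype and BMD values are hypothetical and not taken from the thesis, whose models use thousands of SNPs at once.

```python
from statistics import fmean

def mse(y_true, y_pred):
    """Mean squared error: the average squared residual."""
    return fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def ridge_single_snp(genotypes, bmd, lam=1.0):
    """Ridge regression of BMD on one SNP (hypothetical helper).

    Centering both variables removes the intercept from the penalty, so the
    penalized slope is sum(xc * yc) / (sum(xc ** 2) + lam).
    """
    mx, my = fmean(genotypes), fmean(bmd)
    sxy = sum((g - mx) * (b - my) for g, b in zip(genotypes, bmd))
    sxx = sum((g - mx) ** 2 for g in genotypes)
    beta = sxy / (sxx + lam)   # larger lam shrinks the slope toward zero
    return my - beta * mx, beta

# Hypothetical toy data: minor-allele counts and BMD values (g/cm^2).
genotypes = [0, 1, 2, 0, 1, 2]
bmd = [1.00, 0.95, 0.90, 1.02, 0.96, 0.88]

intercept, beta = ridge_single_snp(genotypes, bmd, lam=1.0)
pred = [intercept + beta * g for g in genotypes]
print(f"beta={beta:.4f}  MSE={mse(bmd, pred):.5f}  R2={r2_score(bmd, pred):.3f}")
```

With `lam=0.0` the slope reduces to ordinary least squares. For many SNPs at once, the same penalty applies to a full coefficient vector; scikit-learn provides this as `sklearn.linear_model.Ridge`, with `sklearn.metrics.r2_score` and `sklearn.metrics.mean_squared_error` for evaluation.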