Keywords:
Computer engineering
Computer science
Abstract:
Rapid and accurate identification of hospitalized patients at high risk of readmission, disease, extended length of stay (LOS), mortality, and similar outcomes has the potential to improve the quality of care and to reduce avoidable harm and costs. However, most data-driven studies of risk prediction in healthcare settings have produced non-interpretable black boxes, which prevents them from being used effectively within hospital decision support systems. This dissertation focuses on developing techniques that improve the interpretability and explainability of machine learning models in healthcare by incorporating domain knowledge, specifically for the task of predicting the risk of hospital readmission. Preventable hospital readmissions have been identified as one of the primary targets for reducing costs and improving healthcare delivery, so advances in data-driven approaches to this critical task could have a significant impact on the healthcare system. This dissertation proposes three approaches to predicting readmission risk. In the first two approaches, we incorporate domain knowledge, in the form of the hierarchical taxonomies available for disease codes, to improve the interpretability of a linear readmission prediction model. Both resulting models achieve accuracy comparable to that of state-of-the-art methods while producing highly interpretable output, which allows medical experts to draw clinically relevant insights and identify key factors associated with hospital readmission. In both models, we exploit the domain-induced hierarchical structure of the disease codes, which serve as the features for the classification algorithm. The first approach applies a model based on structured sparsity regularization. To improve interpretability, the second approach introduces a novel tree-structured sparsity-inducing regularization norm. Furthermore, a quantitative evaluation metric to