Machine Learning Development for Subtyping COPD

Chronic obstructive pulmonary disease (COPD) is a heterogeneous lung condition characterized by progressive loss of lung function, followed by increasing breathlessness and worsening quality of life. This heterogeneity makes it difficult to predict health decline and to develop targeted treatments for better patient care. To date, researchers have attempted to use standard machine learning methodology to identify more meaningful subtypes of COPD, but these methods often make general assumptions about the data, limiting their ability to capture more complex patterns in some data sets. Thus, a meaningful reclassification of COPD subtypes that could lead to more targeted therapies and interventions has been elusive. The applicant introduces a new way of looking at the COPD subtyping problem by recasting it in terms of discovering associations between individuals and disease trajectories, i.e., grouping individuals based on the similarity of their responses to environmental and/or disease-causing variables. The proposed machine learning methods build on the most recent advances in Bayesian nonparametrics, a collection of theoretical ideas and techniques that permit very flexible data representations. In this career development proposal, the applicant hypothesizes that these machine learning methods and extensions thereof, together with data sources not previously leveraged for COPD subtyping, will produce more biologically meaningful sub-groupings of patients, leading to a better understanding of the genetic and biological underpinnings of the disease and ultimately to improved patient management. Aim 1 of this application involves evaluating the utility of CT-assessed lung mass, a potentially more discriminative measure of emphysema than conventionally used measures, for defining COPD subtypes using both K-means clustering and our disease trajectory algorithm.
The goal of Aim 2 is to evaluate the utility of comorbidity data for defining COPD subtypes using our trajectory clustering algorithm. Novel computed tomography-based measures of muscle wasting (cachexia) and pulmonary vascular pruning will be explored to assess their usefulness for defining subtypes. Additionally, we will extend and test the trajectory algorithm so that it can model discrete outputs (such as physician-diagnosed comorbidities), count data (e.g., exacerbations), and time-to-event data (death). In Aim 3, the applicant will extend our trajectory clustering algorithms to directly incorporate genetic and omics data for subtype discovery. Together, the research proposed in the aims of this award will take full advantage of the comprehensive data set available through the COPDGene study. Execution of the aims in this proposal will be possible through active collaboration with Dr. Ron Kikinis, M.D., a renowned leader in the field of medical image analysis, and Dr. Ed Silverman, an internationally recognized expert in the genetic epidemiology of COPD.