We develop a general statistical framework for the analysis of, and inference on, large tree-structured data, with a focus on asymptotic goodness-of-fit tests. We first propose a consistent statistical model for binary trees, from which we …
In personalized medicine, biomarkers are used to select therapies with the highest likelihood of success based on an individual patient’s biomarker/genomic profile. Two goals are to choose important biomarkers that accurately predict treatment …
Multi-dimensional data constituted by measurements along multiple axes have emerged across many scientific areas such as genomics and cancer surveillance. A common objective is to investigate the conditional dependencies among the variables along …
Most cancer research now involves one or more assays profiling various biological molecules, e.g., messenger RNA and microRNA, in samples collected on the same individuals. The main interest with these genomic data sets lies in the identification of …
We propose a two-step method for the analysis of copy number data. We first define the partitions of genome aberrations and conditional on the partitions we introduce a semiparametric Bayesian model for the analysis of multiple samples from patients …
Tumor heterogeneity is a crucial area of cancer research wherein inter- and intra-tumor differences are investigated to assess and monitor disease development and progression. The proliferation of imaging and linked genomic data …
Neuroimaging and genetic studies provide distinct and complementary information about the structural and biological aspects of a disease. Integrating the two sources of data facilitates the investigation of the links between genetic variability and …
A general method for regressing a continuous response upon large groups of diverse genetic covariates via dimension reduction is developed and exemplified. It is shown that allowing latent features derived from different covariate groups to interact …
Increased availability of multi-platform genomics data on matched samples has sparked research efforts to discover how diverse molecular features interact both within and between platforms. In addition, simultaneous measurements of genetic and …
Bayesian models are generally fitted using Markov chain Monte Carlo (MCMC) methods. The main disadvantage of MCMC methods is the large number of iterations they need to sample the posterior distributions of model parameters, especially for large …