Epistasis regulates genetic control of cardiac hypertrophy

The combinatorial effect of genetic variants is often assumed to be additive. Although genetic variation can clearly interact non-additively, methods to uncover epistatic relationships remain in their infancy. We develop low-signal signed iterative random forests to elucidate the complex genetic architecture of cardiac hypertrophy. We derive deep learning-based estimates of left ventricular mass from the Read more…

Opportunities and Challenges for AI-Based Analysis of RWD in Pharmaceutical R&D: A Practical Perspective

Real-world data (RWD) have become an important tool in pharmaceutical research and development. Generated every time patients interact with the healthcare system, when diagnoses are developed and medical interventions are selected, RWD are massive and in many regards typical big data. The use of artificial intelligence (AI) to analyze RWD seems an obvious choice. Read more…

Minimax estimation in linear models with unknown design over finite alphabets

We provide a minimax optimal estimation procedure for F and W in matrix valued linear models Y = F W + Z where the parameter matrix W and the design matrix F are unknown but the latter takes values in a known finite set. The proposed finite alphabet linear model is justified in a variety Read more…
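As a rough illustration of the model Y = F W + Z described above, the following sketch simulates data in which the design matrix F takes values in a known finite set. It is not the estimation procedure from the paper. The function name, the dimensions, the binary alphabet, and the Gaussian choices for W and Z are all assumptions made for this example only.

```python
import numpy as np


def simulate_finite_alphabet_model(n=50, m=3, d=4, alphabet=(0.0, 1.0),
                                   noise_sd=0.1, seed=0):
    """Simulate Y = F W + Z with F drawn from a known finite alphabet.

    All defaults (sizes, alphabet, Gaussian W and Z) are illustrative
    assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Design matrix F: unknown to the analyst, but every entry lies in `alphabet`.
    F = rng.choice(np.asarray(alphabet), size=(n, m))
    # Parameter matrix W: unknown and unconstrained (Gaussian here for simplicity).
    W = rng.normal(size=(m, d))
    # Additive noise Z.
    Z = noise_sd * rng.normal(size=(n, d))
    Y = F @ W + Z
    return Y, F, W, Z


Y, F, W, Z = simulate_finite_alphabet_model()
print(Y.shape, np.unique(F))
```

Only Y would be observed in practice; F and W are returned here so that an estimator could be checked against the simulated truth.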

Provable Boolean Interaction Recovery from Tree Ensemble obtained via Random Forests

Random Forests (RF) are at the cutting edge of supervised machine learning in terms of prediction performance, especially in genomics. Iterative Random Forests (iRF) use a tree ensemble from iteratively modified RF to obtain predictive and stable non-linear high-order Boolean interactions of features. They have shown great promise for high-order biological interaction discovery that is Read more…

Multiple Haplotype Reconstruction from Allele Frequency Data

We propose a new method that is able to accurately infer major haplotypes and their frequencies just from multiple samples of allele frequency data. Our approach seems to be the first that is able to estimate more than one haplotype given such data. Even the accuracy of experimentally obtained allele frequencies can be improved by Read more…

Learning epistatic polygenic phenotypes with Boolean interactions

Detecting epistatic drivers of human phenotypes remains a challenge. Traditional approaches use regression to sequentially test multiplicative interaction terms involving single pairs of genetic variants. For higher-order interactions and genome-wide large-scale data, this strategy is computationally intractable. Moreover, multiplicative terms used in regression modeling may not capture the form of biological interactions. Building on the Read more…
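To give a sense of why testing interaction terms one at a time does not scale, the short sketch below counts the candidate variant sets of a given order. The helper name and the figure of one million variants are assumptions chosen for illustration and do not come from the paper.

```python
from math import comb


def n_interaction_tests(n_variants, order):
    """Number of distinct variant sets of size `order` among `n_variants`."""
    return comb(n_variants, order)


# Hypothetical genome-wide panel of one million variants (illustrative only).
n_variants = 1_000_000
for order in (2, 3, 4):
    print(order, n_interaction_tests(n_variants, order))
```

With these assumed numbers, pairs alone give about 5 × 10^11 candidate tests, and each increase in order multiplies the count by several further orders of magnitude.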

Multiscale quantile segmentation

We introduce a new methodology for analyzing serial data by quantile regression, assuming that the underlying quantile function consists of constant segments. The procedure does not rely on any distributional assumption besides serial independence. It is based on a multiscale statistic, which allows control of the (finite sample) probability for selecting the correct number Read more…

Testing for dependence on tree structures

Tree structures, showing hierarchical relationships and the latent structures between samples, are ubiquitous in genomic and biomedical sciences. A common question in many studies is whether there is an association between a response variable measured on each sample and the latent group structure represented by some given tree. Currently, this is addressed on an ad Read more…

Multiscale blind source separation

We provide a new methodology for statistical recovery of single linear mixtures of piecewise constant signals (sources) with unknown mixing weights and change points in a multiscale fashion. We show exact recovery within an ε-neighborhood of the mixture when the sources take only values in a known finite alphabet. Based on this we provide the Read more…

Identifiability for blind source separation of multiple finite alphabet linear mixtures

Under weak assumptions, we give a complete combinatorial characterization of identifiability for linear mixtures of finite alphabet sources, with unknown mixing weights and unknown source signals, but known alphabet. This is based on a detailed treatment of the case of a single linear mixture. Notably, our identifiability analysis also applies to the case of unknown Read more…