Minimax estimation in linear models with unknown design over finite alphabets

Behr, M., Munk, A.

We provide a minimax optimal estimation procedure for F and W in matrix-valued linear models Y = F W + Z, where the parameter matrix W and the design matrix F are both unknown but F takes values in a known finite set. The proposed finite alphabet linear model is justified in a variety of applications, ranging from signal processing to cancer genetics. We show that the finite alphabet structure allows F and W to be separated uniquely under weak identifiability conditions, a task that is not possible in general. To this end, we quantify in the noiseless case, that is, Z = 0, the perturbation range of Y that still permits stable recovery of F and W. Based on this, we derive an iterative Lloyd-type estimation procedure that attains minimax estimation rates for W and F when the error matrix Z is Gaussian. In contrast to the least squares solution, the estimation procedure can be computed efficiently and scales linearly with the total number of observations. We confirm our theoretical results in a simulation study and illustrate them with a genetic sequencing data example.
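
To make the idea of a Lloyd-type iteration for Y = F W + Z concrete, the following is a minimal sketch of a generic alternating scheme, not the authors' procedure. It assumes that the rows of F are assigned by brute-force search over all alphabet combinations, that W is updated by ordinary least squares, and that a reasonable starting value W0 is available. The function name `lloyd_finite_alphabet`, the dimensions n, m, M, the alphabet {0, 1}, and the simulated data are all hypothetical choices made for illustration.

```python
import itertools
import numpy as np

def lloyd_finite_alphabet(Y, alphabet, m, W0, n_iter=20):
    """Alternate a nearest-center assignment for F with a least-squares update for W.

    Y is (n, M), F is (n, m) with entries in `alphabet`, W is (m, M).
    This is an illustrative sketch under the assumptions stated above.
    """
    # Enumerate all |alphabet|^m candidate rows of F (assumes m is small).
    candidates = np.array(list(itertools.product(alphabet, repeat=m)), dtype=float)
    W = np.asarray(W0, dtype=float)
    for _ in range(n_iter):
        # Assignment step: each row of Y goes to the closest center a @ W.
        centers = candidates @ W
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        F = candidates[d2.argmin(axis=1)]
        # Update step: least-squares fit of W given the current F.
        W_new, *_ = np.linalg.lstsq(F, Y, rcond=None)
        converged = np.allclose(W_new, W)
        W = W_new
        if converged:
            break
    return F, W

# Hypothetical simulation: binary alphabet, small Gaussian noise,
# and a starting value W0 close to the truth.
rng = np.random.default_rng(0)
n, m, M = 200, 2, 3
alphabet = (0.0, 1.0)
W_true = np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]])
F_true = rng.choice(alphabet, size=(n, m))
Y = F_true @ W_true + 0.01 * rng.standard_normal((n, M))
F_hat, W_hat = lloyd_finite_alphabet(Y, alphabet, m, W0=W_true + 0.1)
```

For fixed m and M, each iteration of this sketch visits every row of Y once, so its cost grows linearly in n. The enumeration of candidate rows grows exponentially in m, and the result depends on the quality of W0, so the sketch is only meant to show the alternating structure.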