NIPS: Oral Session 6 – Nishant A. Mehta

From Stochastic Mixability to Fast Rates

Empirical risk minimization (ERM) is a fundamental learning rule for statistical learning problems where the data is generated according to some unknown distribution P
and returns a hypothesis f
chosen from a fixed class F
with small loss ℓ
. In the parametric setting, depending upon (ℓ,F,P)
ERM can have slow (1/n√)
or fast (1/n)
rates of convergence of the excess risk as a function of the sample size n
. There exist several results that give sufficient conditions for fast rates in terms of joint properties of ℓ
, F
, and P
, such as the margin condition and the Bernstein condition. In the non-statistical prediction with expert advice setting, there is an analogous slow and fast rate phenomenon, and it is entirely characterized in terms of the mixability of the loss ℓ
(there being no role there for F
or P
). The notion of stochastic mixability builds a bridge between these two models of learning, reducing to classical mixability in a special case. The present paper presents a direct proof of fast rates for ERM in terms of stochastic mixability of (ℓ,F,P)
, and in so doing provides new insight into the fast-rates phenomenon. The proof exploits an old result of Kemperman on the solution to the general moment problem. We also show a partial converse that suggests a characterization of fast rates for ERM in terms of stochastic mixability is possible.

Nishant A. Mehta
Australian National University
    • Portrait of Lori Stone

      Lori Stone