Abstract

Stochastic turn-taking models use a truncated representation of past speech activity to specify how likely a speaker is to talk at the next instant. An unanswered question in such modeling is how far back to extend the conditioning context. We study this question using Switchboard (English, telephone) and Spontal (Swedish, face-to-face) conversations. We also explore whether to trade off precision with range when moving backward in the history. We find that (1) a nearly logarithmic compression of history is optimal, for both speaker and interlocutor; (2) the absolute duration of the conditioning context is at least 7 seconds; and (3) the compression scheme generalizes remarkably well across the two different corpora.