Information Retrieval Test Collection for Searching Spontaneous Czech Speech

  • Pavel Ircing ,
  • Pavel Pecina ,
  • Douglas W. Oard ,
  • Jianqiang Wang ,
  • ,
  • Jan Hoidekr

10th International Conference on Text, Speech and Dialogue (TSD 2007), Pilsen, Czech Republic

This paper describes the design of the first large-scale IR test collection built for the Czech language. Creating the collection was challenging because it is based on a continuous text stream produced by automatic transcription of spontaneous speech and therefore lacks clearly defined document boundaries. All aspects of building the collection are presented, together with some general findings from initial experiments.