Cross-lingual C*ST*RD: English Access to Hindi Information

  • Anton Leuski
  • ,
  • Liang Zhou
  • Ulrich Germann
  • Franz Josef Och
  • Eduard Hovy

ACM Transactions on Asian Language Information Processing, Vol 2(3): pp. 245-269

We present C*ST*RD, a cross-language information delivery system that supports cross-language information retrieval, information space visualization and navigation, machine translation, and text summarization of single documents and clusters of documents. C*ST*RD was assembled and trained within one month, in the context of DARPA’s Surprise Language Exercise, which selected as its source language a heretofore unstudied language, Hindi. Given the brief time, we could not create deep Hindi capabilities for all the modules; instead, we experimented with combining shallow Hindi capabilities, or even English-only modules, into one integrated system. Various possible configurations, with different tradeoffs in processing speed and ease of use, enable the rapid deployment of C*ST*RD to new languages under various conditions.