(Computational) Linguistics and the Web: Hot research questions


September 28, 2007


Henry S. Thompson


University of Edinburgh


I’ve spent the last ten years trying to feed technologies and insights from Linguistics and Computational Linguistics into the infrastructure of the Web. In this talk I’ll give brief but intense introductions to four areas of research interest from (C)L and related disciplines which have the potential to make a real impact on the way the Web works. Depending on who’s there, we may dive deeper into one or more of them, time permitting:

  • A novel declarative approach to fixup of broken XML/(X)HTML
  • Counter-augmented Finite-State Automata for parsing XML
  • Functional XML – Self-describing documents meet the lambda calculus
  • Identity, URIs and the (Semantic) Web

See http://www.ltg.ed.ac.uk/~ht/msr_20070928.html for an extended abstract.


Henry S. Thompson

Henry S. Thompson divides his time between the School of Informatics at the University of Edinburgh, where he is Reader in Artificial Intelligence and Cognitive Science, based in the Language Technology Group of the Human Communication Research Centre, and the World Wide Web Consortium (W3C), where he works in the XML Activity.

He received his Ph.D. in Linguistics from the University of California at Berkeley in 1980. His university education was divided between Linguistics and Computer Science, in which he holds an M.Sc. His research interests have ranged widely, including natural language parsing, speech recognition, machine translation evaluation, modelling human lexical access mechanisms, the fine structure of human-human dialogue, language resource creation and architectures for linguistic annotation. His current research is focussed on the semantics of markup, XML pipelines and more generally articulating and extending the architectures of XML.

He was a member of the SGML Working Group of the World Wide Web Consortium which designed XML, a major contributor to the core concepts of XSLT and W3C XML Schema, and is currently a member of the XML Core, XML Schema and XML Processing Model Working Groups of the W3C. He has been elected twice to the W3C TAG (Technical Architecture Group). He is lead editor of the Structures part of the XML Schema W3C Recommendation.