Harvesting, Searching, and Ranking Knowledge from the Web

Gerhard Weikum, Max-Planck-Institut für Informatik

There is a trend toward raising the functionality of search engines to a more
expressive, semantic level. This is made possible by large-scale information extraction
of entities and relationships from semistructured as well as natural-language Web sources.
In addition, harnessing Semantic-Web-style ontologies and reaching into Deep-Web sources
can contribute to a grand vision: turning the Web into a comprehensive knowledge
base that can be searched efficiently and with high precision.

This talk presents ongoing research at the Max-Planck Institute for Informatics
towards this objective, centered around the YAGO knowledge base and
the NAGA search engine. YAGO is a large collection of entities and relational facts that
are harvested from Wikipedia and WordNet with high accuracy and reconciled into a consistent
RDF-style “semantic” graph. NAGA provides graph-template-based search over this
data, with powerful ranking capabilities based on a statistical language model for graphs.
Advanced queries and the need for ranking approximate matches pose efficiency
and scalability challenges that are addressed by algorithmic and indexing techniques.
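As a rough illustration of what "graph-template-based search" over an RDF-style fact graph means, here is a minimal Python sketch. It is not NAGA's implementation. The toy facts, their confidence values, the `?`-prefix variable syntax and the function names (`unify`, `match`, `search`) are all hypothetical. Ranking by the product of fact confidences is a deliberately simple stand-in for NAGA's statistical language model for graphs.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical toy facts (subject, relation, object) with made-up confidences;
# this is illustrative data, not actual YAGO content.
FACTS: Dict[Triple, float] = {
    ("Max_Planck", "type", "physicist"): 0.95,
    ("Max_Planck", "bornIn", "Kiel"): 0.90,
    ("Albert_Einstein", "type", "physicist"): 0.95,
    ("Albert_Einstein", "bornIn", "Ulm"): 0.80,
    ("Kiel", "locatedIn", "Germany"): 0.99,
    ("Ulm", "locatedIn", "Germany"): 0.99,
}


def is_var(term: str) -> bool:
    """Template variables are written with a leading '?' (an assumed convention)."""
    return term.startswith("?")


def unify(pattern: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `fact`; return None on conflict."""
    new = dict(binding)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if p in new and new[p] != f:
                return None  # variable already bound to a different entity
            new[p] = f
        elif p != f:
            return None  # constant in the template does not match the fact
    return new


def match(template: List[Triple], binding: Binding,
          score: float) -> List[Tuple[Binding, float]]:
    """Backtracking search: bind each template edge to some fact in turn."""
    if not template:
        return [(binding, score)]
    first, rest = template[0], template[1:]
    results: List[Tuple[Binding, float]] = []
    for fact, conf in FACTS.items():
        extended = unify(first, fact, binding)
        if extended is not None:
            # Simplified scoring: multiply the confidences of the matched facts.
            results.extend(match(rest, extended, score * conf))
    return results


def search(template: List[Triple]) -> List[Tuple[Binding, float]]:
    """Return all answers to the template, highest score first."""
    return sorted(match(template, {}, 1.0), key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    # "Which physicists were born in a city located in Germany?"
    query = [("?x", "type", "physicist"),
             ("?x", "bornIn", "?city"),
             ("?city", "locatedIn", "Germany")]
    for answer, score in search(query):
        print(answer["?x"], answer["?city"], round(score, 3))
```

Running it lists Max_Planck/Kiel ahead of Albert_Einstein/Ulm, because the (made-up) bornIn fact for Planck carries a higher confidence. Exhaustive backtracking like this is exactly what does not scale to a knowledge base of YAGO's size, which is why the abstract points to dedicated algorithmic and indexing techniques.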

This is joint work with Georgiana Ifrim, Gjergji Kasneci, Maya Ramanath, and Fabian Suchanek.

Speaker Details

Gerhard Weikum is a Research Director at the Max-Planck Institute for Informatics (MPII) in Saarbrücken, Germany, where he leads the department of databases and information systems. He is also the spokesperson of the International Max-Planck Research School (IMPRS) for Computer Science. Earlier, he held positions at Saarland University in Saarbrücken, Germany, at ETH Zurich, Switzerland, and at MCC in Austin, Texas; he was also a visiting senior researcher at Microsoft Research in Redmond, Washington. He received his diploma and doctoral degrees from the University of Darmstadt, Germany.