Workshop

Development of Enterprise-level Ontology and Lexical Resource Solutions

TKE 2008: 8th International Conference on Terminology and Knowledge Engineering

August 21, 2008 - Copenhagen, Denmark

Workshop Theme

This workshop will be organized like a seminar. Experts from several companies known to be working in this area will be personally invited to present their latest practices and challenges in envisioning, planning, creating and managing enterprise-wide terminology and ontology management systems. Discussions will follow each presentation. The scope of presentations will range from the technology used to manage these systems to the practical decisions that need to be made when planning and developing them. The goal of the workshop is twofold: to foster a collegial exchange of ideas among companies facing similar challenges in terminology and ontology management, and to give the audience insight into the issues large companies currently face in this area and how they are approaching them.

Relevance

As large companies establish terminology management systems, they often decide to expand their terminology data to include formal relationships. These conceptual systems can take on various forms and, as formalized ontologies, can serve a variety of functions. There has been growing interest in the Semantic Web in the past several years, resulting in standards for semantic languages such as RDF (Resource Description Framework), RDF Schema, DAML+OIL, and OWL (Web Ontology Language). Nonetheless, companies face many challenges when moving from terminology to ontology management, including integrating existing systems, applying technologies and standards, providing ROI figures, and training terminologists.

Although approaches and challenges differ depending on the size, domain and needs of a company, many issues in the transition from terminology to ontology management are common to all. This workshop provides a needed forum for discussing these issues.

Schedule:

Time | Session | Slides | Speakers | Affiliation
9:00 - 9:15 | Introduction and Overview | | |
9:15 - 10:00 | Terminology Management at IBM | HTML | Kara Warburton | IBM
10:00 - 10:45 | From Dispersed Local Glossaries to a Global Terminology Management System | HTML | Helle Katic | Oracle
10:45 - 11:00 | Coffee Break | | |
11:00 - 11:45 | Overview of Ontology Standards | HTML | Alma Kharrat | Microsoft
11:45 - 12:00 | Open Discussion | | |
12:00 - 13:30 | Lunch Break | | |
13:30 - 14:45 | From Terminology to Ontology - Process and Skills Implications | HTML | Robin Lombard, Barbara Karsch | Microsoft
14:45 - 15:30 | Challenges for Terminology and Ontology Management | HTML | Susan Thomas | SAP
15:30 - 15:45 | Coffee Break | | |
15:45 - 16:20 | The First Step from Terminology to Ontology with NLP (Natural Language Processing) Technologies | HTML | Masaki Itagaki | Microsoft
16:20 - 17:00 | Open Discussion | | |

Workshop Organizers:

Alma Kharrat (Microsoft, almakhar@microsoft.com)
Barbara Karsch (Microsoft, bkarsch@microsoft.com)
Masaki Itagaki (Microsoft, mitagaki@microsoft.com)
Robin Lombard (Microsoft, robinl@microsoft.com)

Presentations and Presenters:

Terminology Management at IBM

Kara Warburton will provide an overview of IBM's technologies for managing terminological and lexical data including, as time permits: the central terminology database and management software (TransLexis), the Term Extraction tool and post-processing routines, the monolingual and bilingual glossary generators, and the LanguageWare tools for managing lexical resources. She will also discuss some of the programs developed to assist in housekeeping tasks, to ensure the quality of the terminology data across the content development lifecycle. She will briefly describe how terminology is used in extended applications such as search engine optimization and content classification. She will end her presentation with a request for ideas from the audience about the practical applications of concept relations to enhance business processes, and effective classification methods.

Kara Warburton holds a Master's degree in terminology from Université Laval and has 20 years of varied experience in terminology management, information development, and translation. For the past ten years, she has been the head of terminology management for IBM, driving the consolidation of terminological and lexical resources, the standardization of methodologies, and the development of terminology management tools. Kara created the LISA Terminology Special Interest Group in 2001 and chaired it until 2007. She has been the head of the Canadian delegation to ISO TC37 SC3 since 2001, and she is the project leader for ISO 30042 (TBX). Kara is also a frequent conference speaker and teaches terminology at York University.

From Dispersed Local Glossaries to a Global Terminology Management System

What are the reasons for implementing a terminology management system in an already highly automated and efficient translation environment? What value does a terminology management system offer in this scenario, and how do we get there? Who are the customers if you go beyond the translation environment? What are the requirements? What are the perspectives?
This presentation will take you through some history and parts of the Oracle Terminology Project to answer these questions, covering technology, processes, content, hurdles, and revelations.

Helle Katic is a graduate of Programming Technology (Canada), with more than 20 years of experience in different aspects of software development and maintenance. After an intermezzo in technical writing and software translation, Helle joined Oracle's Worldwide Product Translation Group (WPTG) in 1997, initially as a Language Specialist. Helle is now Senior Process Manager in WPTG’s Translation Support Team and, among other tasks and responsibilities, is currently managing the implementation of a Global Terminology Management System.

Overview of Ontology Standards

Ontology work takes terminology to a higher level by identifying relationships among the universals behind terms in a given domain and modeling this domain for automatic processing. A good ontology should be reusable and compatible with other ontologies. Recently, there has been an effort to create standards and tools for ontology management. This presentation will tackle SKOS (Simple Knowledge Organization System) and the main standards created by W3C, namely standard semantic markup languages such as XML (eXtensible Markup Language), RDFS (Resource Description Framework Schema), and OWL (Web Ontology Language).

Alma Kharrat is a Terminology Researcher in the Language Excellence group at Microsoft. Prior to joining Language Excellence, she worked for almost 9 years as a Computational Linguist in Microsoft's Natural Language Group, developing proofing tools. She also led a team of Computational Linguists while developing the second version of the French grammar checker and improving the French syntax component for Microsoft Office. Alma holds a Master’s degree in translation from Université Saint-Joseph, Lebanon, and has done PhD studies in computational linguistics at Université de Montréal, Canada. Before joining Microsoft, Alma worked at a Canadian software company, Machina Sapiens, developing the Spanish grammar checker and a machine translation tool. She also held a teaching assistant position at Université de Montréal and worked as a freelance translator for the Canadian government. Alma has served as a member of the Program Committee for CICLing and RANLP since 2001, and occasionally for other international conferences.

From Terminology to Ontology - Process and Skills Implications

Language Excellence, the team of Microsoft terminologists, is focused on helping Microsoft’s product teams enable their customers around the world to efficiently and effectively use the company’s products and services from the viewpoint of language. This is achieved through terminology management, language quality standards and local community engagement. In the next year, we will add conceptual systems to the terminology database and look into formalizing the system. This presentation describes the transition from terminology management to ontology management as we are experiencing it at Microsoft.

We will focus on the three building blocks of our system (people, process, and tools) as well as the challenges we must overcome. The skills of a terminologist are very similar to those of an ontologist, so we will discuss the additional prerequisites that a terminologist needs in order to create sound conceptual systems. We will examine our current processes and compare them to what we envision we’ll need when working with ontologies. As regards tools, we will take a user point of view rather than a technological one, e.g., what sort of user interface the terminologist will need to construct an ontology. Additionally, we will describe the history of our terminology database and explain conceptual systems that have already been established. After we have laid the foundation of our work, we will present a brief survey of the problems that are already visible. In conclusion, we will summarize the next steps that we need to take to make conceptual systems explicit and formalize them to go beyond what human consumers of the database need.

Robin Lombard is the Terminology Research Manager in the Language Excellence group at Microsoft. Robin has been at Microsoft for almost 9 years, working as a writer, editor, and terminologist before taking her current role two years ago. Prior to Microsoft, Robin spent three years in China studying Chinese language and culture, and seven years at various universities teaching ESL and undergraduate writing. Robin holds both an MA and a PhD in Linguistics.
Barbara Inge Karsch holds a Bachelor’s equivalent (Sprachen- und Dolmetscher-Institut, Munich) and a Master’s degree (Monterey Institute of International Studies, Monterey, CA) in translation and interpretation for German and English. At J.D. Edwards she drove the design and implementation of a terminology management system for over 80 active users. In 2003, she took a sabbatical to start PhD-level research into ontology management. Her work on the project ended when the translation department was dissolved. In 2004 she was hired by Microsoft as part of an effort to establish a source terminology management team and get a terminology management system (TMS) off the ground. Her focus has been on improving processes, tools, data and skills to prepare for ontology management.

Challenges for Terminology and Ontology Management

Controlled vocabularies, which can be said to encompass terminology, ontologies and information-retrieval thesauri, among other things, are growing in number and importance. This concurrent growth naturally raises the question of how they might be beneficially combined. This question, in turn, raises a number of challenges, the most important of which are: specifying the subject field to be covered, identifying the purposes to be met, and instituting social and business processes to support development, maintenance and quality control. A further challenge, not to be ignored, is how to combine controlled vocabularies with popular Web 2.0 technologies like social tagging.

For each identified challenge, a comparison will be made between the different types of controlled vocabularies, showing how these challenges are currently being met, or not, as the case may be. Wherever possible, concrete examples from existing systems will be used for illustrative purposes. With the topic illuminated in this way, some suggestions for combining the different types of vocabularies will be offered for discussion.

Susan Thomas is a researcher and product manager in the SAP Research Center in Karlsruhe, Germany. She received a Master of Science in Engineering in Applied Mathematics (1979) from The Johns Hopkins University, Baltimore, Maryland. She has over 25 years of IT experience, the first 20 of them with Digital Equipment Corporation, where she worked as a teacher, a software developer for operating systems and networking software, and, eventually, as a project manager for numerous development and research projects. Her current major research interest is semantics, especially the relationship between ontologies and terminology management. Another interest is environmental and social sustainability.

The First Step from Terminology to Ontology with NLP (Natural Language Processing) Technologies

While ontology building is a highly labor-intensive task, various research efforts have looked at ways to learn and populate ontologies from text. This presentation will introduce an unsupervised approach for identifying related concepts among existing terminology data. Using Microsoft's terminology database, an experiment will analyze definition text to retrieve related concepts and acquire semantic relationships with "MindNet," a knowledge representation tool that uses a broad-coverage parser.

Masaki Itagaki leads the team of linguistic engineers, providing NLP (natural language processing)-based solutions for terminology data management. Prior to Microsoft, he worked for over 10 years on a number of language engineering systems and projects, including multilingual content management, software localization and internationalization, for several business software companies. Masaki is currently working on research projects regarding terminology quality validation and measurement for the machine translation process.