Chapter 6: Building Taxonomies continued
What Is a Taxonomy?
What comes to mind when you think of the word taxonomy? Is it an ordered list or hierarchy of terms, possibly on the model of botanical names: kingdom, phylum, class, order, family, genus, species? Ironically, there is no agreed-upon definition for the term taxonomy or for the elements that compose it. For the purposes of this chapter, we will use the term taxonomy inclusively to refer to any classified collection of elements.
Although the art of taxonomy and the resulting forms of taxonomic structures are rooted in the works of Aristotle, Linnaeus, and Darwin, the meaning of the term taxonomy has been expanded to cover new purposes. We now use taxonomies to create metadata (common words that describe an object) for information retrieval, categories that support browse navigation, schemas that govern Web page layout and structure, and data control lists that support data mining (searching thousands of data records to uncover patterns and relationships contained within the activity and history store to fulfill a reporting request). These classification systems and the resulting taxonomies vary in structure, composition, and purpose, but they are all organized according to defined principles. Many organizations struggle with how to provide access to semistructured information (information that is organized, perhaps stored in a systematic way in group files or on company servers) and unstructured information (possibly stored on personal hard disks or on local servers according to the desires or needs of individuals). Integration and access are difficult. Taxonomies provide the link between the knowledge workers and the content, or at least they facilitate that linking in the ways described in the sections that follow.
Descriptive Taxonomies
One type of taxonomy found in the corporate environment supports information retrieval through searching. By developing and maintaining a core set of controlled vocabularies, a company can consistently label or tag its content with descriptive metadata selected from these authorized vocabularies. In addition, vocabularies can capture knowledge worker terminology and map it to a company's preferred terms. A product may have an array of different names during its lifetime: for example, N-Acetyl-p-aminophenol, Acetamidophenol, Acetaminophen, and finally the commercial form, Tylenol. A knowledge worker looking for information on a product might search by a code name, a project name, a legal name, an acronym, or a common name. Active mining of new terms and phrases from emerging content and from search query logs will help keep a descriptive taxonomy relevant to the users of that information. A taxonomy built on the thesaurus model (designating a preferred or authorized term with entry terms or variants) helps to link these different terms together. At search time, the term that the knowledge worker uses is either mapped to the preferred (or key) term for more precise searching, or expanded to include the variant forms of the term as well as the authorized term for a broader search. Taxonomies built on the thesaurus model do not force all work groups to use a common set of terminology.
When used along with a search engine, query term expansion, as this synonym process is called, can reduce the amount of descriptive tagging that is required, since the tags need to contain only the term's preferred form. Used judiciously, query term expansion can improve search engine recall; that is, a larger amount of information will be gathered in response to the knowledge worker's query because the search terms are more inclusive.
The point is that a taxonomy of this type is linked to content and is descriptive of that content in its application. Creating this type of taxonomy involves reviewing entries against an established set of terms and looking for similarities, differences, affinities, and dependencies. As an example, think of a sales and marketing division. Employees might use the terms promotional materials and advertising materials interchangeably, at least in informal speech. But perhaps the division's formal preference is for advertising materials. A solid taxonomy would include both of these terms, because users may use either in a search. But because advertising materials is the preferred term, it can be treated as metadata and applied through tagging to pieces of content, such as Web pages, that do not use that specific wording but really are about advertising materials. If someone then searched for promotional materials, the search would be expanded to include the preferred term, and the search would succeed.
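The thesaurus model and query term expansion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names THESAURUS, to_preferred, and expand are hypothetical, and treating acetaminophen as the preferred term for the product example is an assumption, since the preferred form is a choice each company makes for itself.

```python
# Hypothetical thesaurus: each preferred (authorized) term maps to its
# entry terms, or variants. Which term is "preferred" is a company decision;
# the choice of "acetaminophen" here is an assumption for illustration.
THESAURUS = {
    "advertising materials": ["promotional materials"],
    "acetaminophen": ["n-acetyl-p-aminophenol", "acetamidophenol", "tylenol"],
}

# Reverse index: any known term (preferred or variant) -> preferred term.
PREFERRED = {}
for preferred, variants in THESAURUS.items():
    PREFERRED[preferred] = preferred
    for variant in variants:
        PREFERRED[variant] = preferred


def to_preferred(term):
    """Map a user's term to the preferred term (more precise search)."""
    key = term.strip().lower()
    # Unknown terms pass through unchanged.
    return PREFERRED.get(key, key)


def expand(term):
    """Return the preferred term plus all of its variants (broader search)."""
    preferred = to_preferred(term)
    return [preferred] + THESAURUS.get(preferred, [])
```

With this sketch, a search for promotional materials is either narrowed to the preferred tag via to_preferred or broadened via expand to cover both phrasings. Content only needs to be tagged with the preferred form, which is how expansion reduces the tagging effort.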
Navigational Taxonomies
A second type of taxonomy is aimed at discovering information through browsing. Once again the taxonomy provides a controlled vocabulary, but rather than using it in the background for manipulating queries, you can display this taxonomy to knowledge workers to help them find the information they need. The navigational taxonomy consists of labels applied to categories of content based on knowledge workers' mental models of how the information is organized. Web directory services such as those on the Search page of Microsoft Network (MSN) (http://search.msn.com/), HomeAdvisor (http://homeadvisor.msn.com/), and the knowledge index described in Chapter 7, "Capturing Your Organization's Knowledge Assets," are all examples of navigational taxonomies.
A navigational taxonomy is based on user behavior and not on content. As a result, the category labels may be organized differently from the concept-based descriptive taxonomy, and they also may contain words or phrases that would not meet the standards of a descriptive taxonomy. As an example, you might use a phrase like Sell Your House to label a set of content on the HomeAdvisor service on MSN. The phrase is commonly understood, but it is not concise enough for a descriptive taxonomy. How to develop a navigational taxonomy will be discussed below, but the point to remember is that a navigational taxonomy differs from a descriptive taxonomy: its role is different, it can have different rules, and the sources used to build it vary. Creating this type of taxonomy involves determining proper information groupings for the content. These categories are managed by a business owner who is familiar with the users of that site. Let us consider human resources information as an example. We know that dental benefits is a type of benefit. We would make benefits the grouping and place dental benefits as a subset of that group.
When the navigational taxonomy is displayed to the user, dental benefits appears hierarchically below benefits, which shows the user that the company includes dental benefits as part of the benefit package. Another fundamental difference between descriptive and navigational taxonomies is that navigational taxonomies are often specialized and unique to an instance of information presentation (a portal, a site, an intranet), and multiple content management systems do not typically reuse them as they would a descriptive taxonomy. Navigational taxonomies are therefore not governed by the same rules about which taxonomy terms can be changed.
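The benefits example can be pictured as a small tree of category labels. The following Python sketch is one possible representation, not a prescribed one: the nested-dictionary layout and the names NAVIGATION, render, and path_to are assumptions made for illustration.

```python
# Hypothetical navigational taxonomy: nested dicts, where each label maps
# to its subcategories. An empty dict marks a category with no children.
NAVIGATION = {
    "Benefits": {
        "Dental Benefits": {},
    },
}


def render(tree, depth=0):
    """Return the taxonomy as indented lines, children below their parent."""
    lines = []
    for label, children in tree.items():
        lines.append("  " * depth + label)
        lines.extend(render(children, depth + 1))
    return lines


def path_to(tree, target, trail=()):
    """Return the list of labels leading to target, or None if absent."""
    for label, children in tree.items():
        here = trail + (label,)
        if label == target:
            return list(here)
        found = path_to(children, target, here)
        if found:
            return found
    return None
```

Calling render(NAVIGATION) yields the label Benefits with Dental Benefits indented beneath it, which is the hierarchical display the user sees. Because the structure is specific to one site or portal, its owner can relabel or regroup categories without affecting tags applied elsewhere.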
Data Management Vocabulary
A third type of taxonomy that is valuable in a business setting is the data management vocabulary. This taxonomy is a short list of authorized terms, without any hierarchical structure, that is used to support business transactions. For example, with a large sales force, it is most efficient if salespeople report their work using the same list of activities. They may count their contacts with companies according to a simple list of contact types (managers, decision-makers, and so on), and they may categorize the businesses they work with according to different controlled descriptors that have to do with the business's size or market. In this case, a shared taxonomy will help to support the reporting needs of management and of other salespeople trying to mine the information in the future. Without a shared taxonomy, a company risks developing islands of data that cannot be shared or easily used by the rest of the organization.
Until recently, this type of taxonomy, used for data management, had been considered separate from the descriptive taxonomy, used for content management. But there are areas of overlap. For example, if your organization decides on a geographic model of your markets (such as Europe, Asia, Africa, North America, South America, and Australia), the taxonomy used for reporting data and the taxonomy for accessing content should be the same. Besides providing a consistent user experience, sharing a taxonomy for two different purposes avoids duplication of effort, thus saving time and money.
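A data management vocabulary can be enforced at the point where a record is created. The Python sketch below assumes a hypothetical sales report with contact_type and market fields; the field names, the validate_report function, and the exact spelling of the contact types are illustrative assumptions. Only the market list comes from the geographic model mentioned above.

```python
# Hypothetical flat controlled lists (no hierarchy). The contact types are
# illustrative; the markets follow the geographic model in the text.
CONTACT_TYPES = frozenset({"manager", "decision-maker"})
MARKETS = frozenset({
    "Europe", "Asia", "Africa",
    "North America", "South America", "Australia",
})


def validate_report(report):
    """Return a list of problems; an empty list means all terms are authorized."""
    problems = []
    contact_type = report.get("contact_type")
    market = report.get("market")
    if contact_type not in CONTACT_TYPES:
        problems.append(f"unauthorized contact type: {contact_type!r}")
    if market not in MARKETS:
        problems.append(f"unauthorized market: {market!r}")
    return problems
```

Rejecting or flagging records that use terms outside the shared lists is one way to keep every salesperson's data comparable. Reusing the same MARKETS list for content tagging would give the overlap between data management and descriptive taxonomies that the text recommends.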