Unlocking Knowledge Assets
Author: Susan Conway and Char Sligar
Pages: 256
Disk: N/A
Level: All Levels
Published: 02/27/2002
ISBN: 9780735614635
Price: $39.99


Chapter 6: Building Taxonomies continued


Building and Maintaining a Taxonomy

Taxonomy building needs to be targeted and strategic. If your world consists of a finite set of knowledge workers, products, activities, geographies, partners, and so on, build for that set with a watchful eye on the rest of the world. Maintaining a taxonomy is an oft-overlooked requirement and an underestimated cost. In planning, you should give equal consideration to what is required to build and to maintain a taxonomy or set of taxonomies.

What Do You Have Already?

As a starting point, take an inventory of any existing taxonomies. Reusing existing taxonomies can help save time and effort. Are there already established authority lists such as market segmentations, customer types, and geographies? Does your company already have a subject or keyword list that supports the library or other cataloging or publishing efforts? How widely adopted are any existing authority lists? Are there well-formed taxonomies or thesauri, either publicly available or available for purchase, that would be relevant to your business information? For example, there are well-developed taxonomies for information retrieval in the areas of law, pharmacy, engineering, and ecology. Reviewing the external offerings, as well as the existing internal sources of taxonomy, will give you a solid starting point and ideas about the taxonomy's needs for granularity and scope.

Obtaining the Information

To build and maintain a reflective, strategic, targeted taxonomy, you need to seek relevant information about your organization.

Information from Knowledge Workers

As a starting point for building a core set of search vocabularies, examine search behavior as it is reflected in query logs. Query logs show the types of information that employees seek, the terms that they use, and common misspellings. With analysis over time, you will be able to determine which terms recur either constantly (such as maps or Brittany) or in a cycle (such as taxes or annual meeting).
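
The query log analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log format (month, query string), the sample data, and the 75 percent threshold are all hypothetical, and spotting a true annual cycle would need more than one year of logs.

```python
from collections import Counter, defaultdict

# Hypothetical query log: (month, query string) pairs.
QUERY_LOG = [
    ("2001-01", "maps"), ("2001-02", "Maps"), ("2001-03", "maps"),
    ("2001-04", "maps"), ("2001-03", "taxes"), ("2001-04", "taxes"),
    ("2001-04", "Taxes"), ("2001-02", "campus maps"),
]


def monthly_counts(log):
    """Count each normalized query per month."""
    counts = defaultdict(Counter)
    for month, query in log:
        counts[query.strip().lower()][month] += 1
    return counts


def classify_terms(log, constant_share=0.75):
    """Label a term 'constant' when it appears in at least
    constant_share of the months covered by the log."""
    all_months = {month for month, _ in log}
    labels = {}
    for term, by_month in monthly_counts(log).items():
        share = len(by_month) / len(all_months)
        labels[term] = "constant" if share >= constant_share else "periodic or occasional"
    return labels


LABELS = classify_terms(QUERY_LOG)
```

Terms labeled "periodic or occasional" are the ones worth checking against the calendar to see whether they follow a cycle.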

Another source of information on knowledge worker information-seeking is feedback or problem report logs. Terminology can be at the root of problems in accessing information—did someone not find the information simply because of the way it was labeled or categorized? Focus groups, contextual interviews, and usability reports are also good sources of terminology-related information.

A formal information needs assessment, as discussed earlier in this chapter, can also help you prioritize your taxonomy-building efforts. By surveying knowledge workers to learn the most important and also the most difficult information to find, you can narrow the target and establish success metrics. Survey questions might include the following:

  • What were you looking for?
  • What is the ideal content you would like to have found?
  • How important is this content to your job/business?

Information from Content

A well-formed taxonomy not only reflects knowledge worker needs, but it also reflects the content it organizes. There are various approaches to building the taxonomy according to content. As a starting point to determine the current use of metadata and terminology, it is valuable to undertake an audit of existing metadata tags. In Microsoft's Knowledge Network Group, metadata uncovered in a corporate-wide Web search is used to chart the use of metadata over time. In addition, a tag audit can expose a common set of metadata elements that are widely used. You might discover departments where metadata tags were in place in the documents, probably due to shared publishing tools or schemas, but were not actively used—no values had been assigned.
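
A tag audit of this kind might look like the following sketch, which uses Python's standard html.parser module. The sample pages and tag names are invented for illustration, and the sketch looks only at HTML meta elements; it is not a description of the Knowledge Network Group's actual tooling.

```python
from collections import Counter
from html.parser import HTMLParser


class MetaTagAudit(HTMLParser):
    """Tally <meta name=...> tags, separating filled from empty values."""

    def __init__(self):
        super().__init__()
        self.used = Counter()    # tag name -> pages where a value was assigned
        self.empty = Counter()   # tag name -> pages where the tag had no value

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        if not name:
            return
        content = (attr_map.get("content") or "").strip()
        if content:
            self.used[name.lower()] += 1
        else:
            self.empty[name.lower()] += 1


# Hypothetical pages gathered by a crawl.
PAGES = [
    '<html><head><meta name="Keywords" content="maps, campus">'
    '<meta name="Product" content=""></head></html>',
    '<html><head><meta name="keywords" content="benefits">'
    '<meta name="product"></head></html>',
]


def audit(pages):
    parser = MetaTagAudit()
    for page in pages:
        parser.feed(page)
    parser.close()
    return parser.used, parser.empty


USED, EMPTY = audit(PAGES)
```

In this sample, the "product" tag is present on every page but never filled in, which is the pattern the paragraph above describes.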

Content owners and stakeholders also provide a valuable, if sometimes biased, overview of content. By engaging content owners, you can draw upon their subject expertise to find out if the right information is being discovered and utilized. Because most content owners want to have their content highlighted in a search, you should assess their opinions in light of what you know about knowledge workers. On the other hand, content experts are in the best position to determine the strengths and weaknesses of content discovery in their domains.

Automated document analysis is a method for allowing content to guide taxonomy creation. Word counts and automated subject analyses can provide good information about the types of documents that are in your information store, the most common topics within those documents, emerging terminologies, and the like.
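
As a minimal sketch of the word-count approach, the following Python counts terms across a handful of documents. The stop-word list and sample documents are assumptions made for the example; a production analysis would use a fuller stop list and probably phrase detection as well.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stop-word list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "for", "in", "is", "on"}

# Hypothetical document texts.
DOCS = [
    "Campus maps for the main campus",
    "Benefits enrollment guide",
    "Parking maps and campus shuttle",
]


def top_terms(documents, n=2):
    """Return the n most frequent non-stop-words across all documents."""
    counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts.most_common(n)


TOP = top_terms(DOCS)
```

The resulting list is a candidate vocabulary only; a taxonomist still has to decide which frequent words represent concepts worth controlling.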

With database clustering technologies, like those found in Microsoft SQL Server 2000 Analysis Services, you can uncover patterns of term variants in query logs. For example, if the term "company store" is at the top of the query log list and "maps" is somewhere slightly lower on the log, a cluster analysis might find that there are, in fact, many instances where people searched for variant terms such as "map," "campus maps," and so on. By analyzing the clusters, you can determine the relative importance of the concepts, not just the search strings.
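
The effect of grouping variants can be illustrated without a clustering engine. The sketch below is a crude stand-in, not the Analysis Services algorithm: it groups queries by their last word with a plural "s" removed, and the query counts are invented.

```python
from collections import Counter

# Hypothetical query strings and how often each was searched.
QUERY_COUNTS = {
    "company store": 120,
    "maps": 90,
    "map": 40,
    "campus maps": 35,
    "campus map": 10,
}


def concept_key(query):
    """Crude grouping key: the last word, lowercased, minus a plural 's'.
    Assumes the query contains at least one word."""
    head = query.lower().split()[-1]
    return head[:-1] if head.endswith("s") and len(head) > 3 else head


def concept_totals(query_counts):
    """Sum search counts over all queries that share a grouping key."""
    totals = Counter()
    for query, count in query_counts.items():
        totals[concept_key(query)] += count
    return totals


TOTALS = concept_totals(QUERY_COUNTS)
```

In this sample, "company store" is the most frequent single string, but once the variants are combined the map concept is searched more often.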

Taxonomy creation can fall anywhere along a spectrum, from being an entirely human endeavor to a fully automated project. Ideally, automated assistance can improve efficiency when handling large amounts of material. Computers are very good at finding patterns—human analysis to find the significance of those patterns can help the taxonomy grow, supporting searching as well as browsing.

Structuring the Taxonomy

Once you have determined the appropriate extent of the taxonomy effort, it is time to determine your taxonomy's structure and implementation. A taxonomy's structure can range from simple to complex. It can be a simple alphabetical listing of authorized forms of terms and phrases, also known as an authority list or flat list. It can have a more complicated structure that comes from the creation of hierarchical and associative relationships between terms. Books, online guides, consultants, and examples show options for taxonomy implementation. Both the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO) have established standards and guidelines for developing and managing thesauri (see note ii). Because these standards were publicly announced before the impact of the Internet, professional groups are working on updating or supplementing these standards to reflect today's digital reality.
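
To make the difference between a flat list and a structured taxonomy concrete, here is a small in-memory sketch. The class and field names are hypothetical, loosely modeled on the broader, narrower, and related relationships found in thesaurus practice; they are not taken from any particular standard or product.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str
    broader: set = field(default_factory=set)   # hierarchical: parent terms
    narrower: set = field(default_factory=set)  # hierarchical: child terms
    related: set = field(default_factory=set)   # associative links


class Taxonomy:
    def __init__(self):
        self.terms = {}

    def add(self, label):
        """Add a term if it is new and return its record."""
        return self.terms.setdefault(label, Term(label))

    def add_narrower(self, parent, child):
        self.add(parent).narrower.add(child)
        self.add(child).broader.add(parent)

    def add_related(self, first, second):
        self.add(first).related.add(second)
        self.add(second).related.add(first)

    def authority_list(self):
        """The flat view: an alphabetical list of authorized terms."""
        return sorted(self.terms)


TAXONOMY = Taxonomy()
TAXONOMY.add_narrower("Facilities", "Maps")
TAXONOMY.add_narrower("Facilities", "Parking")
TAXONOMY.add_related("Maps", "Parking")
```

The same set of terms serves both purposes: authority_list() gives the flat list, while the relationship sets carry the hierarchy and the associations.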

Business needs and rules must guide decisions about management and scope, but these standards give you a solid starting point for basic decisions about forms of terms, display of terms, and structure. Even if you have several taxonomies and unique guidelines for each one, you should also have a core set of rules grounded in standards. When each taxonomy complies with established standards, you will improve the chances that they will interoperate with other groups' efforts both internally and externally.

You have many choices for storage and management of your taxonomy. These depend on specifics of your business:

  • Use. Is the taxonomy a list to be used in a navigation menu, a drop-down list, or a large taxonomy to support information retrieval?
  • Size. Will the completed taxonomy contain several terms or several thousand terms and relationships?
  • Complexity. Will the taxonomy be a flat list or a hierarchical structure with a number of different types of relationships linking the terms?
  • Scope of use. Is the taxonomy used only locally or is it shared over the enterprise or even internationally?

In simple cases (such as a small, slowly changing taxonomy that supports browsing on one Web site), you can handle storage and management with an XML file stored on the Web server. If you maintain taxonomies that change quickly, require shared management and change control, or are highly structured, you will need to invest in support processes and systems. Depending on resources, you can either purchase taxonomy management software or build a custom management solution that fits your specific needs.
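
For the simple case, a browsing taxonomy kept in an XML file might be read as in the sketch below. The element and attribute names (taxonomy, term, label) are invented for the example, and the XML is held in a string here rather than in a file on the Web server.

```python
import xml.etree.ElementTree as ET

# Hypothetical taxonomy file contents.
TAXONOMY_XML = """
<taxonomy name="site-navigation">
  <term label="Facilities">
    <term label="Maps"/>
    <term label="Parking"/>
  </term>
  <term label="Human Resources">
    <term label="Benefits"/>
  </term>
</taxonomy>
"""


def term_paths(element, prefix=()):
    """Walk nested <term> elements and return breadcrumb-style paths."""
    paths = []
    for child in element.findall("term"):
        path = prefix + (child.get("label"),)
        paths.append(" > ".join(path))
        paths.extend(term_paths(child, path))
    return paths


PATHS = term_paths(ET.fromstring(TAXONOMY_XML))
```

A file like this is easy to edit by hand, which suits a small, slowly changing taxonomy; it offers no change control, which is why larger efforts need more than a file.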

A taxonomy management software solution is convenient because you do not have to build the solution from scratch and you can often integrate it into larger document warehousing solutions with minimal fuss. But a software solution is not always as flexible. Because you did not build it, some of your specialized needs may not be met by an off-the-shelf solution. You may have some ability to automatically generate new taxonomies, but these rarely meet all your needs. And when the solution is integrated into a larger document warehousing solution, you might be limited in how well the solution interacts with your existing systems.

If you decide not to buy a third-party solution, you can build your own from scratch. The primary advantage of developing your own is that you can design the system to evolve with your needs. But in-house development requires you to have the development resources and taxonomy experts.

Microsoft's Knowledge Network Group investigated several third-party taxonomy management tools before deciding to design and build its own for the management of descriptive and navigational taxonomies, as well as for metadata schemas. A design was constructed to reflect the project's goals and was built using Microsoft SQL Server. The design that was developed allows for an easily extensible structure (both in depth of hierarchy and in the types of relationships that could be supported). A Microsoft Visual Basic interface (later named VocabMan) was used so that the taxonomists, or information professionals with a keen understanding of taxonomic rules and usage, could manipulate and maintain the structures. Microsoft's Content Development and Delivery Group (CDDG) has since expanded and supplanted this solution with a similar database structure developed as an XML Web service with a Web-based management interface. It functions as a single management service for taxonomies in support of many of Microsoft's external and internal content management systems.
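
One common way to get that kind of extensibility in a relational database is to store relationships, and relationship types, as rows rather than columns. The sketch below shows the idea using SQLite through Python's sqlite3 module. It is an assumption-laden illustration, not the VocabMan schema: every table and column name is hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE term (id INTEGER PRIMARY KEY, label TEXT NOT NULL UNIQUE);
CREATE TABLE relationship_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE relationship (
    from_term INTEGER NOT NULL REFERENCES term(id),
    to_term   INTEGER NOT NULL REFERENCES term(id),
    type_id   INTEGER NOT NULL REFERENCES relationship_type(id),
    PRIMARY KEY (from_term, to_term, type_id)
);
"""


def link(db, from_label, to_label, type_name):
    """Record a typed relationship between two existing terms."""
    db.execute(
        """INSERT INTO relationship (from_term, to_term, type_id)
           SELECT f.id, t.id, r.id
           FROM term f, term t, relationship_type r
           WHERE f.label = ? AND t.label = ? AND r.name = ?""",
        (from_label, to_label, type_name),
    )


def linked_labels(db, label, type_name):
    """Labels reachable from a term through one relationship type."""
    rows = db.execute(
        """SELECT t.label FROM relationship rel
           JOIN term f ON f.id = rel.from_term
           JOIN term t ON t.id = rel.to_term
           JOIN relationship_type r ON r.id = rel.type_id
           WHERE f.label = ? AND r.name = ?
           ORDER BY t.label""",
        (label, type_name),
    )
    return [row[0] for row in rows]


DB = sqlite3.connect(":memory:")
DB.executescript(SCHEMA)
DB.executemany("INSERT INTO term (label) VALUES (?)",
               [("Facilities",), ("Maps",), ("Campus maps",)])
DB.executemany("INSERT INTO relationship_type (name) VALUES (?)",
               [("narrower",), ("related",)])
link(DB, "Facilities", "Maps", "narrower")
link(DB, "Maps", "Campus maps", "narrower")
```

Because hierarchy depth comes from chaining rows and a new relationship type is just another row in relationship_type, neither change requires altering the schema.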

Managing the Taxonomy

At Microsoft, content management for internal content is distributed among many organizations. Taxonomy creation is also distributed. As in many large companies, with multiple product lines, there exists a growing movement within Microsoft to adopt a common set of vocabularies, a common core schema, and a common management system. This is desirable because it allows the knowledge workers to be more effective in searching across the company; it also reduces expensive, redundant efforts. The shared development of taxonomy, however, means that change control measures need to be in place to keep vocabularies in sync and to keep pace with the rapid growth of products and services. For example, if a term (or a group of terms) is going to be removed, all parts of the organization using that term need to be notified so that they can make appropriate changes to their publishing and data management tools. Likewise, when content owners suggest new terms, they are referred to a taxonomy committee or advisory board to decide how to add and structure the new elements within the shared taxonomy.
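
The notification step in that example can be sketched as a lookup from terms to the groups that use them. The usage registry and group names below are hypothetical; in practice this information would come from the publishing and data management tools themselves.

```python
# Hypothetical registry: term -> groups whose tools depend on it.
USAGE = {
    "Maps": {"Facilities site", "HR portal"},
    "Benefits": {"HR portal"},
    "Parking": {"Facilities site"},
}


def groups_to_notify(usage, terms_to_remove):
    """Return, for each affected group, the terms it is about to lose."""
    notices = {}
    for term in terms_to_remove:
        for group in usage.get(term, set()):
            notices.setdefault(group, []).append(term)
    return {group: sorted(terms) for group, terms in notices.items()}


NOTICES = groups_to_notify(USAGE, ["Maps", "Benefits"])
```

Keeping such a registry current is itself a maintenance cost, but without it there is no reliable way to know who is affected by a removal.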

Centralization or Decentralization

Once you have made decisions about your taxonomy's scope and implementation, you still have to decide how to manage it. Keeping the management model small has practical advantages. Having a centralized, dedicated group working together to build and manage a taxonomy saves time and effort. Building vocabularies by committee is tedious and wastes energy. On the other hand, even the best group of expert taxonomists will need constant input from content owners to keep abreast of new information needs that drive a vital taxonomy. An advisory panel can represent the needs of different groups cooperating on the taxonomy.

Microsoft's model utilizes both a centralized and decentralized taxonomy. Administrators work toward sharing a core set of controlled vocabularies but also recognize that the groups they work with have different needs for specific topical coverage and management schemas. To that end, Microsoft builds out shared vocabularies but saves room for groups to manage their own local term sets.

Balancing centralized control with the reality of distributed management can be difficult, but without some centralized control there is no consistency, and the taxonomy's strength is undermined. Unlike many other shared goals or KM initiatives, the corporate taxonomy really does need to be a unique, shared effort at its core. As with industry standards, a shared taxonomy leads to interoperability—which is highly desirable in a distributed work environment. At the very least, it is important to standardize a single taxonomy approach that is supported by agreed-upon standards. It is also desirable to work toward a shared taxonomy platform for the same reasons: interoperability and elimination of redundant efforts.

Authority and Support

As mentioned earlier, controlling change is vital to the ongoing success of a shared taxonomy. It is crucial to stay in sync and informed of changes. By having a built-in advisory group across the company that pushes for taxonomy changes, all corners of the organization benefit from hearing about changes in a timely way and having them reflected in their shared taxonomy. It is important to determine from the start who has a voice in making change control decisions. It is also desirable to set up in advance the rules on who is responsible for implementing changes.

In cases where the need to be stable outweighs the need to be current, a scheduled update approach to making vocabulary changes can help to moderate the impact of changes to the taxonomy. In this case, you accumulate change requests, a representative group reviews them, and you make changes to the taxonomy quarterly or according to some other appropriate schedule. This scheduled updating of the taxonomy allows the central authority to give plenty of warning to the knowledge workers. This approach makes sense in cases where taxonomy changes necessitate related work such as updated documentation. This cascade of change within an organization often requires lead time.

Another approach is to make real-time changes in taxonomy; that is, agreed-upon changes are immediately effective rather than implemented on a scheduled basis. This method requires well-established change control rules governing what kind of changes can occur without notice (adding vocabulary, updating spelling, and so on) and what kind of changes require notice or even consensus among a broader group of knowledge workers or the advisory panel (deleting terms or vocabularies, radical restructuring, and so on).
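
Change control rules of this kind can be written down explicitly so that they are applied the same way every time. The following sketch is one possible encoding, with hypothetical names for the kinds of change; it sends anything not explicitly allowed without notice to review, which is an assumption on the cautious side.

```python
# Hypothetical change kinds that may take effect without notice.
NO_NOTICE = {"add_term", "update_spelling"}


def route_change(kind):
    """Decide how a change request is handled. Deletions, restructuring,
    and any unrecognized kind of change are sent for review."""
    if kind in NO_NOTICE:
        return "apply immediately"
    return "notify and review"


def split_requests(requests):
    """Partition (kind, detail) requests into immediate and review queues."""
    immediate, review = [], []
    for kind, detail in requests:
        if route_change(kind) == "apply immediately":
            immediate.append((kind, detail))
        else:
            review.append((kind, detail))
    return immediate, review


IMMEDIATE, REVIEW = split_requests([
    ("add_term", "Shuttle"),
    ("delete_term", "Maps"),
    ("restructure", "Move Parking under Transportation"),
])
```

Under a scheduled update approach, the review queue is simply what accumulates between update cycles.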

In addition to balancing the business needs of different groups that use the taxonomy, change control rules need to account for the impact of reorganizations and other changes in the company. By initially structuring who owns the business rules and change control decisions, the taxonomy updating process can continue without interruption. To withstand changes in the organization, a good set of taxonomy change control rules will also codify how to change the rules when necessary.

Ongoing Maintenance

As you allocate resources for the maintenance of the taxonomy, determine whether tagging or other content management will be centralized or distributed. If use of the taxonomy is distributed, you will need to set a training budget. If the process is centralized, you will need to hire content managers. In either case, you will need to hire taxonomists to manage the taxonomy, work through change control, and keep the contents up to date. Also, determine whether any or all of the work can be outsourced.

Microsoft outsources much of the tagging, and occasionally brings in temporary staff or consultants to help meet project deadlines, but the majority of the work on taxonomy development and maintenance is done by full-time staff.

Being Strategic

Throughout the description of planning and implementing the taxonomy, we have provided suggestions on how to keep your efforts targeted toward the biggest returns. Here is a summary:

  • Review knowledge worker behavior to find out where the needs and use are greatest
  • Analyze content to determine scope and depth
  • Reflect business priorities in your taxonomy
  • Develop and implement a scalable solution, leaving room for future development, and allocate resources for maintenance and growth
  • Establish a metrics plan that can help you determine what is going well and what can be changed to maximize your effectiveness

Using Taxonomies to Their Fullest

The obvious strengths of corporate taxonomies have already been described; they include search support, navigation, data control/mining, schema management, and personalization/information delivery. But the value of a taxonomy does not stop there. With the infrastructure of a shared set of controlled vocabularies, you can take advantage of other benefits.

A taxonomy can become a part of content creation itself. For example, in the airplane manufacturing industry, it is critical to control vocabulary for consistency and precision in instructions and manuals. In this case the technical writers use the terminology contained in the controlled taxonomy to create the instructions and manuals. The controlled terminology is now part of the full-text content, and users of these documents can be confident that the terms have been applied consistently and precisely.

When taxonomy is used in support of document tagging, the taxonomy can become part of the content creation/tagging process. In a company with many content creation channels and methods, a centralized corporate taxonomy can feed directly into a content management system, into stand-alone tagging tools, or into any other method used to tag content. The benefit of having all the different methods draw upon the same centralized taxonomy is consistent tagging and normalization, along with a simplification of the tagging process.

Controlled vocabularies also provide a stable foundation for localization. Once the vagaries of one language have been reduced to describe a single concept, the ability to use this structure for translation into other languages is obvious.

Taxonomies often bring valuable by-products. For example, a corporate taxonomy will likely contain abbreviations. If the taxonomy is structured in a way that allows the retrieval of elements by their relationship type, it is a simple matter to create a resource that shows the meaning or meanings of abbreviations and acronyms. Similarly, code names, language codes, and other lists can be pulled from the centrally maintained source.
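
Retrieval by relationship type can be as simple as a filter. In the sketch below, relationships are stored as (source, type, target) triples; the relationship type names are hypothetical, and the sample entries are only illustrative.

```python
# Hypothetical relationship triples: (source term, relationship type, target term).
RELATIONSHIPS = [
    ("KM", "abbreviation_for", "knowledge management"),
    ("ISO", "abbreviation_for", "International Organization for Standardization"),
    ("ANSI", "abbreviation_for", "American National Standards Institute"),
    ("Maps", "narrower_than", "Facilities"),
]


def glossary(relationships, rel_type="abbreviation_for"):
    """Collect every meaning recorded for each abbreviation."""
    entries = {}
    for source, kind, target in relationships:
        if kind == rel_type:
            entries.setdefault(source, []).append(target)
    return entries


ABBREVIATIONS = glossary(RELATIONSHIPS)
```

Passing a different rel_type would produce other lists, such as code names or language codes, from the same source.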

Finally, and perhaps most importantly, a taxonomy can help to ensure that knowledge workers are seeing the right, consistent information. Taxonomies can help to make obvious the authoritative sources of information, and they can help prioritize information for knowledge workers within search returns and by directing navigation.

Measuring Success

How do you know that your taxonomy is working? It can be difficult to determine which performance or satisfaction gains are directly attributable to taxonomy. Depending on how you take advantage of your taxonomy, you may try the following methods to measure success:

Relevancy testing. As mentioned earlier, if your taxonomy is used to enhance searching, relevancy testing (the precision and recall measures familiar from information retrieval studies) is a good indicator of success. If you use taxonomies to manipulate search queries and search result sets, you can improve both precision and recall and you can retrieve more relevant information.
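
Precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. A minimal sketch of the calculation, using invented document identifiers and relevance judgments, follows.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: identifiers the search returned
    relevant:  identifiers judged relevant for the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result set and relevance judgments for one query.
PRECISION, RECALL = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5"],
)
```

Running the same queries before and after a taxonomy change, against the same relevance judgments, gives a before-and-after comparison of both measures.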

Item reuse. Another measure of navigation and search efficiency is that authoritative information is reused. You can monitor this using corporate intranet site statistics. By noting the position of the search result selected by knowledge workers, you can determine the effectiveness of content registries and the taxonomies that support them. When knowledge workers find high-quality information in the top three search results, they seldom go deeper into the list, and the authoritative, tagged items are reused.
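
If site statistics record the rank of each search result that was selected, the share of selections in the top three is straightforward to compute. The click data below is invented for illustration.

```python
def top_n_share(click_positions, n=3):
    """Fraction of selected results that were ranked n or better."""
    if not click_positions:
        return 0.0
    in_top = sum(1 for position in click_positions if position <= n)
    return in_top / len(click_positions)


# Hypothetical ranks (1 = first result) of the results knowledge workers selected.
CLICKS = [1, 2, 1, 5, 3, 8, 1, 2]
SHARE = top_n_share(CLICKS)
```

Tracking this share over time, rather than reading a single value, is what shows whether tagging and taxonomy changes are moving authoritative items up the list.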

Usability testing. This is a valuable way to test navigational taxonomy. Are knowledge workers able to find common resources quickly? Can you reduce the time to complete a task by improving the labeling of categories? Contextual methods, where knowledge workers' search behaviors are observed in their offices, can also show qualitative gains related to both navigation and search taxonomies.

Knowledge worker satisfaction. Survey results are a reliable metric only when they can be narrowed to focus on the taxonomies. If you ask knowledge workers about a search, they may base their responses on their satisfaction (or dissatisfaction) with the search results rather than on the search functionality. Other variables, such as the user interface or reaction to change, also influence knowledge workers' satisfaction with searching and browsing.

Summary

Once you have metrics in place, you can decide how effective your taxonomy implementation is in context. At the same time, it is valuable to take a more holistic approach and review some checkpoints of a good taxonomy.

  • An effective taxonomy is extensible over time. Mergers and acquisitions will not destroy the model, nor will changes in the organization that affect the maintenance of the taxonomy.
  • Rules concerning taxonomy management focus on what is similar among knowledge workers and find a way to keep dissimilar goals from sidetracking the project.
  • A well-reasoned corporate taxonomy management system ought to be, to the greatest extent possible, independent of the systems in which the resultant taxonomies will be leveraged. This builds in flexibility for future use.
  • A vital taxonomy is tightly connected to content producers in order to keep up to date and to reflect both the organizational and the individual knowledge workers' information needs.
  • A strategic taxonomy accounts for business priorities and continually measures how it can have the most beneficial effect on the evolving information environment.

Proper planning and management of a corporate taxonomy strategy should be the cornerstone of any KM effort. If it is ignored, processes for managing terms and structures evolve organically and emerge without a unifying vision. Recovering from such a situation will be an even greater challenge. By identifying the goals of your KM efforts and how taxonomies can support them and by being strategic in your approach, you can begin to build a system of structures that will grow with your enterprise and that will continue to support your KM needs.

i) http://www.tfpl.com/areas_of_expertise/taxonomies/_report_/taxonomy_report.html

ii) Guidelines for the Construction, Format, and Management of Monolingual Thesauri, ANSI/NISO Z39.19-1993; see also ISO 2788, ISO 5964, BS 5723, and BS 6723.


Last Updated: February 5, 2002