Microsoft SQL Server 2000 Analysis Services Performance Guide

Published: June 1, 2003

Summary: This paper describes techniques you can use to optimize query responsiveness and processing performance in Microsoft® SQL Server™ 2000 Analysis Services.

*
On This Page
IntroductionIntroduction
Optimizing Analysis Services to Improve Query ResponsivenessOptimizing Analysis Services to Improve Query Responsiveness
Optimizing Analysis Services to Improve Processing PerformanceOptimizing Analysis Services to Improve Processing Performance
Optimizing the Data Model to Improve PerformanceOptimizing the Data Model to Improve Performance
Optimizing Hardware Resources to Improve PerformanceOptimizing Hardware Resources to Improve Performance
Optimizing the Architecture to Improve PerformanceOptimizing the Architecture to Improve Performance
Optimizing Clients for Slow Network ConnectionsOptimizing Clients for Slow Network Connections
SummarySummary
Appendix A: ResourcesAppendix A: Resources
Appendix B: Interpreting Performance Monitor CountersAppendix B: Interpreting Performance Monitor Counters
Appendix C: How to Flush the Query Results CacheAppendix C: How to Flush the Query Results Cache
Appendix D: How to Tune the Process Buffer SizeAppendix D: How to Tune the Process Buffer Size
Appendix E: A Checklist of Configuration Steps for an Analysis ServerAppendix E: A Checklist of Configuration Steps for an Analysis Server
Appendix F: When to Consider Moving to SQL Server 2000 (64-bit)Appendix F: When to Consider Moving to SQL Server 2000 (64-bit)

Introduction

Users can analyze data stored in Microsoft® SQL Server™ Analysis Services with greater speed and flexibility than users can query the same data stored in a relational database. Analysis Services queries data more efficiently because it organizes data into multidimensional structures, stores aggregations of fact data, and stores frequently used results in a memory cache for quick access. While many sites deploy Analysis Services with little or no performance tuning and receive good query responsiveness and processing performance, there are many techniques that you can employ to increase the speed with which Analysis Services processes new data into multidimensional cubes and to reduce the length of time Analysis Services requires to respond to well-formed user queries.

This paper addresses these techniques, which are based on the collective experience of the Microsoft Business Intelligence (BI) Practices team and the Analysis Services development team. Sites with large quantities of data, many users, or complex query patterns can benefit the most from employing the techniques discussed in this paper.

This paper is divided into the following sections:

"Optimizing Analysis Services to Improve Query Responsiveness" discusses techniques that you can employ within Analysis Services to increase the speed with which Analysis Services responds to user queries.

"Optimizing Analysis Services to Improve Processing Performance" discusses techniques that you can employ to increase the speed with which new data is processed into Analysis Services, and to minimize the time during which cubes are unavailable to users due to the processing of new data. The impact of techniques used to improve query responsiveness on processing performance is also discussed.

"Optimizing the Data Model to Improve Performance" discusses design decisions you can make to improve query responsiveness and processing performance.

"Optimizing Hardware Resources to Improve Performance" discusses how additional system resources and the appropriate configuration of existing system resources can improve query responsiveness and processing performance.

"Optimizing the Architecture to Improve Performance" discusses techniques that you can employ at the architecture level to improve query responsiveness and processing performance.

"Optimizing Clients for Slow Network Connections" discusses techniques that you can employ on the client side to increase query responsiveness over a slow network connection.

This paper does not address performance issues related to the efficiency of the MDX queries actually submitted to Analysis Services from clients. However, the MDX used in queries and calculated members can have a significant impact on the responsiveness of Analysis Services to user queries. Therefore this should not be overlooked when attempting to improve query responsiveness. For information on MDX resources, see Appendix A, "Resources," in this paper.

In addition to the techniques discussed in this paper, you should also apply SQL Server 2000 Service Pack 3 (SP3) to the computer on which Analysis Services is running (the Analysis server) and update the Pivot Table Service (PTS) on each Analysis Services client computer. Updating PTS on each client computer is particularly important because the client-server architecture of PTS places a significant portion of the PTS code on each client computer and SP3 includes significant performance enhancements to the client-side components of PTS. Specific SP3 performance enhancements are discussed in context throughout this white paper.

Top of pageTop of page

Optimizing Analysis Services to Improve Query Responsiveness

Understanding the steps involved in resolving a client query will help you improve query responsiveness. When a client application submits a query to Analysis Services, the query is resolved in the following order:

1.

PTS on the client parses the query and, if the query references multiple data slices, breaks the query into separate query requests. PTS breaks the query into one query request per slice when there are multiple slices on the same dimension.

2.

For each query request, PTS reads dimension information stored in memory to determine the levels and members needed to resolve the query request. If additional dimension information is required, PTS requests it from the server.

3.

PTS determines whether the query request can be resolved using information contained in the client's cache. Query requests that cannot be resolved from the client's memory are forwarded to Analysis Services for resolution.

4.

Analysis Services attempts to resolve the query request from the data stored in its query results cache.

5.

For the portion of each query request that cannot be resolved from the query results cache, Analysis Services retrieves the data from the appropriate cube partitions (using aggregations when they are useful). With small cubes, this data might be in the file system cache; with large cubes, this data is retrieved from disk.

6.

Analysis Services forwards the returned data for each query request to the client and also places the retrieved data in its query result cache for use in resolving future queries.

7.

PTS on the client post-processes the returned data as needed and returns the resolved query to the client application.

There are many factors that directly affect this process. These include:

Aggregation design

Multidimensional storage mode

Number of partitions

Existing data in the query result cache

Existence of writeback tables

Use of parent-child dimensions with a ROLAP partition

In addition to these factors, which are discussed below, the data model design, the availability and configuration of hardware resources on the Analysis server, the architecture of the Analysis Services solution, and client-side settings also affect the speed with which Analysis Services responds to user queries. These effects are discussed in separate sections of this paper.

Design Effective Aggregations

The most effective technique you can use to improve overall query responsiveness (assuming the Analysis server has sufficient memory and hard disk resources) is to design effective aggregations of fact data in each cube partition. Aggregations are pre-calculated subtotals of fact data at a various levels in a cube. Because users rarely execute queries at the lowest level of a cube, designing aggregations of fact data at different dimension levels enables Analysis Services to respond quickly to most queries. For example, suppose that a client requests that Analysis Services compare the sales for two customers and return the name of the customer with the greater volume of sales for the current year. Analysis Services could resolve this query by first scanning for individual facts in the cube at its lowest level (the fact level) that meet the criteria specified in the query, and then subtotaling the sales for these customers by year at query time. Analysis Services could resolve this query faster if these subtotals had been pre-calculated and stored with the cube.

Since it is impractical to store every possible aggregation in the cube, designing effective aggregations is the art of storing the appropriate aggregations for the queries your users are submitting. The aggregation design wizards consider the cost and value of each possible aggregation.

Aggregation cost is a function of the number of rows that could be in the aggregation. Smaller aggregations have a lower cost than larger aggregations. For more information on aggregation size, see "Number and Size of Records in an Aggregation" later in this section.

Aggregation value is a function of how many queries the aggregation would be useful for. For more information on how the Analysis Services design wizards determine the value of an aggregation, see "Understanding the Aggregation Design Tools" later in this section.

The first step toward designing the appropriate set of aggregations is to understand exactly what aggregations are and how Analysis Services uses aggregations to resolve queries.

Understanding an Aggregation and How It is Used

Each aggregation represents the subtotaling of facts in a cube partition at some dimension level for each dimension in the cube. For example, imagine that you are working with a Sales cube that contains the following dimensions, levels (with counts), and measures:

A Customers dimension with five levels: All (1), Country (3), State (80), City (578), Name (3811)

A Products dimension with five levels: All (1), Category (60), Brand (911), Name (7621), SKU (8211)

A Time dimension with four levels: All (1), Year (3), Quarter (12), Month (36)

The cube contains two measures: Units Sold, Sales

Notice that the All level counts as a level. The table in Figure 1 represents a small sample of the rows at the Sales cube fact level. The fact level may contain millions of rows of data that are initially populated and then periodically updated from one or more fact tables in a relational database.

CustIDSKUDateUnits SoldSales

345-23

135123

3/12/2001

2

$45.67

563-01

451236

7/19/2001

34

$67.32

...

...

...

...

...

Figure 1 Excerpt from Sales cube fact level

Now, suppose that a client requests that Analysis Services determine the total number of units of the Brand X sold in Canada during the first quarter of 2001. In the absence of any pre-calculated subtotals stored in the cube, Analysis Services scans the lowest level in the cube for individual facts that meet the specified criteria, and then summarizes these facts according to Brand, Country, and Quarter to resolve the query. The size of the cube, the storage mode of the partitions in the cube, and the hardware resources on the Analysis server determine the length of time Analysis Services requires to resolve this query. If Analysis Services had pre-calculated subtotals at various levels and stored them in this cube, Analysis Services could resolve this query using existing subtotals rather than creating subtotals at query time.

For example, suppose that the facts in this cube had been subtotaled at the Brand, Country, and Quarter levels. Figure 2 illustrates this particular aggregation: the <Country, Brand, Quarter> aggregation.

CountryBrandQuarterUnits SoldSales

Can

Brand X

Q1,2001

9456

$23,914.30

Can

Brand X

Q2,2001

...

...

Can

Brand X

...

...

...

Can

Brand Y

Q1,2001

6722

$37,382.33

Can

Brand X

Q2,2001

...

...

Can

Brand X

...

...

...

US

Brand X

Q1,2001

22,687

$57,375.61

US

Brand X

Q2,2001

...

...

US

Brand X

...

...

...

US

Brand Y

Q1,2001

14623

$81,321.30

US

Brand Y

Q2,2001

...

...

US

Brand Y

...

...

...

...

Brand X

Q1,2001

...

...

...

Brand X

Q2,2001

...

...

...

Brand X

...

...

...

...

Brand Y

Q1,2001

...

...

...

Brand Y

...

...

...

...

Brand Y

...

...

...

Mexico

Brand X

Q1,2001

4286

$12,592.29

Figure 2 Excerpt from <Country, Brand, Quarter> aggregation

As you can see, this aggregation is a set of records containing the same dimensions and measures as the fact level of the cube but subtotaled at a particular dimension level in the cube above the fact level. If the aggregation displayed in Figure 2 were stored in the Sales cube, Analysis Services could resolve the total number of units of the Brand X sold in Canada during the first quarter of 2001 directly from this aggregation without having to scan for individual facts and then subtotaling these facts at query time. If subtotals of facts at the exact combination of levels required to resolve a query do not exist within an existing aggregation, Analysis Services attempts to resolve the query by using aggregations with subtotals at lower levels than those required to resolve the query. For example, if Analysis Services received a query requesting the total number of units of Brand X sold in each country during 2001, Analysis Services would resolve this query by using the aggregation in Figure 2. Analysis Services would begin with the subtotals in this aggregation and then calculate the units sold for the year 2001 for each country by using the subtotals of units sold during each quarter of 2001 for each country. Similarly, if Analysis Services received a query requesting the total number of units of Brand X sold in all countries during 2001, Analysis Services would begin with the subtotals in this same aggregation and then calculate the units sold during 2001 in all countries by subtotaling quarters to years and individual countries to all countries. With each of these queries, Analysis Services resolves them more quickly by using this aggregation stored with the cube rather than scanning for individual facts and then calculating the necessary subtotals at query time. The performance benefit of storing the appropriate aggregations with the cube becomes more significant as the amount of data in the cube increases because the length of time required to scan individual facts is directly related to the size of the cube.

As you have seen, Analysis Services does not require that an aggregation contain the exact subtotals required to resolve a query in order to use the aggregation to resolve the query. However, Analysis Services is only able to resolve queries based on existing aggregations if the existing aggregations are at levels matching or below those of the queries being submitted. For example, if the records in an aggregation contain subtotals at the Year level and the query requests information at the Quarter level, Analysis Services would have to query at the fact level or use another aggregation to resolve the query; the subtotals of data in this aggregation at the Year level would be of no use in resolving this query.

While it is fairly easy to see that these simple queries benefit from pre-calculated subtotals, and to explain why performance is slower without these subtotals, the performance benefits of aggregations with some types of queries are not as obvious. Top count and median are examples of queries that must touch a large number of cells to resolve, but only return a few values (these are called wide queries). For example, a query that returns the top three customers in sales for the current year or the median volume of sales per customer for the current year may require a long time for Analysis Services to resolve in a large cube. Without pre-calculated subtotals of sales information for each customer by year, Analysis Services would have to calculate subtotals for each customer at query time to resolve either of these queries. Calculating these subtotals at query time for all customers can be a very time-consuming task because Analysis Services must read all cells at the fact level containing sales information for each customer for the current year, calculate subtotals for each customer, and then compare these subtotals to determine the top three or the median. It is not obvious from the query results that are returned why Analysis Services took a significant amount of time to resolve these types of queries. Analysis Services could resolve these two queries more rapidly if pre-calculated subtotals of sales for each customer for the current year (or month) were stored in the cube.

While appropriate aggregations enable Analysis Services to quickly resolve many queries, you should also review the actual MDX used with poorly performing queries to ensure that the most efficient MDX statement is being submitted. Poorly written MDX queries are also a common reason that queries perform poorly. For information on MDX resources, see Appendix A, "Resources." Good aggregation design (the subject of the next section) helps well-written and efficient MDX queries in the same manner that proper indexing can greatly help well-written SQL queries.

Understanding the Aggregation Design Tools

SQL Server 2000 Analysis Services and the SQL Server 2000 Resource Kit include a number of tools that you can use to design and manage aggregations. These include the following:

Storage Design Wizard designs aggregations based on the assumption of a uniform pattern of query distribution.

Usage-Based Optimization Wizard designs aggregations based on information captured in a query log about the actual distribution of queries from your users, filtered by criteria that you specify.

Partition Manager, a SQL Server 2000 Resource Kit utility, enables you to manually design aggregations and to copy aggregation designs between partitions.

The Partition Aggregation utility, which is an updated version of Partition Manager, contains additional functionality for setting row counts on a per-partition basis and for applying the aggregation design for a single partition to other partitions in the cube. The Partition Aggregation utility is available with the SQL Server Accelerator for Business Intelligence (SSABI) and can be downloaded at no cost from http://www.microsoft.com/sql/ssabi/default.asp.

Cube Editor enables you to manage dimension and cube properties that affect aggregations.

After you use one or more of these tools to design aggregations for one or more partitions in a cube, you must fully process the partitions containing the new aggregation design to calculate and store the new aggregations in order for the aggregations to be useful to queries. Also, a partition with a new aggregation design must be fully processed before it can be incrementally updated. You must create a new aggregation design whenever you add or delete a level or a dimension to a cube because Analysis Services drops all existing aggregations whenever you make structural changes to a cube. Processing the partitions in a cube containing new or deleted levels or dimensions will not generate any aggregations until you create a new aggregation design incorporating the new dimension structure.

Storage Design Wizard

In the Storage Design Wizard, you specify the number of aggregations that will be generated based on an estimate of the amount of disk space you are willing to have the aggregations consume or the amount of performance gain that you want to achieve (both values are estimated conservatively). The Storage Design Wizard determines which aggregations to design by using statistical coverage algorithms that assume all queries are equally probable. In other words, the Storage Design Wizard assumes that a query is as likely to contain any level in a dimension as any other level in the dimension. The Storage Design Wizard uses this assumption in assessing the value of each aggregation and uses the member counts for each level of each dimension enabled for aggregation in assessing the cost of each aggregation. You may want to limit certain dimensions that you know are not queried frequently to only the All level in order to reduce the total number of possible aggregations and the potential size of each aggregation. This is done using the Aggregation Usage property of the dimension. For information on setting the Aggregation Usage property, see "Cube Editor" later in this section.

The Storage Design Wizard uses the member counts and the fact table row count for each partition to determine which aggregations are too large to be generated. It assumes that the fact data is randomly distributed across dimension members when determining the amount of data in each aggregation. For more information on aggregation size limits, see "Aggregation Size Limited to 30 Percent of Fact-Level Size" later in this section. The Storage Design Wizard can design the best set of aggregations only if the dimension level member counts and partition row counts are set correctly.

When you define a dimension, Analysis Services counts the current members for each level in the dimension and stores this value in the Member Count property for each dimension level. Thereafter, Analysis Services never recalculates the member counts. If you have a growing dimension, you should update the member counts manually before you use the Storage Design Wizard to design aggregations for a partition. You can query the relational tables to determine the actual number, or you can enter the number of members you anticipate will be in the cube when it is fully populated. You can update the member count for each level in each dimension individually by using the Dimension Editor in Analysis Manager, or you can update all dimension member counts at once by using the Partition Aggregation utility. You can also set the member counts using the Decision Support Objects (DSO)API. In most cases, you should set the member counts to be equal to the actual member counts you anticipate will be in the cube in the production environment. See "Decrease Dimension Counts in a Partitioned Cube" later in this section for other recommendations when using multiple partitions.

When you define a cube, Analysis Services counts the current rows in the fact table for the initial partition and stores this value in the Fact Table Size property for the cube. Thereafter, Analysis Services never recalculates the row count for any partition. Furthermore, because aggregations are designed on a per partition basis, it is the row count for the partition that is important and that must be set properly for each partition. If you have a growing fact level, you should update the row count manually for a partition before you design the aggregations for that partition. You can query the relational tables to determine the actual number, or you can enter the number of rows you anticipate will be in the partition when it is fully populated. If your cube has only one partition, you can update the row count for this partition by using the Cube Editor in Analysis Manager. This value will automatically be copied to the only partition in the cube. If you have multiple partitions, you must use the Partition Aggregation utility to set the row count for each partition (Cube Editor will not update this value across all partitions). You can use the Partition Aggregation utility to set the row counts for all partitions at once. You can also set the row count for each partition using the DSO API. See "Aggregation Size Limited to 30 Percent of Fact-Level Size" later in this section for a discussion regarding when to increase the row count above the actual number of rows at the fact level for a partition.

When users' queries follow a pattern (which is normally the case) rather than being uniformly random, the Storage Design Wizard will generate some aggregations that are superfluous, and it will overlook the design of some aggregations that would be useful. For example, suppose your company regularly generates a query of sales by state by brand by month. If the Storage Design Wizard decides to calculate an aggregation by state by brand by month, this query would benefit from this aggregation. If instead the Storage Design Wizard decides on an aggregation by country by brand by month, the query would not benefit from that aggregation. However, another query might benefit from that aggregation. The Storage Design Wizard has no information regarding which query is more likely to be submitted.

With large or complex cubes, the number of generated but unused aggregations increases, consuming more processing time and storage space. Query responsiveness is not optimal because some useful aggregations were not generated. However, until the actual query distribution pattern can be logged and aggregations created based on actual usage patterns, creating aggregations based on a uniform distribution of queries is frequently the only way to design aggregations (other than starting with no aggregations).

Usage-Based Optimization Wizard

Unlike the Storage Design Wizard, the Usage-Based Optimization Wizard enables you to design aggregations based on an analysis of the actual pattern of levels queried. The Usage-Based Optimization Wizard generates aggregations based on the usage patterns logged. In the Usage-Based Wizard, you still specify the number of aggregations that will be generated based on an estimate of the amount of disk space you are willing to have the aggregations consume or the amount of performance gain that you want to achieve (both values are estimated conservatively). However, the Usage-Based Optimization Wizard determines which aggregations to design based on the dimension levels queried by the queries that you select from the Analysis Services query log rather than an assumption that all queries are equally probable. You can select queries from the log based on queries between certain dates, queries that ran more than a specified number of times, queries that ran longer than a specific amount of time, queries by specified users, and queries to MOLAP cubes, ROLAP tables, or the server cache. The Analysis Services query log, by default, records the levels queried by every tenth query. You can change this logging pattern to log the levels queried by every query, in order to capture the actual query pattern of your users, and then turn off query logging until you are ready to create a new query log to save system resources (or leave it at every tenth query to detect changing query patterns).

Designing aggregations based on past query patterns increases the likelihood that future queries will benefit from the generated aggregations. The Usage-Based Optimization Wizard is particularly effective in increasing query responsiveness with large or complex cubes, because only a small percentage of total possible aggregations can be created with these cubes due to time and space limitations. The Usage-Based Optimization Wizard ensures that the aggregations that are created are the ones most likely to be used.

Similar to the Storage Design Wizard, the Usage-Based Optimization Wizard uses dimension level member counts to evaluate the cost of each aggregation, and uses the member counts and the partition row count to determine whether particular aggregations are too large to be built. It is crucial that these counts be accurate. See "Storage Design Wizard" earlier in this section for information on how to set these counts manually. See "Decrease Dimension Counts in a Partitioned Cube" later in this section for other recommendations when using multiple partitions. See "Aggregation Size Limited to 30 Percent of Fact-Level Size" later in this section for a discussion regarding when to increase the row count above the actual number of rows at the fact level for a partition.

The Usage-Based Optimization Wizard honors the settings for each dimension's Aggregation Usage property. If you have previously excluded certain dimensions and levels from consideration, you should use the Cube Editor to enable these dimensions and levels before running the Usage-Based Optimization Wizard, to ensure that all dimensions and levels are considered based on actual query patterns. For example, if you have marked the quarter level of the Time dimension as disabled for aggregation and users are actually querying at the quarter level, the Usage-Based Optimization Wizard will not design aggregations at the quarter level. For information on setting the Aggregation Usage property, see "Cube Editor" later in this section.

You can use the Usage-Based Optimization Wizard to add aggregations to those previously designed or to replace all existing aggregations. In both cases, the Usage-Based Optimization Wizard designs new aggregations as if there were no existing aggregations; it does not make any evaluation of the existing aggregations. When you are designing aggregations to add to existing aggregations, select the queries that are running most slowly to avoid creating excessive aggregations. When you are replacing all existing aggregations in your aggregation design, select all of the queries to ensure the best overall aggregation design. When you use the Usage-Based Optimization Wizard to design new aggregations to add to the existing aggregations, the wizard merges the newly designed aggregations with the existing aggregations (duplicate aggregations are not included twice). For example, suppose that a partition starts with 80 aggregations and the Usage-Based Optimization Wizard designs 20 aggregations. After the aggregation designs are merged, the number of aggregations in the design will be somewhere between 80 and 100. You must fully process a partition after changing its aggregation design in order to implement the new aggregation design.

Partition Manager and Partition Aggregation Utility

Partition Manager enables you to manage aggregations across multiple partitions of a cube, or to manually examine and change the aggregations in a partition. An updated version of this utility, called the Partition Aggregation utility, ships with the SQL Server Accelerator for Business Intelligence (SSABI). The SSABI version enables you to specify the estimated row count on a per-partition basis and to update member counts for all dimensions and levels at once to aid you in designing the appropriate aggregations. While the Storage Design Wizard and the Usage-Based Optimization Wizard enable you to design aggregations for only one partition at a time, Partition Manager and the Partition Aggregation utility enable you to design a set of aggregations for a single partition and then copy them to additional partitions. With these utilities, you can design aggregations for a partition with data that is representative of the target system in terms of size and data distribution, and then copy the aggregation design to other partitions.

You can also design aggregations manually, using either Partition Manager or the Partition Aggregation utility. If you plan to do so, there are a number of factors that you must consider:

Query usage varies over time. Aggregations that were appropriate at the time they were designed may not be appropriate based on current usage patterns.

Only add or change a few aggregations at a time. It is tedious and slow to make a large number of changes manually. Use the Storage Design Wizard or the Usage-Based Optimization Wizard to make major changes to existing aggregations.

Only add aggregations that will be used. Unused aggregations are a waste of processor and disk storage resources.

Monitor the impact of additional aggregations on processing performance. As you add additional aggregations, the time required for processing the aggregations increases. For more information on the impact of aggregations on processing performance, see "Avoid Designing Excessive Aggregations" later in this paper.

Designing the appropriate aggregations manually requires a thorough knowledge of user query patterns and an understanding of the benefits and costs associated with aggregations.

Cube Editor

Analysis Manager's Cube Editor enables you to control what dimensions and levels are considered for aggregation by setting a dimension's Aggregation Usage property. These settings do not directly determine aggregation designs; rather, they alter the behavior of the aggregation design wizards and tools. Values for the Aggregation Usage property are the following:

Top Level Only ensures that only one level, typically the All level (which contains only one member), is considered by the aggregation design wizards. Including only this single level reduces the exponential growth in the number and size of aggregations. For example, if you have 15 dimensions in a cube and you set 8 of the dimensions to Top Level Only, the effective complexity of the cube is reduced to 7 dimensions. If the top level is included in an aggregation, it reduces the size of the aggregation and the time required to calculate it. To avoid the use of flexible aggregations (see "Types of Aggregations" later in this section), Top Level Only is the default setting for virtual dimensions.

Bottom Level Only ensures that only the bottom level is considered by the aggregation design wizards, which also reduces cube complexity. Aggregations containing the bottom level will be larger than those that use the top level only, and can provide better or worse overall query performance depending on the query mix. Bottom Level Only can be a good idea for small dimensions with a single level under the All level. A customer gender dimension is an excellent example of this.

Top and Bottom Levels has the combined effects of choosing Top Level Only and Bottom Level Only. The effective complexity of the cube is reduced if there are levels between the top and bottom levels. To avoid the use of flexible aggregations (see "Types of Aggregations" later in this section), Top and Bottom Levels is the default setting for changing dimensions.

Standard permits the aggregation design wizards to evaluate all levels in considering potential aggregations to include in the cube.

Custom allows you to choose the levels to be considered for inclusion in aggregations. Individual levels are then included or excluded by setting the Enable Aggregations property on the level to Yes or No. Two cubes that use the same shared dimension can have different levels enabled for aggregation in each cube.

Aggregation Design Considerations and Guidelines

Once administrators understand that aggregations improve query performance, their initial tendency may be to generate excessive aggregations in an attempt to ensure that subtotals exist to cover the greatest number of potential queries. This tendency is particularly tempting because the Storage Design Wizard enables you to design aggregations based on an estimated percentage gain in performance. If one aggregation design achieves a 20 percent performance gain, it is an easy assumption to make that an aggregation design that achieves a 40 percent gain must be better, and that an aggregation design that achieves a 60 percent gain must be even better.

While having more aggregations increases overall query responsiveness, more aggregations require more time to process and more space to store. If you have a large or complex cube, the generation of excessive aggregations requires a tremendous amount of processor resources, generates substantial disk I/O, and consumes a substantial amount of disk space. Even with a small to medium sized cube, generating excessive aggregations substantially increases processing time with little gain in query responsiveness. While aggregations are being calculated, query responsiveness decreases due to competition for hardware resources (memory, disk I/O, and processor).

You should design only aggregations that are likely to be used; designing excessive aggregations frequently causes Analysis Services to generate aggregations that are rarely or never used. For more information about the impact of the design of excessive aggregations on processing, see "Avoid Designing Excessive Aggregations" later in this paper.

Before discussing how to design the appropriate set of aggregations, you need to understand the impact of your cube design on the number of possible aggregations and the number and size of records within an aggregation. Each of these factors affects your aggregation design and the time required to calculate the aggregations.

Number of Aggregations

The theoretical maximum number of possible aggregations in a cube is the product of the number of levels in each cube dimension. As you add levels and dimensions to a cube, the number of possible aggregations increases exponentially. The higher the number of dimensions and levels in a cube, the greater its complexity. In the example in Figure 2, the Time dimension has four levels, the Customers dimension has five levels, and the Products dimension has five levels. This yields a theoretical maximum number of aggregations of 100 (5 customer levels x 5 product levels x 4 time levels). However, this number increases exponentially as you add dimensions or levels. For example, if you add the Day level to the Time dimension, the theoretical maximum number of aggregations increases to 125 (5 x 5 x 5). Now, suppose you add two more dimensions to this cube, each with three levels. The theoretical maximum number of aggregations increases to 1125 (5 x 5 x 5 x 3 x 3). A cube with nine dimensions containing five levels each yields theoretical maximum number of aggregations of 1,953,125. A cube of this complexity is considered a cube of medium complexity. A cube of high complexity might have 20 dimensions with five levels each and yield a theoretical maximum number of aggregations of approximately 95 trillion. As you can see, you can directly affect the theoretical maximum number of aggregations in a cube by changing the number of dimensions or the number of levels. Having multiple dimensions with deep hierarchies improves the ability of users to perform analysis, but having too many of either can lead to resource problems during querying and processing.

Because Analysis Services can easily calculate higher-level subtotals at query time by using existing aggregations at lower levels, you do not need to design more than a subset of the total number of possible aggregations to obtain a sufficiently useful set of aggregations. Before discussing how to design the appropriate set of aggregations, you need to understand the amount of space consumed by aggregations, in order to understand the impact cube design decisions have on aggregation design. Large aggregations consume more memory, require more disk space, and generate more disk I/O. For more information on the impact of the number of levels, member properties, and dimensions on performance, see "Minimize Dimensions, Levels, Members, and Member Properties" later in this paper.

Number and Size of Records in an Aggregation

The theoretical maximum number of records in an aggregation is initially determined by the product of the number of members in each dimension level at which the aggregation is being calculated. So, while the number of dimensions and levels determines the theoretical maximum number of aggregations, the number of members at each level in the aggregation determines the theoretical maximum number of records within an aggregation. For example, if the aggregation in Figure 2 contains 3 members at the Country level, 911 members at the Brand level, and 12 members at the Quarter level, the aggregation will contain a maximum of 32,796 records. If, however, the aggregation were at the Country, Brand, and Year levels, the aggregation would only contain a maximum of 8,199 records. As you can see, the levels at which the aggregations are calculated have a major impact on the maximum number of records in an aggregation and in turn the maximum size of the aggregation.

Factors Affecting the Actual Size of an Aggregation

Note: The numbers used in this section are for discussion purposes only. You should not expect to see these sizes on real systems, for the reasons discussed in this section.

The actual size (number of bytes) of each aggregation depends on three factors: the number of records, the number of measures, and the density of the data.

Number of records The aggregation for the Sales cube in Figure 1 at the All Customers, All Products, and All Time levels consists of a single record containing the total of the Units Sold measure and the Sales measure at the top level of these dimensions. This record would consume up to 8 bytes (before compression), assuming that each measure is a 4-byte integer. Due to the way Analysis Services compresses data, less than 4 bytes might be used to store a 4-byte integer. The aggregations for the Sales cube in Figure 1 at the Country level in the Customers dimension, the All level in the Products dimension, and the Year level in the Time dimension would consist of nine records and consume up to 72 bytes (assuming the same 4-byte integer size for each measure). However, an aggregation near the bottom of each level will be much larger. For example, the aggregation at the Country, Brand, Quarter level, with a theoretical maximum of 32,768 records, could consume up to 256 megabytes (MB).

Number of measures Each additional measure increases the size of an aggregation linearly (not exponentially). If you add three more measures to the example in Figure 2, the aggregation with 32,768 records increases to approximately 640 MB (each measure consuming up to approximately 128 MB). Notice that increasing the number of measures in a cube has a much smaller impact on the size of aggregations than increasing the number of levels or dimensions (its complexity).

Density of data Analysis Services does not store a record for aggregations with empty member combinations. For example, in Figure 2 above, if Brand X were not sold in Canada, no record would be stored for this cube cell and no time would be required for these calculations. Aggregations in sparse partitions consume substantially less space than the theoretical maximum and take substantially less time to process than those in dense partitions.

The theoretical storage numbers in this section refer to the space required to store the data only; some additional space is required to store the key information used by Analysis Services to identify each record. In addition, these theoretical numbers do not take into account the compression algorithms used by Analysis Services to reduce the amount of actual space consumed.

Two situations — distinct count measures and virtual dimension aggregations — can cause each aggregation to require a significant amount of additional time to generate and additional space to store.

Distinct count measures When a cube contains a distinct count measure, the size of each aggregation is multiplied by the number of distinct members of the measure. For example, if you create a distinct count measure on the Sales cube in Food Mart sample cube and design aggregations for a 20 percent performance gain using the Storage Design Wizard, the Sales cube grows from 90 kilobytes (KB) to 13 MB. However, if the data in your cube is not dense, the exponential impact on the size of each aggregation is mitigated by the number of empty records. For strategies to minimize the negative performance impact of distinct count measures, see "Use Distinct Count Measures Appropriately" later in this paper.

Virtual dimension aggregations By default, only the All levels of virtual dimensions will be included in aggregations. However, if you change the Aggregation Usage property for a virtual dimension from Top Level Only to Standard (using the Cube Editor in Analysis Manager), aggregations can include these intermediate or leaf levels. Administrators may choose to design aggregations containing these levels in order to increase the performance of queries against these levels. The impact of enabling aggregations for intermediate and leaf levels in a virtual dimension is the same as adding a standard dimension to a cube.

You can affect the size of aggregations by limiting the dimensions and levels that are included in aggregations. For more information on how to accomplish this, see "Cube Editor" earlier in this paper. You can also affect the size of aggregations by keeping the size of member keys small. For information on choosing the appropriate data type for member keys to reduce their impact on aggregation size, see "Choose Appropriate Data Types and Sizes" later in this paper.

Aggregation Size Limited to 30 Percent of Fact-Level Size

The Analysis Services aggregation design wizards will not design an aggregation whose possible size is over 30 percent of the size of the fact level (this is sometimes called the One-Third Rule). When you have dimensions with a large number of members, this threshold can easily be exceeded at or near the leaf level. The aggregation in Figure 2 has a potential size of 32,796 records (ignoring empty records). As long as the estimate of the number of rows at the fact level is more than 109,320, this aggregation will be considered by the aggregation wizards for creation. However, an aggregation at a lower level, such as the <State, Name, Month> aggregation, would not be considered unless the estimate of the number of records at the fact level is more than approximately 73 million (80 x 7621 x 36 / 0.3). As the number of dimension members increases at deeper levels in a cube, it becomes less likely that an aggregation will contain these lower levels because of the One-Third Rule. The aggregations excluded by the One-Third Rule are those that would be almost as large as the fact level itself and almost as expensive for Analysis Services to use for query resolution as the fact level. As a result, they add little or no value.

Administrators will sometimes trick the Analysis Services aggregation design wizards by setting the fact-level size higher than the actual row count in a partition in order to force these tools to calculate large aggregations that exceed the One-Third Rule. This approach is not recommended, because it results in excessively large aggregations being calculated and built. The fact-level size should be set according to the actual or expected size of the fact level in production, but not larger. You might consider increasing the row count above the actual count when data is distributed unevenly or when you are working on a proof-of-concept system.

Uneven distribution of data

When the aggregation design wizards determine whether a particular aggregation will be built, these tools assume a random distribution of factdata over the dimension members. If, however, the actual data is distributed unevenly such that the actual facts are concentrated in a small number of records, you can increase the estimated fact table size to artificially alter the application of the One-Third Rule. In this scenario, the actual aggregation sizes will be substantially smaller than the estimates of their size by the aggregation design wizards because many of the records in the aggregation will be empty.

Using our example in Figure 1, suppose that certain brands were not sold in Canada and the cube contained a large amount of data from Canada. When designing aggregations, the aggregation design wizards assume that the fact level is randomly distributed over all dimension members in the cube and might decide not to create an aggregation containing the Product Name level because the wizard would estimate that this aggregation would violate the One-Third Rule. However, because many records at the fact level are empty, the aggregation would not be as large as estimated. As a result, this aggregation would not actually violate the One-Third Rule if it were built. If you increase the estimated fact level row count, aggregations will be created lower in the dimension hierarchy without violating the One-Third Rule.

You should use this technique only when you are designing aggregations with the Usage-Based Optimization Wizard, rather than with the Storage Design Wizard. The Usage-Based Optimization Wizard ensures that the additional aggregations that are designed are likely to be used, while the Storage Design Wizard does not. For more information on these tools, see " Understanding the Aggregation Design Tools" earlier in this section.

Proof-of-concept system

When you are in the design phase of a pilot or proof-of-concept (POC) system, you normally work with a subset of the actual data. However, you need to ensure that aggregations are designed to reflect the size of data that the cube will actually contain, not the subset of data used for the pilot. In this case, you should increase the row count to reflect the row count for the actual data that will exist in the cube once the full dataset is loaded. If you fail to increase the row count, the aggregation design wizards will determine the aggregation design from an artificially small aggregation pool because of the impact of the One-Third Rule, and will design aggregations that are artificially high in the hierarchy trees. Because the goal is to have a distribution of aggregations at various combinations of levels appropriate to the actual data in the cube, queries will respond more poorly than they would otherwise respond because the aggregations were designed for a smaller dataset. When you correct this row count, the aggregation design wizards will design and calculate larger aggregations using levels that are lower in the hierarchy. As a result, you can expect that processing will take more time than aggregations that are designed artificially high in the hierarchy, but overall query performance will improve.

The row count is set initially when each partition in a cube is created. It is not automatically updated. The Fact Table Size property for a cube, which is exposed in Cube Editor, is the initial value for the first partition created. If you have a single-partition cube, the row count for the partition will be set according to the row count for the cube. For a multi-partition cube you will need to use a utility such as the Partition Aggregation utility to set the count for each partition. For more information on setting row counts manually, see "Understanding the Aggregation Design Tools" earlier in this section.

Types of Aggregations

Analysis Services generates both rigid and flexible aggregations for MOLAP and HOLAP partitions, each with different benefits and performance impacts. ROLAP partitions contain only rigid aggregations. (For more information on aggregations, see "Choose MOLAP to Improve Query Responsiveness" later in this section.)

Rigid aggregations Rigid aggregations are aggregations that include only levels in a standard dimension, the All level or the leaf level in a changing dimension, or the All level in a virtual dimension. Rigid aggregations are updated when the partition or cube is incrementally processed, but are only completely recalculated when a partition or cube is fully processed or its data is refreshed. By default, while rigid aggregations are being recalculated, Analysis Services resolves queries using the existing aggregations until the new aggregations are completely recalculated (unless the processing of a shared dimension knocks the cube completely offline). You can choose to have the aggregations recalculated as a background process (called "lazy processing") to enable users to see the updated data sooner, but without the benefit of aggregations until they have been recalculated. During incremental processing, existing aggregations are not dropped and recalculated. Rather, aggregations are generated for the new data (using a temporary partition) and then merged with the existing aggregations. While the new aggregations are being generated, Analysis Services continues to use the existing aggregations to resolve queries.

Flexible aggregations Flexible aggregations are aggregations that include an intermediate level in a changing dimension or an intermediate or leaf level in a virtual dimension. Flexible aggregations are created only if you enable aggregations at intermediate levels of changing dimensions or enable aggregations for virtual dimension levels. All flexible aggregations in all partitions are dropped and completely recalculated whenever a partition or cube is fully processed, in the same manner as rigid aggregations. In addition, all flexible aggregations in all partitions are dropped whenever you incrementally or fully process a changing dimension or a physical dimension on which a virtual dimension is based, if flexible aggregations exist that are based on that changing or virtual dimension. In each MOLAP partition, these dropped aggregations are recalculated as a background task, unless you choose to retain the old data and aggregations until new aggregations can be calculated. However, in a HOLAP partition, these dropped aggregations are not recalculated until each partition is processed.

By default, Analysis Services recalculates flexible aggregations as a background task using lazy processing. While the flexible aggregations are being recalculated, users can continue to query the cube (without the benefit of the flexible aggregations). While the flexible aggregations are being recalculated, queries that would benefit from the flexible aggregations run slower because Analysis Services resolves these queries by scanning for individual facts at the fact level and then summarizing the data at query time. As the flexible aggregations are recalculated, they become available incrementally on a partition-by-partition basis.

SP3 provides an option to update changing dimensions in a transaction and enable existing flexible aggregations to be retained on disk while new flexible aggregations are calculated as a background task. This approach retains the stale data with the existing aggregations until the new aggregations are completely recalculated. For more information on changing dimensions and virtual dimensions, see "Use Incremental Processing and Changing Dimensions" and "Use Virtual Dimensions Effectively" later in this paper.

An Aggregation Design Strategy

Given the range of tools and choices available to the administrator, it is not easy to formulate an overall approach for managing aggregation design. This section outlines a general plan for managing the aggregation design over the life of a system.

When you initially deploy a cube, you cannot use the Usage-Based Optimization Wizard because no usage data yet exists. However, because Analysis Services generally resolves user queries faster with some aggregations than with none, you should initially design a limited number of aggregations using the Storage Design Wizard. The number of initial aggregations that you should design depends on the complexity and size of the cube (the fact-level size).

With a small cube, an effective approach is to initially design aggregations to achieve a 20 to 30 percent increase in performance (this should take less than 15 minutes). With a large and complex cube, it will take Analysis Services a long time just to design a small percentage of the possible aggregations. For example, in a complex cube with twenty dimensions containing five levels each, it will take the Storage Design Wizard approximately 3024 years to consider all 95 trillion possible aggregations, if you assume that the wizard can examine 1000 aggregations per second (which is a very generous estimate). Furthermore, a large number of aggregations will take a long time to actually calculate and will consume a large amount of disk space. An effective approach with large and complex cubes is to initially design aggregations to achieve a small performance increase (less than 10 percent, possibly down to even 1 or 2 percent with very complex cubes) and then allow the Storage Design Wizard to run for no more than 15 minutes. With a medium-complexity cube, design aggregations to achieve a 10 to 20 percent increase in performance and then allow the wizard to run for no more than 15 minutes. While it is difficult to define what constitutes a high-complexity cube versus a medium-complexity cube, consider this general guideline: a high-complexity cube is a cube with more than 15 multi-level dimensions.

Before you create these initial aggregations with the Storage Design Wizard, you should disable aggregations (by setting their Aggregation Usage property to Top Level Only using Cube Editor) for dimensions you anticipate users will rarely query. This minimizes the aggregations that are calculated for dimensions that you know are used rarely and increases the probability that the aggregations that are designed and created will be useful. While this technique is important for data stored using any storage mode, this technique is especially important for data stored in a ROLAP partition, because aggregations stored using ROLAP require significantly more disk storage than aggregations stored using MOLAP or HOLAP. For more information on aggregations and storage mode, see "Choose MOLAP to Improve Query Responsiveness" later in this section.

After users have queried the cube for a sufficient period of time to gather useful query pattern data in the query log (perhaps a week or two), use the Usage-Based Optimization Wizard to perform a usage-based analysis for designing additional aggregations that would be useful based on actual user query patterns. You can then process the partition to create the new set of aggregations. Remember that the Usage-Based Optimization Wizard only works on one partition at a time. You will have to either perform this same process on each partition, or use the Partition Aggregation utility to copy the aggregation design to additional partitions.

As usage patterns change, use the Usage-Based Optimization Wizard, Partition Manager, or the Partition Aggregation utility to add additional aggregations (but avoid adding too many aggregations). After you have added additional aggregations a considerable number of times, you need to consider replacing the current aggregation design with a new aggregation design containing a smaller number of total aggregations based on current query patterns. Over time, the incremental addition of new aggregations to your initial aggregation design may cause processing times to become too large for the processing window (nightly or weekly), while a new design based on current usage patterns may be just as effective for query responsiveness.

If you are using partitions, you can create different aggregation designs for different partitions. For more information on the use of partitions, see "Use Multiple Partitions to Improve Query Responsiveness" later in this section, and "Use Multiple Partitions to Improve Processing Performance" later in this paper.

In a cube with multiple partitions based on time, new data should flow into the single partition corresponding to the most recent time period (for example, the latest month). Generally that is also the partition most frequently queried. A common aggregation strategy in this case is to perform Usage Based Optimization to the most recent partition, leaving older, less frequently queried partitions as they are. The newest aggregation design can also be copied to a "base" partition. This base partition holds no data — it serves only to hold the current aggregation design. When it is time to add a new partition (for example, at the start of a new month) the base partition can be cloned to a new partition. When the slice is set on the new partition, it is ready to take data as the current partition. Following an initial full process, the current partition can be incrementally updated for the remainder of the period. See also "Incremental Partition Processing" later in this paper.

Choose MOLAP to Improve Query Responsiveness

The Analysis Services storage mode you choose for each partition in your cube is another factor determining the speed with which Analysis Services resolves user queries. You can save each partition in your cube using one of three storage options: multidimensional OLAP (termed MOLAP), hybrid OLAP (HOLAP), and relational OLAP (ROLAP).

Multidimensional OLAP (MOLAP)

MOLAP partitions store aggregations and a copy of the source data (fact and dimension data) in a multidimensional structure on the Analysis server. All partitions are stored on the Analysis server. Analysis Services responds to queries faster with MOLAP than with any other storage mode for the following reasons:

Compression Analysis Services compresses the source data and its aggregations to approximately 20 percent of the size of the same data stored in a relational database. The actual compression ratio varies based on a variety of factors, such as the number of duplicate keys and bit encoding algorithms. This reduction in storage size enables Analysis Services to resolve a query against fact-level data or aggregations stored in a MOLAP structure much faster than against data and aggregations stored in a relational structure because the size of the physical data being retrieved from the hard disk is smaller.

Multidimensional data structures Analysis Services uses native multidimensional data structures to quickly find the fact data, at either the fact level or at higher aggregation levels. With ROLAP and HOLAP partitions, Analysis Services relies on the relational engine to perform potentially large table joins against fact-level data stored in the relational database to resolve some or all queries. Large table joins against relational structures take longer to resolve than similar queries against the MOLAP structures. For information on reducing the number of these joins, see "Eliminate Joins Between Fact and Dimension Tables" later in this paper.

Data in a single service MOLAP partitions are generally stored on a single Analysis server, with the relational database frequently stored on a server separate from the Analysis server. When the relational database is stored on a separate server and partitions are stored using ROLAP or HOLAP, Analysis Services must query across the network whenever it needs to access the relational tables to resolve a query. The impact of querying across the network depends on the performance characteristics of the network itself. Even when the relational database is placed on the same server as Analysis Services, inter-process calls and the associated context switching are required to retrieve relational data. With a MOLAP partition, calls to the relational database, whether local or over the network, do not occur during querying. For more information on placement, see "Place the Relational Database on the Analysis Server" later in this paper.

Hybrid OLAP (HOLAP)

HOLAP partitions store aggregations in a multidimensional structure on the Analysis server, but leave fact-level data in the original relational database. As a result, whenever Analysis Services needs to resolve a query against fact-level data stored in a HOLAP partition, Analysis Services must query the relational database directly rather than querying a multidimensional structure stored on the Analysis server. Furthermore, Analysis Services must rely on the relational engine to execute these queries. Querying the relational database is slower than querying a MOLAP partition because of the large table joins generally required.

Many administrators choose HOLAP because HOLAP appears to require less total storage space while yielding excellent query performance for many queries. However, these apparent justifications for using HOLAP storage option are negated by the likelihood of excessive aggregations and additional indexes on relational tables.

Excessive aggregations Query responsiveness with HOLAP partitions relies on the existence of appropriate aggregations so that Analysis Services does not have to resolve queries against the fact table in the relational database. To ensure that a wide range of aggregations exists, administrators sometimes resort to generating excessive aggregations by increasing the performance improvement percentage in the Storage Design Wizard, or artificially increasing the partition row counts (and sometimes both). While these techniques will increase the percentage of queries that Analysis Services can resolve using aggregations, there will always be some queries that can only be resolved against the fact level (remember the One-Third Rule). In addition, generating additional aggregations to improve query responsiveness comes at the cost of significantly longer processing times and increased storage requirements (which also negates the space savings). For more information about how the storage mode affects partition processing, see "Choose MOLAP to Improve Processing Performance" later in this paper.

Additional indexes on relational tables To ensure that the relational engine can quickly resolve queries that Analysis Services must resolve against the fact table in the relational database, administrators frequently add appropriate indexes to the fact and dimension tables. These additional indexes frequently require more space than MOLAP requires to store the entire cube. The addition of these indexes negates the apparent savings in disk space that is sometimes used to justify HOLAP. In addition, maintaining the indexes on the relational tables slows the relational engine when adding new data to the relational tables. For more information on indexes on the relational database, see "Tune the Relational Database" later in this paper.

Relational OLAP (ROLAP)

ROLAP partitions store aggregations in the same relational database that stores the fact-level data. By default, ROLAP partitions store dimensions in MOLAP on the Analysis server, although the dimensions can also be stored using ROLAP in the relational database (for very large dimensions). Analysis Services must rely on the relational engine to resolve all queries against the relational tables, storing both fact-level data and aggregations. The sheer number of the queries with large table joins in large or complex cubes frequently overwhelms the relational engine.

Before the release of SQL Server 2000 (64-bit), ROLAP dimensions were required for cubes with very large dimensions (ROLAP dimensions can only be used with ROLAP partitions). Very large dimensions could not be stored in MOLAP because they were too large to be loaded into memory (MOLAP dimensions are always loaded into memory). With the 32-bit version of SQL Server 2000, very large dimensions may still require the ROLAP dimensions with ROLAP cubes. For more information on how Analysis Services uses physical and virtual memory, see "How Analysis Services Uses Memory" later in this paper. However, with the release of SQL Server 2000 (64-bit), Analysis Services can now store very large dimensions (in excess of 10,000,000 members across all dimensions) using MOLAP.

If performance is your goal, the only situation in which ROLAP storage should be used is when implementing real-time OLAP. In this case, to minimize the performance cost with ROLAP, consider creating a small real-time ROLAP partition and create all other partitions using MOLAP. Using MOLAP for the majority of the partitions in a real-time OLAP solution allows you to optimize the query responsiveness of Analysis Services for most queries, while obtaining the benefits of real-time OLAP. For more information on using multiple partitions to increase query responsiveness, see "Use Multiple Partitions to Improve Query Responsiveness" later in this section.

With both HOLAP and ROLAP partitions, eliminating some or all of the large table joins increases query responsiveness. For more information, see "Eliminate Joins Between Fact and Dimension Tables" later in this paper.

Use Multiple Partitions to Improve Query Responsiveness

Every cube has at least one partition, and the cube reflects the combined data of all of its partitions. You can divide a cube into multiple partitions if you are using SQL Server 2000 Enterprise Edition (or SQL Server 7.0 Enterprise Edition). During development, you can do the same using SQL Server 2000 Developer Edition. Each partition can have its own fact table, aggregation design, and storage mode. The division of a cube into multiple partitions is transparent to the user. When a query requests data that spans multiple partitions, Analysis Services uses aggregations or scans data in each partition and then combines this information to resolve the query and return the results to the user. You can significantly increase query responsiveness and processing performance by horizontally segmenting the data by one or more keys, such as date or department, and dividing the cube into multiple partitions. This section focuses on improving query responsiveness by using multiple partitions. For information on using partitions to increase processing performance, see "Use Multiple Partitions to Improve Processing Performance" later in this paper.

Understanding the Effect of Storage Mode on Partitions

By default, each Analysis Services cube consists of a single partition, which is stored using MOLAP, ROLAP, or HOLAP. Using multiple partitions is most beneficial when a partition is stored using MOLAP.

When using MOLAP to store a partition, a small number of files on the Analysis server store the fact-level and aggregation data for each partition. The files holding the fact-level data for each partition are smaller and the aggregations are also smaller than if all of the cube data were stored in a single partition. Because the size of an aggregation is based on the non-empty combinations of members across the levels in the aggregation, the aggregations for each partition may be smaller. Smaller aggregations require less time to calculate and less time to store.

When using HOLAP to store a partition, a small number of files on the Analysis server store the aggregation data for each partition. The files holding the aggregations are smaller than if all of the cube aggregations were stored in a single partition as is the case with MOLAP. When using ROLAP to store a partition, the aggregations are stored in a greater number of smaller tables than the tables used for the aggregations for a single partition.

Regardless of the storage mode, Analysis Services does not perform any partitioning of fact or dimension data in the relational database. Fact table partitioning, which must be initiated by a system administrator using relational database management tools, will increase processing performance for MOLAP and overall performance for HOLAP and ROLAP. For more information on the benefits of fact table partitioning, see "Use Multiple Partitions to Improve Processing Performance" and "Eliminate Joins Between Fact and Dimension Tables" later in this paper.

Analysis Services resolves queries faster against MOLAP and HOLAP partitions because there is less data in each partition to be evaluated. For example, imagine a cube with 36 months of data that is divided into 36 monthly partitions. A query requesting information about the most recent month need only evaluate 1/36th of the data that would otherwise need to be evaluated. This query will perform dramatically faster with partitioning by month than with no partitioning. Partitioning can increase query responsiveness by 100 to 1000 percent, particularly under multi-user loads. For maximum performance, Analysis Services should touch as few partitions as possible when resolving a query. If you are not using MOLAP for all partitions, you should use MOLAP for these frequently queried partitions, to maximize query responsiveness against these partitions.

Define the Data Slice for Each Partition

If the data slice value for a partition is set properly, Analysis Services can quickly eliminate irrelevant partitions from the query processing and significantly reduce the amount of physical I/O and processor time needed for many queries issued against MOLAP and HOLAP partitions. With multiple ROLAP partitions, Analysis Services generates more queries to the relational database, but each query is resolved against a smaller table. For more information on partitioning in the relational database, see Using Partitions in a Microsoft SQL Server 2000 Data Warehouse in the MSDN library.

To enable Analysis Services to take full advantage of partitions, you must define the data slice for each partition in the Partition Wizard of Analysis Manager. The data slice identifies the actual subset of data contained in each partition. The Partition Wizard does not require you to set this data slice when you create a partition. As a result, it is possible to create a partition without setting the data slice.

Caution: Creating a partition without setting the data slice is not a good practice, and can result in considerable overhead being added to the system (artificially increasing response times). Without the data slice, Analysis Services cannot limit a query to the appropriate partitions and must scan each partition even if zero cells will be returned.

The data slice enables Analysis Services to determine which partitions contain data relevant to the query.

To maximize querying performance with partitions, construct partitions with data slices that mirror the data slices required by your users. For example, suppose you are deploying a cube that typically tracks data as a time series (such as a financial cube), and most queries will retrieve data based on time period. You should partition the cube by time period to provide the greatest performance benefit. With very large cubes, partitioning along multiple dimensions (such as time and department) can yield substantial query responsiveness benefits. Remember that each partition can have a different aggregation design.

Decrease Dimension Counts in a Partitioned Cube

When you divide a cube into multiple partitions, you should modify the member counts to enable the aggregation design wizards to correctly evaluate the cost of each aggregation and to compare the size of each aggregation to the fact level size for each partition. The member counts stored in the MemberCount property for each dimension level is the member count for the number of dimension members at that level in the cube, not the dimension members at that level in a partition. Let's look at two examples to help explain how to modify the dimension counts used by the aggregation wizards.

Suppose the Sales cube in Figure 1 is partitioned by month, with the data slice defined properly for each partition. The member counts for each Time dimension level are All (1), Year (3), Quarter (12), Month (36). The wizards will not design any aggregations in these partitions that include the All, Year, or Quarter levels of the Time dimension because each partition only contains data for a single month, and the aggregation design wizards will not create an aggregation in a partition that includes a level higher than the slice level for the partition. However, when the aggregation design wizards attempt to determine the cost of an aggregation at the month level and compare the size of an aggregation at that level to the fact level size, the member count at the month level is 36, rather than 1.

Next, suppose that a cube comprises a partition for each quarter rather than for each month. In this case, the aggregation design wizards will not design any aggregations at the All or the Year level because each partition only contains data for a single quarter. However, the member counts for the Quarter and Month level are 12 and 36 respectively, rather than 1 and 3. An aggregation at the Month level will only contain three records for Month, not 36 as the aggregation design wizards will estimate based on the member counts. As a result, the cost of this aggregation is overestimated.

In both of these examples, applying the original member counts to the partition aggregation design will result in aggregations being created artificially high in each partition, because the size estimates used by the aggregation design wizards were applicable to the cube, not to the partition. To avoid this problem and to ensure the aggregation design wizards design the appropriate set of aggregations for the data in each partition, you should manually set the members counts for levels based on your partitioning scheme. However, since these member counts are stored in the MemberCount property at each dimension level (not in the partition), if a dimension is shared with other cubes with a different partitioning scheme, you must set the member counts for the dimension to numbers appropriate for each cube immediately before designing aggregations for that cube.

Warm the Query Results Cache When Possible

Another technique that you can use to increase query responsiveness is to preload the query results cache using a generalized set of representative user queries. This is called cache warming. The query results cache is a portion of memory that Analysis Services uses to store the results of user queries for use in resolving future queries. While the query results cache stores the results of most queries, the query results cache does not cache the results of query time calculations. For more information, see "Use Query Time Calculations Sparingly in Complex Cubes" later in this paper.

When Analysis Services first starts, or after a partition or cube is processed, the query results cache is empty. Until the query results cache is loaded with data from new queries, Analysis Services must resolve user queries using data stored on disk, either by scanning the fact level or by using aggregations. You can create an application that executes a set of generalized queries to simulate typical user activity, in order to expedite the process of populating the query results cache. For example, if you determine that users are querying by month and by product, you can create a set of queries that request data by product and by month. If you run this query whenever you start Analysis Services, or process the cube or one of its partitions, this will pre-load the query results cache with data used to resolve these queries before users submit these types of query. This technique will substantially improve Analysis Services response times to user queries that were anticipated by this set of queries.

To determine a set of generalized queries, you can use the Analysis Services query log to determine the dimension levels typically queried by user queries. For example, if you review the query log and determine that users frequently query by month, product name, and country, the following query would pre-fetch that data:

select
   [Time].[Month].members on 0,
   [Product].[Product Name].members on 1,
   [Customer].[Country].members on 2
from Orders

If you only wanted to warm the cache with data from the current year, rather than for all years, you could use the following query, which assumes that a named set [Current Year] tracks the current calendar year:

select
   Descendants([Current Year].Item(0), [Time].[Month]) on 0,
   [Product].[Product Name].members on 1,
   [Customer].[Country].members on 2
from Orders

Once these queries are loaded into the query results cache, they remain there until the cleaner thread removes them or the cache is flushed during partition processing. For information about optimizing the use of memory in Analysis Services and the cleaner thread, see "Provide Sufficient and Appropriately Configured Memory" later in this paper.

You can use an application, such as a Microsoft Excel macro, an Office Web Component Web page, or a script file, to warm the cache whenever you have performed an operation that flushes the query results cache. For example, this application could be executed automatically at the end of the cube processing step. Running this application under an identifiable user name enables you to exclude that user name from the Usage-Based Optimization Wizard's processing and avoid designing aggregations for the queries submitted by the cache warming application.

While cache warming can be a useful technique, cache warming should not be used as a substitute for designing and calculating an appropriate set of aggregations. When you are testing the effectiveness of different cache-warming queries, you should empty the query results cache between each test to ensure the validity of your testing. For information on flushing the query results cache, see Appendix C, "How to Flush the Query Results Cache."

Optimize Cell Writeback

If you write-enable a cube, users can change the values of a cube's cells. Analysis Services stores the deltas in a separate writeback table in the relational database. For example, suppose a cell contains a value of 100, and the user changes it to a value of 20. In this case, Analysis Services stores the value of -80 in the writeback table. Analysis Services does not change the original fact-level data. When a user queries the changed cell, Analysis Services rolls up the changes in the writeback table, combines it with the value from the fact level, and returns the new value of 20.

Users can update cell values for any cell, not only leaf-level cells. When a higher-level value is updated, the client application uses the UPDATE CUBE statement to distribute the changes for each cell value to leaf-level entries in the writeback table. Users see the net effect of all changes in the writeback table when browsing the cube. To accomplish this, Analysis Services merges this data with fact-level data and existing aggregations at query time. Query performance with a writeback cube will be less than with a cube that is not write-enabled. Actual performance depends on the extent of the writeback changes and the actual query. Concurrent writeback operations also incur a performance penalty.

To increase the performance of write-enabled cubes, consider taking the following steps:

Store the writeback table in SQL Server. SP3 introduced significant performance increases for concurrent data writebacks that are stored in SQL Server and use the Microsoft OLEDB Provider for SQL Server driver. When storing the writeback table in SQL Server 2000 with SP3, set the database for the writeback table to simple recovery mode and specify the SSFastLoadOptions registry key with one of the following string values:

Empty string value enables the use of IRowsetFastLoad, the SQL Server bulk insert feature.

TABLOCK enables the use of IRowsetFastLoad and enables a table lock, which maximizes the performance of bulk inserts.

TABLOCK, FIRE_TRIGGERS enables the use of IRowsetFastLoad, enables a table lock, and enables triggers to fire. For more information on the FIRE_TRIGGERS option, see Using BCP and BULK INSERT in the MSDN library.

The SSFastLoadOptions registry key is located at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\OLAP Server\CurrentVersion and is disabled by default. If you change this default to one of the above values, the value is passed to the SSPROP_FASTLOADOPTIONS OLEDB property. For more information, see IRowsetFastLoadRowsets in Appendix A, "Resources."

Convert writeback tables to partitions. Because the writeback tables have no aggregations until they are converted into a partition, querying write-enabled cubes becomes slower over time as the amount of data in the writeback tables increases. Periodically convert the writeback tables to partitions to increase query performance. For more information, see "Converting Writeback Data to a Partition" in SQL Server Books Online.

Submit changes in a single transaction. Because the update of a cell value on a client flushes the query results cache on both the server and the clients, you can increase overall query responsiveness by accumulating multiple changes in a single transaction.

Avoid Using Parent-Child Dimensions With a ROLAP Partition

When you use a relational engine other than SQL Server to store a ROLAP partition, and the cube contains a parent-child dimension, an IN clause with multiple expressions is generated. This SQL statement can become so big that it generates an out-of-memory error. If the parent-child dimension is stored in a ROLAP partition on SQL Server, Analysis Services generates temporary tables to improve performance. For best performance, store all partitions using MOLAP when using parent-child dimensions.

Top of pageTop of page

Optimizing Analysis Services to Improve Processing Performance

Processing a cube updates the entire cube with new data from the underlying relational database, sorts the data for multidimensional access, constructs multidimensional mapping files to speed data retrieval, and creates aggregations of the fact data. Each dimension and partition in a cube can also be processed separately, which updates only the particular dimension or partition.

There are many factors that directly affect the time required to process a cube or partition. These include:

Aggregation design

Storage mode of the partitions

Use of multiple partitions

Number of joins used to retrieve data from the relational database

Use of incremental updates

Performance of the relational database

In addition to these factors, which are discussed below, the design of the data model itself (such as the use of distinct count measures), the availability and configuration of hardware resources on the Analysis server for processing, and the architecture of the Analysis Services solution also affect processing performance. These effects are discussed in separate sections of this paper.

Avoid Designing Excessive Aggregations

Designing excessive aggregations for one or more partitions in your cube is one of the common causes of poor processing performance. More aggregations take longer to process than fewer aggregations. Excessive aggregations can take an extraordinarily long time to process and may require the use of temporary files (which are slow and I/O intensive). Long processing time for full processing or cube refreshing affects cube availability, and long processing time for incremental processing competes with queries for hardware resources. You can use different aggregation designs on different partitions to increase the number of aggregations on frequently queried partitions and decrease the number of aggregations on infrequently queried partitions. For more information, see "Use Multiple Partitions to Improve Processing Performance" later in this section.

To minimize the length of time required to calculate aggregations during processing, only design aggregations that are actually used to resolve queries. For information on designing effective aggregations and avoiding the processing performance penalty associated with excessive aggregations, see "Design Effective Aggregations" earlier in this paper.

Choose MOLAP to Improve Processing Performance

Analysis Services can store data, create multidimensional mapping files, and calculate aggregations faster than a relational engine can create indexes and calculate aggregations. The primary reason the relational engine is slower is due to the large table joins that the relational engine must perform during the processing of a ROLAP partition. In addition, because the relational engine performs the actual processing tasks, competing demands for resources on the computer hosting the relational tables can negatively affect processing performance for a ROLAP partition.

There is no significant difference in processing performance between MOLAP partitions and HOLAP partitions. In both cases, all fact-level data is read from the relational database, and aggregations are calculated. With MOLAP, Analysis Services writes the fact-level data into the MOLAP structure. With HOLAP, Analysis Services does not store fact-level data. This difference has minimal impact on processing performance, but can have a significant impact on query performance. Because HOLAP and MOLAP processing speeds are approximately the same and MOLAP query performance is superior, MOLAP is the optimum storage choice.

If you choose to deploy a real-time ROLAP partition, you should make the ROLAP partition as small as possible and use MOLAP for all other partitions in the cube. This configuration will maximize query responsiveness and minimize processing time.

Use Multiple Partitions to Improve Processing Performance

You can significantly reduce the overall processing time for medium and large cubes by dividing the cube into multiple partitions. Using multiple partitions reduces total processing time for the following reasons:

Multiple partitions can be processed in parallel. You can decrease the total time required to process a cube by processing multiple partitions in parallel (provided you have sufficient processor and memory resources). By default, Analysis Services processes each partition in a cube serially. Parallel processing results in dramatic performance benefits with large cubes during the initial load of the data warehouse, during full cube processing, and during cube refreshes, provided that the Analysis server has sufficient memory to process several partitions in parallel. If the Analysis server has insufficient memory to store the aggregations for each partition being processed, Analysis Services uses temporary files, which negates the performance benefit you are trying to achieve through the use of parallel processing. For more information about configuring memory, see "Configure Available Memory Effectively" later in this paper. For more information about sufficient processor resources to support parallel processing, see "Use Available Processor Resources Effectively," later in this paper. To process partitions in parallel, you must use a tool that calls the Decision Support Objects (DSO) interface. For an example, see the Parallel Processing sample application in the SQL Server 2000 Resource Kit.

Only some partitions will need to be processed. You can process only the partitions that have been updated with new data. This enables you to decrease the overall time required to update a cube with new data. For example, if you partition your cube by time, you merely need to process the partitions containing data from the most recent time period, rather than processing the entire cube. Analysis Services can process one or several small partitions more quickly than it can process a single large partition containing all of the data for the cube.

Different partitions can have different aggregation designs. You can design additional aggregations on heavily queried partitions and design fewer aggregations on less heavily queried partitions. If you keep the number of the partitions with a large number of aggregations small, and design fewer aggregations for those partitions that are queried less frequently, you can reduce overall processing time. However, if you plan to merge partitions at a later date, you must ensure that the partitions being merged have the same aggregation design at the time of the merger.

Different partitions can have different storage modes. You can use different storage modes for particular purposes. For example, you can create a small partition using ROLAP to enable you to implement real-time OLAP, and then use MOLAP for all other partitions to maximize query responsiveness.

Partitions can be refreshed individually. You can refresh a partition more quickly than an entire cube, thereby consuming fewer resources and affecting fewer users. When a partition is incrementally updated, a temporary partition is created and then merged into the existing partition. This can result in data fragmentation, which is similar to disk or index fragmentation. As a result, you can occasionally do a refresh of a partition to enable Analysis Services to re-sort the data for faster multidimensional access, create better multidimensional mapping files, and make smaller aggregations.

In addition to saving overall processing time, partitioning can also significantly improve query responsiveness. For more information, see "Use Multiple Partitions to Improve Query Responsiveness" later in this paper.

Eliminate Joins between Fact and Dimension Tables

When Analysis Services processes a MOLAP or HOLAP partition, it issues SQL queries to the relational engine to retrieve fact and dimension data from the underlying relational tables. If the partition is stored using MOLAP, Analysis Services organizes and saves the retrieved data into multidimensional data files and then calculates and stores aggregations on the Analysis server. If the partition is stored using HOLAP, the retrieved data is only used for calculating and then and storing aggregations on the Analysis server. When Analysis Services processes a ROLAP partition, it issues SQL queries to the relational engine that cause the relational engine itself to calculate and then store the aggregations in the relational database. With a HOLAP or ROLAP partition, Analysis Services also issues queries to the relational engine to resolve some or all user queries.

Regardless of the storage mode, Analysis Services constructs the queries that retrieve fact and dimension data from the relational tables by using the dimension member keys identified in the Member Key Column property for each dimension. By default, the dimension member key defined in the Member Key Column property is a column in the dimension table for each dimension in the cube. As a result, in a complex cube with many dimensions, the query submitted against the relational tables for each partition can be a very large, multi-table join. These types of complex joins can take a long time to execute.

Analysis Services uses this method of constructing the SQL query to guarantee that Analysis Services generates the safest possible query, which is not the most efficient query. Certain prerequisites must be satisfied before you can safely use more efficient queries. For example, Analysis Services constructs the following SELECT statement when processing the first partition of the Warehouse cube in the FoodMart 2000 database:

SELECT
   store.store_id,
   time_by_day.quarter,
   time_by_day.month_of_year,
   warehouse.warehouse_id,
   product.product_id,
   inventory_fact_1997.store_invoice,
   inventory_fact_1997.supply_time,
   inventory_fact_1997.warehouse_cost,
   inventory_fact_1997.warehouse_sales,
   inventory_fact_1997.units_shipped,
   inventory_fact_1997.units_ordered,
   inventory_fact_1997.warehouse_sales-inventory_fact_1997.warehouse_cost
FROM
   inventory_fact_1997,
   time_by_day,
   store,
   warehouse,
   product
WHERE
   (time_by_day.the_year=?) AND              clause added by data slice
   (inventory_fact_1997.time_id=time_by_day.time_id) AND
   (inventory_fact_1997.store_id=store.store_id) AND
   (inventory_fact_1997.warehouse_id=warehouse.warehouse_id) AND
   (inventory_fact_1997.product_id=product.product_id)

This SELECT statement joins four dimension tables to the fact table. With complex cubes containing many more dimensions, the number of joins increases significantly. However, if a dimension member's foreign key in the fact table can be used rather than the dimension member key in the dimension table, much simpler queries can be generated that can speed the retrieval of data from the relational tables. A dimension member's foreign key in the fact table can be used in the processing query if the dimension meets the following criteria:

The dimension is shared and has been processed.

The member key column for the lowest level of the dimension contains the keys that relate the fact table and the dimension table, and is the only key column necessary to relate the fact table to the dimension table.

The keys in the member key column for the lowest level of the dimension are unique within the dimension. (Use a surrogate key to ensure this.)

The lowest level of the dimension is represented in the cube (it does not need to be visible).

If these criteria are met, you can use the Optimize Schema option in Analysis Manager to simplify the queries issued against the relational tables and increase their execution speed. In the case of the Warehouse cube in the FoodMart 2000 database, the Store, Product, and Customer dimensions meet these criteria and, as a result, three of the dimension table joins can be eliminated by using the Optimize Schema option. The resulting SELECT statement is:

SELECT
   inventory_fact_1997.store_id,
   time_by_day.quarter,
   time_by_day.month_of_year,
   inventory_fact_1997.warehouse_id,
   inventory_fact_1997.product_id,
   inventory_fact_1997.store_invoice,
   inventory_fact_1997.supply_time,
   inventory_fact_1997.warehouse_cost,
   inventory_fact_1997.warehouse_sales,
   inventory_fact_1997.units_shipped,
   inventory_fact_1997.units_ordered,
   inventory_fact_1997.warehouse_sales-inventory_fact_1997.warehouse_cost
FROM
   inventory_fact_1997,
   time_by_day                                join added by data slice
WHERE
   (time_by_day.the_year=?) AND                 clause added by data slice
   (inventory_fact_1997.time_id=time_by_day.time_id)

This query uses only one join on a dimension table and will execute must faster than the original query. The other join is related to the partition slice setting and is discussed below.

When Analysis Services uses an INNER JOIN between the fact table and dimension tables, fact-level data without a corresponding dimension table foreign key is eliminated from the result set (if foreign key constraints are used on the fact table, these rows will never be entered in the first place). When you eliminate the INNER JOIN, Analysis Services reads the entire fact table, not merely those rows represented in the dimension tables. As a consequence, you may discover during the process of attempting to eliminate joins that your fact table contains referential integrity violations that were masked by the INNER JOIN. Best practice dictates that you fix the bad data in the fact table or add an "unknown member" to the dimension tables. You can then map facts without associated dimensions to this "unknown member." Unless you do so, facts without associated dimensions are reported as processing errors rather than disappearing as they did previously. Once you eliminate the referential integrity violations, you can use the Optimize Schema option to significantly increase processing performance, regardless of the storage mode. If you cannot clean all of the dimensions, you can selectively eliminate joins by manually pointing the Member Key Column property to the fact table in Cube Editor for those dimensions that you know are clean.

There are a couple of additional steps you can take to eliminate joins that cannot be immediately eliminated by the Optimize Schema option. It is helpful to view the query Analysis Services submits to the relational engine when it processes a partition. To do this quickly without actually executing the query, add "1 = 0" to the partition's WHERE clause using the Partition Wizard (on the Advanced Settings tab) and then process the partition. With this new WHERE clause, the query returns from the relational database with no rows. This enables you to view the partition population query in the partition processing window or by capturing it with SQL Profiler (or an analogous tool) on the relational database. When you are finished optimizing the schema, return the partition's WHERE clause to its original value.

Reviewing the SQL query may enable you to identify the reason for any remaining joins in the SQL statement. Two common reasons why joins may remain are discussed below:

The lowest level in the dimension is not represented in the fact table or is not marked as unique.

In the dimension editor for each dimension, make sure that the member key on the lowest level in the dimension is set to the key (generally an integer key) used for the join between the dimension table and the fact table. For some dimensions, you may need to add a level to the dimension and mark this lowest level as MemberKeysUnique = True (the dimension members must be unique). If you add a level solely to eliminate joins, you can hide the newly added dimension level. For example, if your Time dimension uses Month as the lowest level and the fact table has a daily grain, you can add the day level to the Time dimension (this assumes that the underlying relational table actually contains day-level data) and then hide the day level from users. This will enable you to optimize the schema and eliminate the join to the Time dimension table.

The join was caused by a partition slice.

The use of partitions introduces a join clause (see the SELECT statements for the FoodMart 2000 warehouse previously discussed). The join clause comes from the WHERE clause generated to enforce the slice setting. The WHERE clause ensures that only data meeting the slice criteria arrives in the cube partition.

The WHERE clause is not needed if each partition comes from a separate relational table containing only data for that slice. To suppress the automatic generation of a WHERE clause for all partitions on the Analysis server, modify the DataCompressionSettings registry setting and add the hex value of 0x00100000 to the existing value for this key. This registry change suppresses the generation of the WHERE clause for all partition slices. The setting applies to the entire server and is read at server startup.

Note: Do not set this option unless you can guarantee that each partition table contains only data for that partition. Remember that the SQL query returns the entire fact table. A common practice is to partition the fact table in the relational database and use a single, separate fact table for each partition. If you eliminate the WHERE clause, which eliminates the join, and you have not properly partitioned the fact table in the relational database, you will receive no warnings. However, you will have corrupted data, including the possibility of double counted or missed rows.

Caution: Because corrupt data can result if the fact table is not partitioned properly and because this is a server-wide setting, modifying the DataCompressionSettings registry key is not generally recommended.

These steps will eliminate two common reasons for remaining joins. Different designs may yield additional explanations for the remaining joins. Each join has its own reason and you may need to be creative to understand why the join exists and how you can eliminate it. For example, you may have joins required for dimensions that really should be virtual dimensions. If you had a dimension called DayOfWeek, you may be using a join to the Time dimension table to look up its value (Mon, Tues, Wed, Thurs, Fri, Sat or Sun). If you converted this DayOfWeek physical dimension to a virtual dimension, based on a member property at the day level, you could eliminate this join.

Once you have eliminated the reasons for the remaining joins, to the extent possible, use the Optimize Schema option to eliminate additional joins. You must process a newly added dimension before you can use it to eliminate a join. If the MemberKeyColumn property of any of these levels still references a dimension table column, you can manually change this property to include the fact table column that joins to the dimension. When you are finished, remember to return the partition's WHERE clause to its original value.

You do not need to repeat this process for each partition in the cube, because the cube structure is defined at the cube level, each partition is a clone of that structure, and all partition fact tables are identical in structure.

Use Incremental Processing and Changing Dimensions

New data and changes to dimensions require dimensions and partitions to be processed. Some changes require full processing, while some changes can be incorporated using incremental processing. For example, changes to the structure of a standard dimension require Analysis Services to fully process the dimension to incorporate structural changes. When a standard dimension is fully processed, no cube that uses the newly processed dimension can be queried until that cube is also fully processed. As a result, a structural change to one dimension can have a ripple effect on multiple cubes, making all of them unable to be queried ("knocking them offline") until the cubes have been fully reprocessed. Given these dependencies, full dimension processing of standard dimensions should be avoided whenever possible. Defining a dimension as "changing" can minimize the frequency with which the dimension, and the cubes that use it, needs to be fully processed when relocating members in the hierarchy.

Incremental Processing

Analysis Services can add new data and incorporate changes that do not affect the structure of a standard dimension by using incremental processing. Examples of structural changes that do not require full processing include adding new members, changing member properties, or changing custom member options.

Incremental Dimension Processing

When a standard dimension is incrementally processed, Analysis Services adds the new dimension information but does not modify cube data or aggregations. While incremental processing of a dimension is occurring, users can contin