|
Subhashini (Moderator):
Hi
all, a very good evening to all of you !
Subhashini (Moderator):
We
shall begin the chat in the next 5 minutes
Subhashini (Moderator):
Welcome to today's chat on SQL Server 2005 partitioning by Vinod M
Kumar & Srinivas Sampath
Subhashini (Moderator):
Vinod Kumar did his Bachelors of Engineering in Mechanical from Anna
University, Chennai. After graduation, he joined SCT Software
Solutions (then Exeter Systems) where he worked for about 4 years.
He has worked for a team called Technology Labs which evaluates
various Microsoft technologies and products and their relevance in
this line of business. He is presently with Intel Technologies,
Bangalore. He has been working on Microsoft Technologies all along
his professional career and is very inclined towards learning every
new concept that Microsoft comes out with. In this process, he has
gathered tons of knowledge to share with all.
Catch all the resources and articles Vinod has written at
www.ExtremeExperts.com
<
http://www.ExtremeExperts.com >, the site he co-hosts. He has
been very passionate about SQL Server programming and loves to
experiment different programming tricks in SQL Server.
Vinod is an active volunteer in INETA and PASS. He has initiated a
SQL Server community in Bangalore. To capture more ev
Subhashini (Moderator):
To
capture more events on SQL Server, join the group at <
http://groups.msn.com/SQLBang > which was formed as an
initiative to bring PASS community in India.
Email ID: vinodk@mvps.org
< mailto:vinodk@mvps.org >
Web
Site: www.ExtremeExperts.com
<
http://www.ExtremeExperts.com >
Subhashini (Moderator):
Srinivas Sampath works for SCT Software Solutions(Bangalore). He
graduated with a bachelors in Computer Technology(PSG College of
Technology) and later pursued MCA (Madras University).He joined SCT
(then called Exeter Systems) in 1997 and has been with them for the
past 6 years. His area of expertise include database architectures
(using SQL Server) and enterprise application architectures using
Microsoft Technologies. Srinivas currently is a Product Development
Manager and also head of Technology Labs, a group that focuses on
latest Microsoft technologies and their applicability into their
product lines. His interests include SQL Server 2000 programming,
Office 2003 programming and .NET.
ssampath@sct.co.in <
mailto:ssampath@sct.co.in >
<
http://www32.brinkster.com/srisamp >
Subhashini (Moderator):
Just a few chat rules before we begin the chat
Subhashini (Moderator):
Chat Procedures:
This chat will last for one hour. During this hour, our Experts will
respond to as many questions as they can. Please understand that
there may be some questions we cannot respond to due to lack of
information or because the information is not yet public. We
encourage you to submit questions for our Experts
Subhashini (Moderator):
We
ask that you stay on topic for the duration of the chat. This helps
the Guests and Experts follow the conversation more easily. We
invite you to ask off topic questions after this chat is over.
Subhashini (Moderator):
Please use the check box " ask the experts" to ask your questions to
the experts
Subhashini (Moderator):
So
lets get the chat rolling ..
Subhashini (Moderator):
Hi
Vinod & Srini
SriSamp [MVP] (Expert):
Welcome all.
Vinod_MVP (Expert):
Hi
all ... Good Evening ...
SriSamp [MVP] (Expert):
Welcome to this chat on the new partitioning features of SQL Server
2005.
SriSamp [MVP] (Expert):
The
concept of Partitioning of data is not new to SQL Server 2005.
Methods were available in SQL Server 2000 (and earlier) also for
partitioning of data.
SriSamp [MVP] (Expert):
What is new however, is the mechanism of partitioning and what all
you can do with it.
SriSamp [MVP] (Expert):
Before we actually get into the details of partitioning in SQL
Server 2005, let us understand some of the basics of partitioning
and what methods were available in earlier versions.
SriSamp [MVP] (Expert):
The
first question that people have is: What are partitions and why
would someone want to use them?
SriSamp [MVP] (Expert):
Many companies today have data warehouses which contain historical
data and these warehouses keep growing on a regular basis as more
data comes in.
SriSamp [MVP] (Expert):
The
basic tenet behind a warehouse is that given an entity (likes
sales), out of the millions of rows that are archived in the
warehouse, only a portion of it is required at any one time.
SriSamp [MVP] (Expert):
For
example, managers may be interested in the current quater and the
previous two quaters. The rest of the data is always there when
needed, but the access patterns are usually known.
SriSamp [MVP] (Expert):
If
all the data for an entity were stored in a single monolithic table,
we immediately run into issues of maintenance, availability and
performance.
SriSamp [MVP] (Expert):
Any
operations that we do with the table will affect others who are
using that table for ad-hoc reporting, analysis etc.
SriSamp [MVP] (Expert):
To
understand this better, consider for example, a case wherein the
sales data for an organization is stored in a single table.
SriSamp [MVP] (Expert):
If
the administrator decides to do some indexing operations on the
table or wants to load in new data, the entire table is locked out
during these operations.
SriSamp [MVP] (Expert):
In
addition to size, a table with varying access patterns might be a
concern for performance and availability when there are different
access patterns.
SriSamp [MVP] (Expert):
For
example, consider a case wherein a sales table has the current
month's data as read-write and the previous month's data as
read-only.
SriSamp [MVP] (Expert):
In
this case, as data moves in and out of the table, the access to the
table and its ability to respond to user requests might be impacted.
SriSamp [MVP] (Expert):
In
some cases, periodic downtime is acceptable, yet it can often be
avoided or minimized by better design and implementations.
SriSamp [MVP] (Expert):
Given that the access patterns of data access in warehouse tables
are usually known (last quater only, last two years only etc)
SriSamp [MVP] (Expert):
Would it not be great if we could have the data separated on these
access points? That's what partitions are all about.
SriSamp [MVP] (Expert):
Partitions are basically useful when you are dealing with large
tables, since it helps you partition the data into more manageable
chunks based on well known access points.
SriSamp [MVP] (Expert):
Partitioning a large table can provide lots of benefits especially
in the area of scalability and manageability.
SriSamp [MVP] (Expert):
Before we understand how to approach this new topic in SQL Server
2005, let's see what options we had in earlier versions of SQL
Server.
SriSamp [MVP] (Expert):
In
versions of SQL Server prior to 7.0, partitioning had to be part of
the design and then built into all the data access code of
applications.
SriSamp [MVP] (Expert):
The
typical mechansim was to create multiple tables and then based on
the partitioning column, the data access component might route the
data to the appropriate table.
SriSamp [MVP] (Expert):
A
view would be typically be created by UNIONing all of the tables and
presenting a unified table. However, data inserts, updates etc had
to happen on the appropriate table.
SriSamp [MVP] (Expert):
To
understand this concept, consider the following fictitious table
which stores details for customers:
SriSamp [MVP] (Expert):
CREATE TABLE MyCustomers
(
CustomerID INT NOT NULL PRIMARY KEY,
CustomerName VARCHAR(100),
CustomerState VARCHAR(2),
<< other columns of interest >>
)
SriSamp [MVP] (Expert):
Let
us assume that this table will have several thousands of customers
and thus, we have decided to partition this table along the "state"
to which the customer belongs.
SriSamp [MVP] (Expert):
Thus, you might have tables like: MyMACustomers, MyCACustomers etc
and each table would essentially have the same columns, but only
storing data for that particular state.
SriSamp [MVP] (Expert):
We
can control the data that is stored into a particular table by
implementing CHECK constraints on the "CustomerState" column.
SriSamp [MVP] (Expert):
Once we have this basic structure in place, we can then create a
view called MyCustomers that would aggregate all these individual
tables like this:
SriSamp [MVP] (Expert):
CREATE VIEW MyCustomers AS
SELECT .... FROM MyMACustomers
UNION ALL
SELECT ... FROM MyCACustomers
UNION ALL
....
SriSamp [MVP] (Expert):
Given this design, as discussed earlier, in versions of SQL Server
prior to 7.0, the application code can use the view to select data,
however, it had to route DMLs.
SriSamp [MVP] (Expert):
For
example, if we need to insert data into the customers table, a
typical code block would be something like this:
SriSamp [MVP] (Expert):
IF
(CustomerState = 'Boston')
EXEC MyBostonCustomersInsert <parameters>
....
SriSamp [MVP] (Expert):
This is quite cumbersome to maintain and as new states get added, we
need to modify all relevant code appropriately.
SriSamp [MVP] (Expert):
Another issue was with respect to performance. For example, if we
needed to search only for customers in the MA state, SQL Server
still had to query all of the underlying tables.
SriSamp [MVP] (Expert):
If
only a limited subset of the underlying tables was needed, then each
user and developer needed to know the design in order to reference
only the appropriate tables.
SriSamp [MVP] (Expert):
This issue was to a great extent solved in SQL Server 7.0.
SriSamp [MVP] (Expert):
With the release of 7.0, views were combined with constraints to
allow the query optimizer to remove irrelevant tables from the
querey plan.
SriSamp [MVP] (Expert):
This basically reduced the overall plan cost when a UNIONed view
accessed multiple tables.
SriSamp [MVP] (Expert):
Thus for the example situation above, if we needed only the
customers in the MA state, the users would only be directed to the
MyMACustomers when they queried the MyCustomers view.
SriSamp [MVP] (Expert):
How
did SQL Server 7.0 achieve this?
SriSamp [MVP] (Expert):
As
long as the constraints are "trusted" and the queries against the
view use a WHERE clause that uses the partition key, SQL Server will
access only the required tables.
SriSamp [MVP] (Expert):
So
what are trusted constraints? A trusted constraint is one wherein
SQL Server can guarantee that all data adheres to the properties
defined by the constraint.
SriSamp [MVP] (Expert):
By
using this mechansim, developers can significantly reduce the
complexity of their design by using views without having to directly
access the table of interest.
SriSamp [MVP] (Expert):
With trusted constraints SQL Server improves the performance by
removing unnecessary tables from the execution plan.
SriSamp [MVP] (Expert):
Although SQL Server 7.0 solved the issue of performance during
querying, DML operations like INSERT, UPDATE and DELETE still had to
be made on the underlying tables and not the view.
SriSamp [MVP] (Expert):
This means that the application code still had to use patterns like
the ones described above to route the data to the appropriate
tables.
SriSamp [MVP] (Expert):
Enter SQL Server 2000. In SQL Server 2000, data modification
statements also benefit from the partition capabilities introduced
in SQL Server 7.0.
SriSamp [MVP] (Expert):
In
SQL Server 2000, you could fire DML statements against the view and
SQL Server would determine the appropriate underlying table that
would get affected.
SriSamp [MVP] (Expert):
For
a good treatment of how to design SQL Server 2000 Data Warehouses
and thus partitions, refer to:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql2k/html/partitionsindw.asp
SriSamp [MVP] (Expert):
It
is also useful to note that SQL Server 2000 supported two types of
partitioned views: Local Partitioned View (LPV) and Distributed
Partition View (DPV)
SriSamp [MVP] (Expert):
An
LPV is one in which all of the underlying tables for the partition
view lie in the same SQL Server instance.
SriSamp [MVP] (Expert):
A
DPV is one in which the uderlying tables live on separate instances.
These instances don't have to be on different machines, but they
usually are.
SriSamp [MVP] (Expert):
Spreading a DPV across multiple machines helps "scale out" large SQL
Server implementations (something called FEDERATION).
SriSamp [MVP] (Expert):
For
more information on how to setup and use DPVs and the federation
capability, refer to the SQL Server 2000 books online.
SriSamp [MVP] (Expert):
While the improvements in SQL Server 7.0 and SQL Server 2000
significantly enhanced performance when using partitioned views,
they did not simply the following:
SriSamp [MVP] (Expert):
Administration, Design, Development.
SriSamp [MVP] (Expert):
People still had to create the underlying tables for the parition
view manually and maintain them.
SriSamp [MVP] (Expert):
Administration is also complex, since there are numerous tables to
manage and data integrity constraints must be managed for each
table.
SriSamp [MVP] (Expert):
From a performance standpoint, the optimizer was required to
validate and determine plans for each partition. Thus query
optimization cost increases linearly with the number of partitions.
SriSamp [MVP] (Expert):
Enter SQL Server 2005.
SriSamp [MVP] (Expert):
In
SQL Server 2005, In SQL Server 2005, table and index pages are
contained in one or more partitions.
SriSamp [MVP] (Expert):
A
partition is a user-defined unit of data organization.
SriSamp [MVP] (Expert):
By
default, a table or index has only one partition that contains all
the table or index pages.
SriSamp [MVP] (Expert):
The
partition resides in a single filegroup.
SriSamp [MVP] (Expert):
A
table or index with a single partition is equivalent to the
organizational structure of tables and indexes in earlier versions
of SQL Server.
SriSamp [MVP] (Expert):
When a table or index uses multiple partitions, the data is
partitioned horizontally so that groups of rows are mapped into
individual partitions, based on a specified column.
SriSamp [MVP] (Expert):
The
partitions can be put on one or more filegroups in the database.
SriSamp [MVP] (Expert):
The
table or index is treated as a single logical entity when queries or
updates are performed on the data.
SriSamp [MVP] (Expert):
SQL
Server 2005 introduces some new concepts and terminologies when you
are dealing with partitions.
SriSamp [MVP] (Expert):
The
steps for creating a partitioned table or index include the
following:
SriSamp [MVP] (Expert):
a.
Create a partition function to specify how a table or index that
uses the function can be partitioned.
SriSamp [MVP] (Expert):
b.
Create a partition scheme to specify the placement of the partitions
of a partition function on filegroups.
SriSamp [MVP] (Expert):
c.
Create a table or index using the partition scheme.
SriSamp [MVP] (Expert):
Rather than boring you with theory of what and how to define
functions, schemes etc, let us see an example in action that you can
paste and execute in SQL Server 2005.
SriSamp [MVP] (Expert):
Note that these commands will work only with the Enterprise /
Developer Edition of SQL Server 2005, since partitions is only
supported on these editions.
SriSamp [MVP] (Expert):
As
we see each example, I will explain the syntax and what the command
is doing.
SriSamp [MVP] (Expert):
Note that we will use the same example as above, wherein we want to
store details of customers in various states in separate partitions.
SriSamp [MVP] (Expert):
We
will start off by creating the "partition function"
SriSamp [MVP] (Expert):
CREATE PARTITION FUNCTION customer_state (CHAR) AS
RANGE LEFT FOR VALUES ('MA', 'CA', 'AZ')
GO
SriSamp [MVP] (Expert):
A
partition function specifies how the table or index is partitioned.
SriSamp [MVP] (Expert):
To
create a partition function, you specify the number of partitions,
the partitioning column, and the range of partition column values
for each partition.
SriSamp [MVP] (Expert):
Note that you can specify only one partitioning column.
SriSamp [MVP] (Expert):
Given the above definition, we will get 4 partitions:
SriSamp [MVP] (Expert):
Partition 1 = Values <= MA, Partition 2 = Values > MA and <= CA,
Partition 3 = Values > CA and <= AZ and Partition 4 = Values > AZ
SriSamp [MVP] (Expert):
Note that SQL Server would give you a warning for the above
Partition Function defintion that the values need to be sorted. To
correct this, provide the values in the sorted order. Thus, we would
use AZ, CA and MA in that order.
SriSamp [MVP] (Expert):
Ok,
now that we have created the partition function, let us test it with
some values using the following query:
SriSamp [MVP] (Expert):
SELECT $PARTITION.customer_state ('BA')
SriSamp [MVP] (Expert):
The
"$PARTITION" function returns the partition number into which a set
of partitioning column values would be mapped for any specified
partition function.
SriSamp [MVP] (Expert):
Q:
WHY
DO YOU NEED LEFT THERE?
A: Good question. For our example it not needed. However,
down the line, I will use it to demonstrate how values inbetween
them are moved into the various partitions.
SriSamp [MVP] (Expert):
The
above statement will return 2, indicating that when we insert 'BA'
as the value, the record would get into the second partition.
SriSamp [MVP] (Expert):
You
can also query some meta data views to get information about the
partition function and its boundary values. Here is an example:
SriSamp [MVP] (Expert):
SELECT
pf.name, pf.type_desc, pf.fanout, rv.*
FROM
sys.partition_functions pf
INNER JOIN sys.partition_range_values rv ON
pf.function_id = rv.function_id
WHERE
pf.name = 'customer_state'
SriSamp [MVP] (Expert):
The
above query will return the partition function name, its type, the
number of partitions created and range values (that we defined
earlier).
SriSamp [MVP] (Expert):
Let
us now create a partition scheme for the above partition function
using the following script:
SriSamp [MVP] (Expert):
CREATE PARTITION SCHEME customer_scheme
AS PARTITION customer_state
ALL TO ([PRIMARY])
GO
SriSamp [MVP] (Expert):
A
partition scheme maps the partitions produced by a partition
function to a set of filegroups that you define.
SriSamp [MVP] (Expert):
Note that we must specify enough number of filegroups to hold the
number of partitions.
SriSamp [MVP] (Expert):
If
we specify more file groups than the number of partitions, SQL
Server will then mark one of the file groups as NEXT USED, meaning
that it will contain the next added partition.
SriSamp [MVP] (Expert):
Note that you can also map all the partitions into a single
filegroup (which is what we have done above).
SriSamp [MVP] (Expert):
Once we have created the scheme, we can now create a table (or an
index) that will use this scheme. Here is our customer table
re-defined:
SriSamp [MVP] (Expert):
IF
(OBJECT_ID ('Customers') IS NOT NULL)
DROP TABLE Customers
GO
CREATE TABLE Customers
(
CustomerID INT NOT NULL,
CustomerName NVARCHAR(50),
StateOfResidence CHAR(2),
TotalOrder DECIMAL (10,2)
CONSTRAINT CustomerPK PRIMARY KEY (CustomerID,
StateOfResidence),
CONSTRAINT CustomerState CHECK (StateOfResidence IN
('AZ', 'CA', 'MA', 'BA', 'CM', 'MZ'))
)
ON
customer_scheme (StateOfResidence)
GO
SriSamp [MVP] (Expert):
Two
things to note here:
SriSamp [MVP] (Expert):
a.
Note that the primary key constraint also includes the partition
key. This is required for SQL Server to enforce uniqueness and
partition verification.
SriSamp [MVP] (Expert):
b.
We include a CHECK constraint in the table exlusively to limit the
values in the column
SriSamp [MVP] (Expert):
We
now insert sample rows into this table. Here is a script that does
the same:
SriSamp [MVP] (Expert):
INSERT INTO Customers SELECT 1, 'Contoso', 'AZ', 2000
INSERT INTO Customers SELECT 2, 'Levis', 'CA', 2000
INSERT INTO Customers SELECT 3, 'ACME Corp', 'MA', 2000
INSERT INTO Customers SELECT 4, 'Fabrikam', 'MA', 2000
INSERT INTO Customers SELECT 5, 'Books Mart', 'AZ', 2000
SriSamp [MVP] (Expert):
Ok,
we know that we have 4 partitions. How do we determine how many rows
each partition got? Here is a query for the same:
SriSamp [MVP] (Expert):
SELECT
$PARTITION.customer_state (StateOfResidence) AS
[Partition],
COUNT(*) AS [#Records]
FROM
Customers
GROUP BY
$PARTITION.customer_state (StateOfResidence)
ORDER BY
[Partition]
SriSamp [MVP] (Expert):
What this query is doing is displaying the number of records in each
partition. Note that we have used the $PARTITION function to get the
number and group on that.
SriSamp [MVP] (Expert):
In
our above sample script, we inserted data using values that were
exactly boundary conditions.
SriSamp [MVP] (Expert):
Let
us now insert some more rows that fall between the boundaries and
see how they are handled. Here is a script for the same:
SriSamp [MVP] (Expert):
INSERT INTO Customers SELECT 6, 'Network Mate', 'BA', 2000
INSERT INTO Customers SELECT 7, 'Soft Pro', 'CM', 2000
INSERT INTO Customers SELECT 8, 'FabMall', 'MZ', 2000
SriSamp [MVP] (Expert):
Now, if you run the above SELECT statement, you see the revised
distribution. If you have a pen and paper with you draw out 4 boxes
and place the values manually based on our boundaries and you will
understand the behavior of the output.
SriSamp [MVP] (Expert):
Here is another query that lists the rows present in a specified
partition.
SriSamp [MVP] (Expert):
SELECT * FROM Customers WHERE
$PARTITION.customer_state (StateOfResidence) = 3
SriSamp [MVP] (Expert):
Till now, what we have defined as the partition is an example of a
RANGE PARTITION.
SriSamp [MVP] (Expert):
Range partitions are partitions that are defined by specific and
customizable ranges of data.
SriSamp [MVP] (Expert):
The
range partition boundaries are chosen by the developer and can be
changed as data usage patterns change.
SriSamp [MVP] (Expert):
Note that apart from partitioning tables, you can also partition
indexes.
SriSamp [MVP] (Expert):
Partitioning a table and its indexes along the same partition
function often optimizes performance.
SriSamp [MVP] (Expert):
When the indexes and the table use the same partioning function and
columns in the same order, the index and the table are said to be
"aligned".
SriSamp [MVP] (Expert):
When the tables and indexes also use the same partition scheme, they
are considered to be "storage aligned".
SriSamp [MVP] (Expert):
One
benefit of storage alignment is that all the data and indexes are
present in the same filegroup which can be maintained independently,
thus increasing availability.
SriSamp [MVP] (Expert):
Note that although we have defined our boundary values earlier, it
is often the case that the boundary values can change over time.
SriSamp [MVP] (Expert):
This requires us to do some maintenance over our partitions and
there are several new features and concepts for the same.
SriSamp [MVP] (Expert):
Some of the new features are SPLIT, MERGE and SWITCH which we will
cover next:
SriSamp [MVP] (Expert):
Let's answer some questions and if there are more, we can take them
too. Otherwise, we will move ahead with the more advanced features.
Vinod_MVP (Expert):
Q:
please provide me links for a new starter into SQL world !
A: There are some handy links available at:
http://www.microsoft.com/sql/default.mspx which you can
goto. But from a starter point of view I would strongly recommend
using the SQL Server Books Online as the place to start. It comes
part of SQL Server Installation.
Vinod_MVP (Expert):
Q:
Is
there a difference if we use express edition?
A: The concept of partitioning is same across all the
editions. Nothing changes. But you need to understand that
Partitioning is supported in the Enterprise and Developer Editions
ONLY. This feature is not available in the Express Edition.
Vinod_MVP (Expert):
Q:
WHY
DO YOU NEED LEFT THERE?
A: As Srini said, the Left tries to say which side of the
partition should the values on the boundary fall. A left says the
value in the boundary falls on the left side. As a best practice:
Use RANGE RIGHT for ???continuous??? data values ??? more intuitive.
Hence boundary becomes starting point for each range. But for
character based partitioning like the example Srini shows it is
based on your data distribution you might want to use one over the
other. Hope this clarifies further.
SriSamp [MVP] (Expert):
Any
other questions?
Vinod_MVP (Expert):
Ok.
Let us get rolling if there is no further questions.
Vinod_MVP (Expert):
Well, now that Srini has laid some basic foundation to the whole
partitioning concept lets dwel a step deeper in what it is to offer
in addition to this.
Vinod_MVP (Expert):
Because partitions are used for large tables that need to scale to
support better throughput, it is possible that the number of
partitions chosen when the partition function was created will need
to change over time.
Vinod_MVP (Expert):
Using the ALTER TABLE statement with the new ???split??? option, you
can add another partition to the table.
Vinod_MVP (Expert):
When a partition is ???split,??? data may be moved to the new
partition; however, for best performance rows should not move.
Vinod_MVP (Expert):
Conversely, if you want to remove a partition you would perform a
switch and merge functionalities.
Vinod_MVP (Expert):
In
the case of range partitions, a merge request is made by stating
which boundary point should be removed.
Vinod_MVP (Expert):
For
example, you might choose to have one year of data available and
each month-end you will switch in the current month and then switch
out the earliest month thereby differentiating between the current
month's read/write OLTP data and the ealier month???s read-only
analysis type of data.
Vinod_MVP (Expert):
However, before you merge a boundary point you should switch out its
associated data otherwise the merge could be an expensive operation.
Vinod_MVP (Expert):
There is a specific flow to this process to make it the most
efficient. The concept of switch, merge and split is somewhat
complex at first glance nevertheless once we start playing around
with them it gets really easy.
Vinod_MVP (Expert):
Let
us take simple examples of each of these operations and the nuances
of each. Let us start with SPLIT:
Vinod_MVP (Expert):
Syntax:
Vinod_MVP (Expert):
ALTER PARTITION FUNCTION ??? SPLIT RANGE ???
Vinod_MVP (Expert):
This adds a new boundary point.
Vinod_MVP (Expert):
The
performance caveat is whether rows need to be moved to the new
partition. If rows are moved, an exclusive table lock applied until
the rows are moved.
Vinod_MVP (Expert):
Affects all objects using that Partition Function. One partition is
split into two.
Vinod_MVP (Expert):
The
new partition is the one containing the new boundary point:
1.
To the right of boundary if RANGE RIGHT
2.
To the left of boundary if RANGE LEFT
Vinod_MVP (Expert):
Some considerations for SPLIT:
Vinod_MVP (Expert):
1.
Instantaneous if partition is empty
Vinod_MVP (Expert):
2.
This is IO-intensive if partition is populated
a. Any data on new side of the boundary is
physically deleted from old partition and inserted into the new
partition
b. Fully logged operation
c. Exclusive table lock held for duration of Split
Vinod_MVP (Expert):
3.
Schema change, so plans are invalidated
Vinod_MVP (Expert):
4.
Always assign ???Next Used??? filegroup in Partition Scheme before
using Split
ALTER PARTITION SCHEME NEXT USED ???
Vinod_MVP (Expert):
that code specifies which filegroup holds new partition???s data
Vinod_MVP (Expert):
EG:
ALTER PARTITION FUNCTION TransactionRangePF1()
SPLIT RANGE ('01/01/2005')
Vinod_MVP (Expert):
Moving forward, lets next take a look at MERGE:
Vinod_MVP (Expert):
As
the keyword suggests it removes a boundary point from a Partition
Function
Vinod_MVP (Expert):
Affects all objects using that Partition Function
Vinod_MVP (Expert):
Syntax:
ALTER PARTITION FUNCTION ??? MERGE RANGE
Vinod_MVP (Expert):
Considerations for MERGE are same as the SPLIT operations.
Vinod_MVP (Expert):
EG:
ALTER PARTITION FUNCTION TransactionRangePF1()
MERGE RANGE ('01/01/2004')
Vinod_MVP (Expert):
The
next important operation is the SWITCH operation.
Vinod_MVP (Expert):
1.
Instantly swaps the content of one partition or table (source) with
another table???s empty partition or an empty table (target)
Vinod_MVP (Expert):
2.
Metadata-only operation, no data moves
Vinod_MVP (Expert):
3.
Restrictions:
a. Target table or partition must be empty
b. Source and target must be in same filegroup
c. Source must have all indexes required by the
target, aligned and in matching filegroups
d. If Target is a partition, Source must have check
constraints (if table) or a partition range that fits within the
target range
Vinod_MVP (Expert):
4.
All associated indexes are automatically Switched along with the
table / partition
Vinod_MVP (Expert):
5.
If your table and indexes are aligned, you can instantly switch
in/out data
Vinod_MVP (Expert):
Switch can happen between:
1.
Partitioned Table's partition into a Table
2.
Table into a Partitioned Table's partition
3.
Partitioned Table into another Partitioned Table's partition
Vinod_MVP (Expert):
EG:
ALTER TABLE dbo.TransactionHistory
SWITCH Partition 1
TO
dbo.TransactionHistoryStaging PARTITION 1
Vinod_MVP (Expert):
So
there can be interesting scenarios we can have because of
partitioning. But before we go there, do you have any question to
bounce us with?
SriSamp [MVP] (Expert):
Q:
hi....sorry for bugging you guys....is there any repository to get
all the information regarding sqlserver partioning
A: Resources will be posted in the end. So you can get all
the links.
SriSamp [MVP] (Expert):
Q:
can
any body explain the exact difference between horizontal and
vertical partioning
A: Horizontal partitioning is wherein an object is
partitioned along its columns and vertical partitioning is wherein
an object is partitioned along its rows. All the examples that we
have seen till now are "vertical" in nature.
SriSamp [MVP] (Expert):
Oops sorry. Its the other way round.
SriSamp [MVP] (Expert):
Horizontal is along the rows and vertical is along the columns. All
our examples have been "horizontal" in nature.
SriSamp [MVP] (Expert):
Sorry about that.....
Vinod_MVP (Expert):
Q:
can
any body explain the exact difference between horizontal and
vertical partioning
A: The problem with implementing a vertical partitioning is
only using views. The problem is to implement multiple partition
views we need to create that many number of sql views.
SriSamp [MVP] (Expert):
Q:
i
have a doubt in Reporting Services, can i use a parameter derived
from a SQL Dataset in a MDX query
A: Sumit, the question is out of context. Can you drop in a
mail to us and we can then answer?
Vinod_MVP (Expert):
Ok.
With no further questions over Partitioning, let me talk over some
scenario's.
Vinod_MVP (Expert):
1.
Sliding Window Load and Delete
Vinod_MVP (Expert):
This is a typical scenario for Large databases.
Vinod_MVP (Expert):
Partitions might be requested for each hour/day/week/month???add new
partition and remove the oldest one.
Vinod_MVP (Expert):
New
partition ???
a. May need to batch load, scrub, and transform
before incorporating into the whole table
b. Or start partition as empty and populate
gradually using transactions.
Vinod_MVP (Expert):
Old
partition ??? may need backup, archive, restore.
Vinod_MVP (Expert):
Given this scenario what will be the partitioning activity we might
need to do
Vinod_MVP (Expert):
Steps for Sliding Window:
Vinod_MVP (Expert):
1.
Create staging table in same filegroup as target partition
Vinod_MVP (Expert):
2.
Split most recent partition, adding boundary point for following
period
Vinod_MVP (Expert):
3.
Bulk load and index staging table
Vinod_MVP (Expert):
4.
Switch data into next-to-last partition
Vinod_MVP (Expert):
Simple. That was half the job.
Vinod_MVP (Expert):
Deletion step in Sliding window:
Vinod_MVP (Expert):
1.
Create unload table in same filegroup as partition to remove
Vinod_MVP (Expert):
2.
Switch data out of second partition
Vinod_MVP (Expert):
3.
Merge first partition, removing the boundary point for the unload
period
Vinod_MVP (Expert):
4.
Archive or Truncate the unload table
Vinod_MVP (Expert):
Main design point for sliding window is you always reserve an empty
partition at either end. This means switching operations will be
instantaneous for either adding (switching in) or archiving
(switching out)
Vinod_MVP (Expert):
The
idea of doing this as efficiently as possible ??? keep table
available, meta data only e.g. pointer switches.
Vinod_MVP (Expert):
When you build the staging table, it is outside of the partitioned
table. Thus, you have no availability hit while loading the staging
table, and then add indexes to the staging table. Lastly, the
pointer switch is instantaneous.
Vinod_MVP (Expert):
Partitioning has got a new dimension to backup and recovery too
Vinod_MVP (Expert):
Lets take that next
Vinod_MVP (Expert):
a.
Partitioning leverage Filegroup Backup / Restore enhancements in SQL
Server 2005
- Read-Only Filegroups can now be restored without
applying transaction logs
Vinod_MVP (Expert):
b.
Reduces data volume for regular backups when historical data is not
changing
Vinod_MVP (Expert):
Functionally this means, Mark filegroups for unchanging, historical
partitions as READ ONLY. Perform one-time backup of these READ ONLY
filegroups.
Vinod_MVP (Expert):
Regularly backup only the Primary filegroup and other filegroups
containing active, changing data.
Vinod_MVP (Expert):
To
Restore, restore Primary and active filegroups and recover log.
Then restore the read-only filegroups separately.
Vinod_MVP (Expert):
Apart from this it is very important to understand what to be used
as a partition key
Vinod_MVP (Expert):
Consider a Orders Table with Order_Date and Order_ID columns with
other columns.
Vinod_MVP (Expert):
What are the Design alternatives we have:
Vinod_MVP (Expert):
1.
Application enforced uniqueness. Partition on order_date, align on
Order_ID
Vinod_MVP (Expert):
2.
Partition on Order_date. Create a unique monolithic (spanning all
partitions) non-aligned index on Order_ID. Non-aligned unique index
enforces unique Order_ID.
Vinod_MVP (Expert):
3.
Maintain uniqueness and index alignment by design Order_ID to
monotonically increase e.g. identity value. Partition on Order_ID.
Vinod_MVP (Expert):
Choice of one option over the other depends on the workloads like:
Vinod_MVP (Expert):
1.
DataWarehouse probably best to choose option #1. Because that is
what makes sense to them.
Vinod_MVP (Expert):
2.
OLTP probably best to choose option #3
Vinod_MVP (Expert):
3.
Some reporting scenarios (drill down reporting, customer history
questions) may benefit from option #2, the non-aligned index. Pay
price each time a new partition is added, the non-aligned index has
to be rebuilt.
Vinod_MVP (Expert):
There are also Engine enhancements because of Partitioning.
Vinod_MVP (Expert):
If
two tables are partitioned the same way (same scheme), it may be
possible to perform a join of each pair of partitions separately
Vinod_MVP (Expert):
This is called a ???collocated join???
Vinod_MVP (Expert):
Works only for some join conditions like Equi-joins on partitioning
key (A.a=B.b)
Vinod_MVP (Expert):
I
see this as an big time asset. Apart from this many other operations
can be performed ???per-partition??? like:
Vinod_MVP (Expert):
1.
Grouping, Filtering, Projection
2.
Inserts, Updates, Deletes
3.
Create Index, Bulk Insert
Vinod_MVP (Expert):
So
any more questions???
SriSamp [MVP] (Expert):
Q:
What is the difference between DataWarhouse & Data partition?
A: A data warehouse is a comprehensive source of data
(usually collected over several years) and an infrastructure for
building Business Intelligence (BI) solutions. Typically data
warehouses deal with concepts like "facts" and "dimensions" and will
have a schema structure that is more de-normalized to facilitate
querying. Data Partitioning on the other hand is the concept of
splitting data into small chunks (as opposed to a single monolithic
table) based on a "partition key". Data Partitions are the corner
stone for effective and efficient data warehouses.
Vinod_MVP (Expert):
When working with partitioning you might need to query system tables
to get the partition details.
-- Some of the system tables to concentrate
SELECT * FROM sys.partitions
SELECT * FROM sys.tables
SELECT * FROM sys.indexes
SELECT * FROM sys.partition_functions
SELECT * FROM sys.partition_schemes
SELECT * FROM sys.partition_range_values
SELECT * FROM sys.partition_parameters
SriSamp [MVP] (Expert):
Here are some resources that can help you understand the concept of
partitions, data warehouses etc better:
Strategies for Partitioning Relational Data Warehouses in SQL
Server:
http://www.microsoft.com/technet/prodtechnol/sql/2005/spdw.mspx
Using Partitions in a Microsoft SQL Server 2000 Data Warehouse:
http://msdn.microsoft.com/library/en-us/dnsql2k/html/partitionsindw.asp?frame=true
SQL
Server 2000 Incremental Bulk Load Case Study:
http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/incbulkload.mspx
Partitioned Tables and Indexes in SQL Server 2005:
http://msdn.microsoft.com/library/en-us/dnsql90/html/sql2k5partition.asp?frame=true
Understanding LEFT and RIGHT in Partition Functions:
http://www.sqlskills.com/blogs/kimberly/PermaLink.aspx?guid=cf223632-c93b-4242-b0f2-af493e051266
Vinod_MVP (Expert):
The
last two links are very informative ...
Subhashini (Moderator):
Subhashini (Moderator): Hi guys , we are close to time up! Please
conclude your queries and feel free to email Vinod and Srini for any
additional queries
Vinod_MVP (Expert):
You
can also get some resources on SQL Server 2005 at:
My
Website:
www.ExtremeExperts.com
My
Blog:
http://blogs.sqlxml.org/vinodkumar/
SriSamp [MVP] (Expert):
Web
Site:
http://www32.brinkster.com/srisamp
Blog:
http://blogs.sqlxml.org/srinivassampath
Email: srisamp@gmail.com
Vinod_MVP (Expert):
Vicjk ...Although partitioning is transparent to the application,
partitioning adds administrative overhead and complexity to the
implementation of your objects.
Vinod_MVP (Expert):
While partitioning can offer great benefits, you may not want to
consider it when your table is small. And I know these are
subjective.
Vinod_MVP (Expert):
You
need to understand:
1.
Your performance needs
2.
Additional maintenance needs
3.
How is the data access like
4.
How many physical disks are available
See
that the administrative burden doesnt overshadow the performance
gains achieved.
Vinod_MVP (Expert):
There is no rule of thumb that we cannot at any point in time say
which number of rows are ideally suited for partitioning candidate.
SriSamp [MVP] (Expert):
The
SQL Server Books online has some parameters to look out for before
deciding on when you will need partitioning.
Vinod_MVP (Expert):
You
are a better judge as an Developer and/or DBA
Vinod_MVP (Expert):
Thanks all. Hope you enjoyed the chat. Do come to the local UG in
Bangalore for more SQL Server 2005 sessions.
http://groups.msn.com/BDotNet . We have "Introduction to Data
Mining in SQL Server 2005" this week.
SriSamp [MVP] (Expert):
Thanks guys. It was great sharing "one more" great feature of SQL
Server 2005 and yes, do walk-in into the UG meeting and you will
always learn something new...
Subhashini (Moderator):
Thanks to Vinod and Srini for such an informative session !
Subhashini (Moderator):
This brings us to the timeup! And please feel free to email your
queries to Vinod & Srini
Subhashini (Moderator):
Thanks to you guys for taking tiem out for this chat
Subhashini (Moderator):
Have a great evening ! :)
SriSamp [MVP] (Expert):
Thanks and bye all. |