Chat
Topic : CLR and XML Best practices Chat Expert : Srinivas [MVP]
September
14, 2005
Subhashini[MSFT]
(Moderator):
hello everybody :) a very good evening to all of you ! Subhashini[MSFT]
(Moderator):
Welcome to today's chat on CLR and XML best Practices Subhashini[MSFT]
(Moderator):
hosted by our two SQL Server MVPs Subhashini[MSFT]
(Moderator):
To give you a quick intro about our experts Subhashini[MSFT]
(Moderator):
Srinivas Sampath works for SCT Software Solutions(Bangalore). He graduated
with a bachelors in Computer Technology(PSG College of Technology)
and later pursued MCA (Madras University).He joined SCT (then called
Exeter Systems) in 1997 and has been with them for the past 6 years. Subhashini[MSFT] (Moderator):
His area of expertise include database architectures (using SQL Server)
and enterprise application architectures using Microsoft Technologies.
Srinivas currently is a Product Development Manager and also head
of Technology Labs, a group that focuses on latest Microsoft technologies
and their applicability into their product lines. Subhashini[MSFT]
(Moderator):
His interests include SQL Server 2000 programming, Office 2003 programming
and .NET. And you can reach him at ssampath@sct.co.in
<http://www32.brinkster.com/srisamp>
Subhashini[MSFT] (Moderator):
To introduce Vinod M Kumar, Subhashini[MSFT]
(Moderator):
Vinod Kumar did his Bachelors of Engineering in Mechanical from Anna
University, Chennai. After graduation, he joined SCT Software Solutions
(then Exeter Systems) where he worked for about 4 years. He has worked
for a team called Technology Labs which evaluates various Microsoft
technologies and products and their relevance in this line of business.
He is presently with Intel Technologies, Bangalore. He has been working
on Microsoft Technologies all along his professional career and is
very inclined towards learning every new concept that Microsoft comes
out with. In this process, he has gathered tons of knowledge to share
with all. Subhashini[MSFT] (Moderator):
Catch all the resources and articles Vinod has written at www.ExtremeExperts.com
<http://www.ExtremeExperts.com>,
the site he co-hosts. He has been very passionate about SQL Server
programming and loves to experiment different programming tricks in
SQL Server. Subhashini[MSFT]
(Moderator):
Vinod is an active volunteer in INETA and PASS. He has initiated a
SQL Server community in Bangalore. To capture more events on SQL Server,
join the group at <http://groups.msn.com/SQLBang>
which was formed as an initiative to bring PASS community in India. Subhashini[MSFT]
(Moderator):
and you can reach Vinod at Email ID: vinodk@mvps.org <mailto:vinodk@mvps.org>
Web Site: www.ExtremeExperts.com <http://www.ExtremeExperts.com> Subhashini[MSFT] (Moderator):
A few chat rules as usual before we start the chat .. Subhashini[MSFT]
(Moderator):
Please refrain from sending any private messages to the expert during
the chat Subhashini[MSFT] (Moderator):
Chat Procedures:
This chat will last for one hour. During this hour, our Experts will
respond to as many questions as they can. Please understand that there
may be some questions we cannot respond to due to lack of information
or because the information is not yet public. We encourage you to
submit questions for our Experts. Subhashini[MSFT]
(Moderator):
We ask that you stay on topic for the duration of the chat. This helps
the Guests and Experts follow the conversation more easily. We invite
you to ask off topic questions after this chat is over. Subhashini[MSFT]
(Moderator):
Please use the radial button "submit a question " to ask
any questions to the expert Subhashini[MSFT]
(Moderator):
So we are all set for a great chat session by our MVPs , we thank
you all for joining us today Subhashini[MSFT]
(Moderator):
and let's get the chat rolling ... Subhashini[MSFT]
(Moderator):
Hi Vinod and Srinivas Vinod_MVP (Expert):
Hi All ... Srinivas [MVP] (Expert):
Hi all. Good to get in touch with all of you once again. Srinivas
[MVP] (Expert):
So, let's get started. Srinivas [MVP]
(Expert):
As you all know, this chat is titled CLR and XML Best Practices for
SQL Server 2005. Srinivas [MVP] (Expert):
The integration of the CLR and the introduction of the XML as a first
class data type in SQL Server opens the doors for many new applications. Srinivas [MVP] (Expert):
In the past chats, we have seen each of these features in greater
detail and there are many online resources to help you get started. Srinivas [MVP] (Expert):
In today's chat we will explore some of the best practices that you
should be aware off, when using these new methods. Srinivas
[MVP] (Expert):
Note that the CLR and XML features are not being introduced here.
The assumption here is that you have either worked with them or have
read about them. Srinivas [MVP] (Expert):
First, we will start off with the CLR best practices. Vinod_MVP
(Expert):
Integration of CLR into SQL Server is one of the most demanding and
highly anticipated feature in the next generation of SQL Server 2005. Vinod_MVP (Expert):
Microsoft SQL Server 2005 significantly enhances the database programming
model by hosting the Microsoft .NET Framework 2.0 Common Language
Runtime (CLR). Vinod_MVP (Expert):
This enables developers to write procedures, triggers, and functions
in any of the CLR languages, particularly Microsoft Visual C# .NET,
Microsoft Visual Basic .NET, and Microsoft Visual C++. Vinod_MVP
(Expert):
In this chat we assume there is some understanding about this integration,
our prime focus would be to identify some of the scenario's where
we can capitalize this new feature. Vinod_MVP
(Expert):
The chat will try to provide a set of high-level guidelines on how
to choose between the available programming alternatives. Vinod_MVP
(Expert):
To start off with, Use CLR and XML judiciously to complement T-SQL. Vinod_MVP (Expert):
The TSQL enhacements are huge and for set based operation it is always
better to try use the TSQL way. Because it is more efficient. Vinod_MVP (Expert):
It is important to understand that CLR is NOT a black box. Moving
forward now the application developer has the option to write a code
in TSQL and CLR. Vinod_MVP (Expert):
But making the right choice for the operation can dictate the performance
of the application. Vinod_MVP (Expert):
If I were to compare the raw benefits that CLR brings to the table: Vinod_MVP (Expert):
a. T-SQL is interpreted and hence slower.
b. Error handling in T-SQL was frankly abysmal (although it has improved
in SQL Server 2005)
c. String handling in T-SQL is unoptimized and slow
d. SQL-CLR is compiled so faster
e. SQL-CLR lets you take advantage of the huge base class library
f. Capitalize existing .NET developers experience Vinod_MVP
(Expert):
But we will try to compare the CLR integration in various categories.
I have tried to talk over deciding over TSQL or CLR is also an important
decision. Vinod_MVP (Expert):
Here are some simple rules that you can use to decide the same. Vinod_MVP (Expert):
1. T-SQL is best suited for data access and has a simple programming
model to work with.
2. Managed code is all about procedural programming and computation.
3. Because it is compiled the CLR outperforms over computing-intensive
business logic functions.
4. If the procedure primarily involves forward-only, read-only row
navigation through a result set with some processing of each row,
using the CLR is likely more efficient. Vinod_MVP
(Expert):
So any questions till now over the CLR feature? Srinivas
[MVP] (Expert): Q: Why a clustered primary key
must exist in the table containing the indexed xml column ? A:
I will answer this one, even though it is not with respect to the
current CLR topic. The clustered index is required to correlate index
rows with the rows in the base table. Vinod_MVP
(Expert):
Can you post it as a question to the experts please? Srinivas
[MVP] (Expert):
All, please post questions to the expert and not as regular comments.
This will help us seggregate the questions and answer to them. Vinod_MVP (Expert): Q:
how GC works in environment ? A: The crus is this, now
CLR gets loaded just as another loader that exists for ASP.NET, or
Office VSTO application. But the only difference here is the SQL Server
takes complete ownership of the GC cycles, when it happens and when
an app domain needs to be created etc. Even the methods that needs
to be loaded is loaded ONLY in instruction from SQL Server. This process
is called as Fusion loading. Vinod_MVP
(Expert): Q: what is the difference between
1.1 clr and 2.0 clr A: The differences are the same what
the framework can provide. What in specific are you looking for from
an integration of SQL Server? Vinod_MVP
(Expert): Q: Safty measures for SQL in CLR?
A: the security aspect is quite detailed when it comes
to using CLR data type. It falls under three different category of
Safe, UnSafe and External. It follows the CAS policies and is bound
to the database. And I feel this aspect is critical. Moreover, SQL
Server CLR code CANNOT access memory areas outside its context boundaries,
else the App domain will be restarted. So we have a more granular
security boundary. Srinivas [MVP] (Expert): Q: Can I directly replace an application that uses SQL2K with
SQL 2005 to enjoy these features? A: Not sure what you
mean by replacing the application. If you just take a SQL 2K database
and put it onto SQL 2K5, it will run in 2K compatibility mode. After
this, you need to change the compatibility to 2K5 and then change
your application to start using the new features. Vinod_MVP
(Expert): Q: Hi vnoid..can we have unique clustered
index on table and create xml index ? A: Yes. This is very
much possible. the only restriction was an Primary clustured index
on a table. This is to get a locator of the pointer to the data row.
And this clustered index can be unique. And I generally prefer this
model too. Vinod_MVP (Expert):
So let us get rolling Vinod_MVP (Expert):
These are like rule of thumb but it is very vital to make some performance
numbers to the whole activity to get a hang of which outperforms the
other based on the scenario in hand. Vinod_MVP
(Expert):
I would like to re-iterate that CLR is not replacement for set-based
query processing. Here the query processor outperforms all. Vinod_MVP
(Expert):
So to summarize it from an usage perspective:
a. Use CLR for procedural and computation-intensive logic
b. Logic that will benefit from use of data structures not supported
in T-SQL (e.g. arrays)
c. Lastly, programs that take advantage of .NET Framework library
d. T-SQL better suited for procedures with mostly data access statements
with set based operations with DML and DDL operations. Vinod_MVP
(Expert):
CLR Vs XP's
So this is a prime candidate for usage of CLR. Building XP's in SQL
Server have ALWAYS been the domain of C++ geeks. So the benefits here
include:
1. Easier in-proc data access: No need to use loop-back connections
and binding sessions
2. Better administrative control: 3 permission buckets.
3. Reliable and scalable
I cannot really recollect an area where one needs to still use the
older model of XP's given the API's that .NET framework tries to expose. Vinod_MVP (Expert):
Let us look at some of the common scenarios of use: Vinod_MVP
(Expert):
1. Sending Results to the Client
This involves sending a set of rows to the client without "consuming"
them in the server (meaning no navigation of the rows inside the routine).
With T-SQL, simply embedding a SELECT statement in the T-SQL proc
has the effect of sending rows produced by the SELECT to the client.
With managed code, you use the SqlPipe object to send results to the
client. T-SQL and in-proc ADO.NET perform equally in this scenario.
Vinod_MVP (Expert):
2. Submitting SQL statements to the server
This involves the round-trip of executing an SQL statement from the
procedural code. In this case, T-SQL has a significant advantage (more
than twice as fast as in-proc ADO.NET). An important thing to note
here is that, the slower performance in CLR occurs because of the
additional layers of code that are involved in submitting a T-SQL
statement from managed code to native SQL code. After the statement
is submitted to the query processor, there is no difference in performance
based on the source of the statement (T-SQL or in-proc ADO.NET). Vinod_MVP (Expert):
3. Forward-only, Read-only row navigation
This involves navigating the rows produced by a SQL statement one
at a time in a forward only, read-only fashion. In T-SQL, this is
implemented through a forward only, read-only cursor. In CLR, it is
implemented through SqlDataReader. Typically, there is some processing
involved for each statement. If you ignore the processing associated
with each row, then the navigation of rows in CLR is slightly slower
than in T-SQL cursors. If you however factor in the processing that
is performed for every row, the advantage tilts more towards CLR since
the CLR significantly outperforms T-SQL on such processing. Vinod_MVP
(Expert):
4. Row-Navigation with updates
If the problem involves updating rows based on the current position
in the cursor, there is no relevant performance comparison because
this functionality is not supported through in-proc ADO.NET, and should
be done through T-SQL updateable cursors. Vinod_MVP
(Expert):
These are some interesting thoughts that I want you all to start thinking
on those lines. Now programming in SQL Server is a two part game and
making the right choice is critical. Vinod_MVP
(Expert):
Any questions over the CLR use inside SQL Server ? Go ahead and shoot
it ... Srinivas [MVP] (Expert): Q: So We can implement all the Business Logic on SQL server
side in CLR compatible language? A: No. Never do that.
The CLR features of SQL Server will have to be used only if you have
procedural code that is CPU intensive and also requires occasional
access to data. Also, other data validation not possible in T-SQL
can be done in the CLR. Use T-SQL as much as you can, since set-based
queries are always faster. Business logic must always be on the application
tier, since that is a layer than can easily scale and be changed without
affecting other tiers. Vinod_MVP (Expert): Q: So We can implement all the Business Logic on SQL server
side in CLR compatible language? A: Yes. Perfectly crafted
Srini. To add to this understand that it is easy for me to scale up
when it comes to SQL Server and difficult for me to scale out easily
which have always been the asset of the application tier. So use the
code that is bound to have computational intensive and some time chatty
with the data tier and have the application logic still reside at
the middle tier. Srinivas [MVP] (Expert):
Ok, now that we have seen the CLR best practices, we will move on
to the XML best practices. Srinivas [MVP]
(Expert):
By now, most of you would be knowing that one of the biggest improvements
in SQL Server 2005 was the introduction of the new XML data type. Srinivas [MVP] (Expert):
Just to provide a brief history of the same, SQL Server 2000 had very
limited support for XML. Srinivas [MVP]
(Expert):
You had to store XML documents as (N)TEXT columns and there was no
option to query the XML directly. Srinivas
[MVP] (Expert):
Simple APIs in the form of sp_xmlpreparedocument and OPENXML existed
to work on XML data and factor it into a relation model. Srinivas
[MVP] (Expert):
SQLXML was introduced as a middle-tier solution for submitting XML
queries to SQL Server and get the results back as XML through a concept
called XML views. Srinivas [MVP] (Expert):
You could also do some post processing on the XML by applying XSL
stylesheets. Bulk-load of XML was also supported through the SQLXML
OLEDB provider. Srinivas [MVP] (Expert):
If you are still using SQL Server 2000, and want an XML solution,
I would recommed that you look at SQLXML: http://www.microsoft.com/Downloads/details.aspx?displaylang=en&FamilyID=05F1A309-BE6E-49FE-968D-9833FA15CC12 Srinivas [MVP] (Expert):
SQL Server 2000 also allowed you to stream the results of a SELECT
clause to the client as XML using the FOR XML directive. Srinivas
[MVP] (Expert):
As the usage of XML increases and more applications adopt XML as the
de-facto way of expressing hierarchical data and are looking at ways
to store, retrieve and query the XML data, its time SQL Server did
something to help people. Srinivas [MVP]
(Expert):
SQL Server 2005 provides full support for XML as a data type. XML
documents can now be stored natively (either as documents or fragments)
in the database and can be used as variables and parameters. Srinivas
[MVP] (Expert):
You can also type the XML data with an XSD schema (or to a collection
of schemas). Srinivas [MVP] (Expert):
SQL Server 2005 also supports the querying of the data in the XML
using XQuery, the new W3C standard for querying with XML data. Srinivas [MVP] (Expert):
You can also modify the XML data in place (for example adding nodes)
by using the new XML DML facilities. Srinivas
[MVP] (Expert):
All of these enhancements for XML actually increase the surface area
of the solutions that you can now model using SQL Server 2005. Srinivas [MVP] (Expert):
Sure. Srinivas [MVP] (Expert):
Can you pose that as a question to the experts please? Vinod_MVP
(Expert): Q: Why a clustered primary key must
exist in the table containing the indexed xml column ? A:
In simple terms yes very much. But I would like to take a closer look
at the requirement to gauge the scenario. Consider if this is just
a some sort of data looping with a DML processing ... Then I think
the TSQL with the set based approach will hats off outperform. But
consider the situation where the looping of data is accompanied with
some complex matematical operation with very less data feed to the
processing, under such a circumstances the CLR will outperform. This
is the trade off between using compiled computational activity vs
performing interpreted data intensive task. Hope this demystifies
the scenarios of usage. Vinod_MVP (Expert): Q: Can we replace cursors with CLR code A: Oops...
That was for a different question. In simple terms yes very much.
But I would like to take a closer look at the requirement to gauge
the scenario. Consider if this is just a some sort of data looping
with a DML processing ... Then I think the TSQL with the set based
approach will hats off outperform. But consider the situation where
the looping of data is accompanied with some complex matematical operation
with very less data feed to the processing, under such a circumstances
the CLR will outperform. This is the trade off between using compiled
computational activity vs performing interpreted data intensive task.
Hope this demystifies the scenarios of usage. Srinivas
[MVP] (Expert):
Let me continue while the question is being asked. Srinivas
[MVP] (Expert):
But, what are use cases where you need to use XML? What are some of
the best practices? The focus of this chat is to tell you when and
how to use the XML data type. Srinivas
[MVP] (Expert):
Note that this chat does not introduce you to the XML features and
how to use it. It assumes some familiarity of having worked with (or
read about) the same. Srinivas [MVP]
(Expert):
The first question that people usually have is: Why should I store
XML documents in the database? Should you go the relational way or
the XML way? Let's address this... Srinivas
[MVP] (Expert):
If the data that you want to store is highly structured with a known
and fixed schema, you should always model it in a relation table. Srinivas [MVP] (Expert):
For example, if you know that the addresses of your customers have
Line 1, City, State and Zip and there can be more than 1 address record,
you can simply have a table of addresses. Srinivas
[MVP] (Expert):
Each address record can then be linked to the customer via a foreign
key. Srinivas [MVP] (Expert):
If, on the other hand, the structure of your data is not known (or
is flexible), then modelling it as a relational table may not be very
good. Srinivas [MVP] (Expert):
One example, that I can think of a structure for storing user preferences.
The information for user preferences is very flexible and it is not
useful to model it as a relational table. Srinivas
[MVP] (Expert):
In this case, you can store the preferences inside an XML structure
and perform queries over it. Srinivas
[MVP] (Expert):
Another thing that you have to ask yourself is, will you just need
to store the XML data or perform some operations on it in the database? Srinivas [MVP] (Expert):
If you need to just store it and get it back each time, you can always
use the (N)VARCHAR(MAX) data type. In this case, SQL Server just acts
as the data store for the XML. Srinivas
[MVP] (Expert): Q: What is the internal format,
the XML document is stored in SQL server ? A: It is a
binary format. Srinivas [MVP] (Expert):
This still does not answer the question: Why should I store XML in
the database? Srinivas [MVP] (Expert):
Here are some of the top reasons for doing this (adapted from the
article: XML Best Practices for SQL Server 2005 by Shankar Pal et
al) Srinivas [MVP] (Expert):
1. You want to use administrative functionality of the database server
for managing your XML data (for example backup) Srinivas
[MVP] (Expert):
2. You want to work with your XML data along with other relational
data in a transacted manner. Srinivas
[MVP] (Expert):
3. You need an efficient way of extracting portions of your XML data
and querying the same. Srinivas [MVP]
(Expert):
4. You want the server (SQL Server in this case) to validate your
XML and assure well-formedness. Srinivas
[MVP] (Expert):
5. You want indexing of XML data for efficient query processing and
good scalability and use a world-class query optimizer. Srinivas
[MVP] (Expert):
6. You want SOAP, ADO.NET and OLE DB accesses to the XML data. Srinivas [MVP] (Expert):
Once you have the decision that XML is going to be stored in the database,
the next thing that you must consider is how to optimize the access
to the XML. Srinivas [MVP] (Expert):
SQL Server 2005 allows you to "index" an XML column. XML
indexes are created using new DDL statements that can be applied on
typed and un-typed XML columns. Srinivas
[MVP] (Expert):
Indexing an XML column creates a B+ Tree for all the XML instances
in the column. Srinivas [MVP] (Expert):
XML Indexes are of two types: (1) Primary Index and (2) Secondary
Index (which depends on the primary index). Srinivas
[MVP] (Expert):
The Primary XML index requires a clustered index on the primary key
of the table where the XML is stored. Srinivas
[MVP] (Expert):
Once the index is created, a B-Tree of the various tags, elements,
node values and node types is created. Srinivas
[MVP] (Expert):
Apart from this, document order, structure and the path from the root
of the XML instance to each node is captured for efficient evaluation
of path expressions. Srinivas [MVP] (Expert):
Note that the primary key of the base table is duplicated in the primary
index to correlate index rows with the base table rows. Srinivas
[MVP] (Expert):
Queries involving XML data type methods use the primary XML index. Srinivas [MVP] (Expert): Q:
see meet, what srini is saying we can have primary kjey over the XML
datatype column... so i guess, once we have the PK on a column index
is automatically created... A: Let me clear some confusion
here. If you create a table having just an XML column, you cannot
index it. For example, if you are creating a table called Books which
store some information about the book and sample chapters as XML documents,
you need to have an primary key for that table if you need to index
your XML content. Otherwise, the optimizer will not know what XML
document to return from the list of rows that you have for a query
that you made. Thus, when you create the PK on the table, the XML
indexes in the internal storage structures of SQL Server are co-related
to the correct table rows. Vinod_MVP
(Expert): Q: We can store only XML files in
the discussed fashion or any type of files? If 'Only XML', then, will
SQL validate the XML file [well formed or not]? A: It is
a general understanding that XML is a special and first class data
type. So the discussion revolves around that. Having said, XML datatypes
are of two kinds. Typed and UnTyped. Typed is one with a schema attached
which enforces a hard binding over the formation of XML and for UnTyped
it is not bound to a schema and can even be XML fragments as your
question. Under UnTyped, SQL willnot validate the structure of contents.
If you want to have raw/XML fragments with no XML related query power
then I would recommend you use VARCHAR(MAX) datatype. BTW, you can
convert from VARCHAR to XML and vice-versa etc. This is the next bonus
... Subhashini[MSFT] (Moderator):
attention guys , we have the last 15 mins for the chat to conclude
. Please submit your questions to the experts Srinivas
[MVP] (Expert):
Let me continue. We are are running short of time. Srinivas
[MVP] (Expert):
Once the primary XML index has been created, secondary XML indexes
can be created to speed-up different classes of queries based on usage
patterns. Srinivas [MVP] (Expert):
Three types of secondary XML indexes can be created: PATH, PROPERTY
and VALUE. Srinivas [MVP] (Expert):
These indexes benefit path-based queries, custom property management
scnearios and value-based queries. Srinivas
[MVP] (Expert):
Here are some guidelines on when to create each of these types of
secondary indexes. Srinivas [MVP] (Expert):
1. If the workload uses path based expressions heavily on XML columns,
PATH based indexes can speed up the queries. Srinivas
[MVP] (Expert):
The most common case is the exist() method on XML columns in the WHERE
clause. Srinivas [MVP] (Expert):
2. If the workload retrieves multiple values from individual XML expressions
using path expressions, clustering paths within each XML instance
in the PROPERTY index is useful. Srinivas
[MVP] (Expert):
3. If the workload involves querying for values within XML instances
without knowing the element or attribute names that contains those
values, VALUE indexes are useful. Srinivas
[MVP] (Expert):
You can also create a full-text index on XML columns. Full text indexes,
index the contents of the XML values while ignoring the XML markup. Vinod_MVP (Expert): Q:
how index works on XML datatype columns....wht is key ? A:
Understand the key can be either a value, XPath or a property. The
concepts are very simple and it maps right back to the data with a
clustered index. And hence we have the restriction of a Clustered
index before we cna define a XML index. Srinivas
[MVP] (Expert):
Attribute values are not full-text indexed (since they are considered
part of the markup) and element tags are used as token boundaries. Srinivas [MVP] (Expert):
Full Text XML indexes can be combined with XML indexes in some cases
where we want to filter the XML values of interest based on full-text
searches. Srinivas [MVP] (Expert):
For a good treatment of XML indexes, refer to the following article
by Bob Beauchemin: http://msdn.microsoft.com/sql/default.aspx?pull=/library/en-us/dnsql90/html/xmlindexes.asp Srinivas [MVP] (Expert):
If you notice that queries on an XML column are made principally on
a few element values, you may want to think about the notion of "property
promotion". Srinivas [MVP] (Expert):
Property promotion is a concept wherein you pick out the frequently
accessed elements into a separate relational table and sync the table
through triggers. Srinivas [MVP] (Expert):
If the property is single-valued, for example the ISBN of a book,
you can create a computed column and store the same. Srinivas
[MVP] (Expert):
However, if the property is multi valued (for example, all the first
names of your customers), then a separate property table can be created. Srinivas [MVP] (Expert):
Usage of XML schemas to type XML columns can be very beneficial for
query processing and storage optimization. Srinivas
[MVP] (Expert):
Note that XML data not bound to a schema is considered as untyped
and the node values are stored as Unicode strings. Srinivas
[MVP] (Expert):
The only check that SQL Server does is for well-formedness. Srinivas
[MVP] (Expert):
Typing of XML is done by associating an XML data type with registered
XML schemas. Srinivas [MVP] (Expert):
An XML column, parameter or variable bound to an XML schema collection
is typed according to all the schemas in the collection. Srinivas
[MVP] (Expert):
Within an XML schema collection, the type system identifies each schema
using its target namespace. Srinivas
[MVP] (Expert):
Each top-level XML element in an instance must specify a possible
empty target namespace it conforms to. Srinivas
[MVP] (Expert):
Data is validated during insert and modification according to the
target namespace. Srinivas [MVP] (Expert):
Also, the binary XML representation encodes typed values based on
the associated XML schema information and is fully described, so that
reparsing is faster. Srinivas [MVP] (Expert):
During query compilation, XML schemas are used for type checking and
static errors are issued for type mismatches. Srinivas
[MVP] (Expert):
The query compiler also uses XML schemas for query optimizations. Srinivas [MVP] (Expert):
Here are some other best practices to consider: Srinivas
[MVP] (Expert):
1. When updating an XML instance in-place, take care that you insert
elements as the rightmost sibling or rightmost fragment. Srinivas
[MVP] (Expert):
We saw earlier that an XML document is shredded and stored in an internal
format that has tags, elements, node types and values etc. Srinivas
[MVP] (Expert):
Thus, when you insert a new node into an XML element, take care that
you insert that element as the last sibling. Srinivas
[MVP] (Expert):
For example, consider that you have a <book> node with several
<author> child nodes as siblings. Srinivas
[MVP] (Expert):
Now, if you need to add a new author to this set, it would be faster
if you add the new <author> element as the last element of the
set rather than the first one. Srinivas
[MVP] (Expert):
2. Always use exist() in the WHERE clause to check for the existence
of something. exist() uses indexes more efficiently (PATH and VALUE
indexes). Srinivas [MVP] (Expert):
3. Usage of schemas for typing XML allows the optimizer to allow range
scans, since the values are stored internally as per the schema. Srinivas [MVP] (Expert):
4. A new optimization has been recently added, wherein if you have
multiple value() predicates on the same SELECT, under certain conditions,
they are merged as one scan. Srinivas
[MVP] (Expert):
5. New profiler event (XQuery Typing) that allows you to check whether
static type checking is happening in the query. Srinivas
[MVP] (Expert):
6. The debugging techniques are the same for XML data types. By using
PerfMon, query plans, we can understand the operations used and then
re-write the queries efficiently. Srinivas
[MVP] (Expert):
7. When uploading large amounts of data into a table containing XML
columns, it is usually good to disable the XML indexes and rebuild
them again. Srinivas [MVP] (Expert):
8. XML column corruption is caught during DBCC checks and any anomalies
(for example non-conformance with schemas etc) is reported back. Srinivas [MVP] (Expert):
Apart from the "physical" checks that DBCC does, it now
also does "logical" checks for XML data. Srinivas
[MVP] (Expert):
These logical checks include checking XML data against the XML index
contents, validating the schema against the XML data etc. Srinivas
[MVP] (Expert):
Once an anomaly is reported, you can either opt to restore the database,
or decide to rebuild the index. Vinod_MVP
(Expert): Q: what if i have registered a schema
and later on there is a change... A: Go ahead with the alter schema and change the same. No issues.
Check this article: http://www.extremeexperts.com/SQL/Yukon/ChangingXMLSchema.aspx Srinivas [MVP] (Expert):
That brings me to the end of some of the most common best practices.
We can now answer a few questions before the chat ends. Srinivas
[MVP] (Expert):
I saw some questions on memory usage. Loading and parsing a large
XML document does have some performance improvements over SQL 2000,
but no data has been reported yet on the increase in the performance.
As I said before, you need to ask yourself why you want to load such
a huge document in the database. The application tier is better suited
for it. Srinivas [MVP] (Expert): Q: please list best practices for XML usage in YUKON ? (coding
Standards,If any) A: No specific coding standards exist.
You can use the list that I provided in this chat, like the usage
of indexes, schemas and the proper XML operators to ensure that your
XML access performs the way you want. Vinod_MVP
(Expert): Q: how easy or how diff is it to extract
data from the column defined as XML data type, if no indexes are defined... A: This is same as compared to extracting a data out of a table
with no indexes :). If you feel you will be making frequent calls
to a node, method or property then try considering the indexing option
for it makes the query faster. Or even think on the lines of property
promotion to enhance your queries. Srinivas
[MVP] (Expert): Q: if i have a column of type
XML... Is it compulsary to define a schema for that column? if i want
to store XML with different XSDs, how should i proceed? A:
Schemas are optional. Thus, if each row in your table has to store
a different XML document, one option is you can create a schema collection
of all the schemas that you want and then tie the XML to that collection.
The other option is to make the column as un-typed and do the schema
checking in the application tier. Vinod_MVP
(Expert): Q: i think, defining schema is not
compulsary... A: As said before, it is NOT mandatory. To
define a Typed XML datatype a schema is required. You can have untyped
XML's that DONOT require a schema attached to them and they can even
be XML fragments generated on the fly. Srinivas
[MVP] (Expert): Q: Is yukon uses DIFFGRAM internally
for XML handling in case of ADO ? A: Not sure I understand
the question. Will have to check. Srinivas
[MVP] (Expert):
Here are some resources that you can use to know more about the XML
features. Srinivas [MVP] (Expert):
1. My Blog about good XML articles: http://blogs.sqlxml.org/srinivassampath/archive/2005/07/05/3858.aspx Srinivas [MVP] (Expert):
2. Webcast: http://go.microsoft.com/fwlink/?LinkId=43856
(Developing Smart Client Applications Using SQL Server 2005 Native
XML Support) Srinivas [MVP] (Expert):
3. Webcast: http://go.microsoft.com/fwlink/?LinkId=43858
(Managing XML Data on the Database with SQL Server 2005 and Visual
Studio 2005) Srinivas [MVP] (Expert):
4. Webcast: http://go.microsoft.com/fwlink/?LinkId=43877
(Optimizing and Troubleshooting XML Applications at the Server) Vinod_MVP
(Expert): Q: Is yukon uses DIFFGRAM internally
for XML handling in case of ADO ? A: Can you be more specific
here. SQLXML did try to use Diffgrams but these were needed to improve
performance as data used to pass through the middle tier. But with
XML datatype the definition of XML is a column value rather than a
row valued indices. Hope you are able to make the distinction. Vinod_MVP (Expert):
Some more links of interest include: Vinod_MVP
(Expert):
Using CLR Integration in SQL Server 2005: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql90/html/sqlclrguidance.asp Vinod_MVP (Expert):
Making the Most of the CLR : http://www.windowsitpro.com/Article/ArticleID/45673/45673.html Vinod_MVP (Expert):
Using CLR features in SQL Server 2005 : http://www.sswug.org/see/22708 Vinod_MVP (Expert):
Security for CLR Assemblies in SQL Server 2005 : http://www.windowsitpro.com/SQLServer/Article/ArticleID/45948/SQLServer_45948.html Vinod_MVP (Expert):
SQL Server 2005-Managed Execution : http://www.extremeexperts.com/SQL/Yukon/ManagedExecution.aspx Vinod_MVP (Expert):
Can get to the CLR teams Blog at: http://blogs.msdn.com/sqlclr Vinod_MVP (Expert):
You can catch some resources over my website at: www.ExtremeExperts.com
too Subhashini[MSFT] (Moderator):
great , so that one fully loaded session ! Subhashini[MSFT]
(Moderator):
So that was one fully loaded session ! Srinivas
[MVP] (Expert): Q: How can we handle CDATA section
of XML, while reading from table ? A: Not sure what you
mean by handle. Can you give an example? Subhashini[MSFT]
(Moderator):
Thanks to all of you for joining us for this chat . Subhashini[MSFT]
(Moderator):
Thanks to Vinod and Srinivas for delivering such an informative session Subhashini[MSFT] (Moderator):
This brings us to the closure of today's chat Subhashini[MSFT]
(Moderator):
and please email your queries to Vinod and Srinivas Srinivas
[MVP] (Expert):
Our pleasure. In case you still have questions, do post them at the
SQLBang group at: http://groups.msn.com/sqlbang. Subhashini[MSFT]
(Moderator):
Srinivas :ssampath@sct.co.in <mailto:ssampath@sct.co.in> Srinivas [MVP] (Expert):
If you post in the group, more people can benefit from the answer. Subhashini[MSFT] (Moderator):
Vinod at mailto:vinodk@mvps.org> Vinod_MVP
(Expert):
Thanks all for your time. I hope you had some good information and
fun too. Catch you all at the Usergroups. Srinivas
[MVP] (Expert):
Thanks all. Have a nice evening (or day, wherever you are :-)) Vinod_MVP (Expert):
Bye all. Subhashini[MSFT] (Moderator):
check out the sql user groups in India at http://www.microsoft.com/india/communities/usergroups/default.aspx Subhashini[MSFT] (Moderator):
Please pool in your feedback about the chat at commind@microsoft.com Subhashini[MSFT] (Moderator):
Also feel free to email at commind@microsoft.com
if you would like to host a webchat ! :) Subhashini[MSFT]
(Moderator):
Thanks all of you for joining us and thanks to Srinivas and Vinod
for their time and expertise Subhashini[MSFT]
(Moderator):
have a wonderful evening :) Srinivas
[MVP] (Expert):
Bye all.