|
Codepage
Data Transformation for SQL Server 2000
CPDTS.dll - version
1.0
Contents:
Introduction
Download and Installation
Prerequisites and Constraints
How to use
Supported codepages
Would you like to migrate to Microsoft platform but your
data format and encoding is holding you from doing so? Or would you like to
keep your investments untouched but at the same time add some
modern services to your existing solutions?
CPDTS.dll is a data
transformation COM object that converts your data from different encoding. This
COM object will be used from within Microsoft SQL Server Enterprise Manager.
This means there are no additional steps to move data to or from SQL, and no
additional programs to convert data! With a special feature of handling Arabic
on mainframe and AS400 in IBM 420 codepage which is visual order and convert is
to logical order.
DTS is a set of graphical tools and programmable
objects that let you extract, transform, and consolidate data from different
sources into single or multiple destinations.
It is based on OLE DB architecture, so it supports a variety of data sources.
For example:
-
SQL Server and Oracle directly, using native OLE DB
providers.
-
ODBC sources, using the Microsoft OLE DB Provider
for ODBC.
-
Access 2000, Excel 2000, Microsoft Visual FoxPro®,
dBase, Paradox, HTML, and additional file data sources.
-
Text files, using the built-in DTS flat file OLE DB
provider.
-
Microsoft Exchange Server, Microsoft Active
Directory™ and other non-relational data sources.
-
Other data sources provided by third-party vendors.

Steps
for installation
Step 1: Copy
CPDTS.dll to your hard disk.
Create a new folder eg. DTSfolder and copy
CPDTS.dll to it.
Step 2:
Register the object.
From the task bar go the Start menu \ Run and type
regsvr32 <DTSfolder
>CPDTS.dll
That’s
it! Now you can use the codepage data transformation COM object from within
Microsoft SQL Server to convert your data!

-
The
number of source columns should equal the number of destination columns. This
is an expected behavior because this is a one-to-one mapping.
-
The source column type
should be “string”, the destination column type
should be either "string" or “Unicode string”. In the case of SQL you may use data types such as
nchar or nvarchar etc.
-
To
avoid data loss, we recommend the destination column size to be one and a half
times larger than the source column size, because one character in the source can match to two
characters, e.g.
لا.

How to
use
Step 1
From
Microsoft SQL Server enterprise manager, create a new data transformation
services package.

Step 2
Add the
data connections, both the source and the destination connections.
Be sure
to configure the DB2 source correctly, from
the source properties, set the advanced \PC code page option to ANSI –
Latin 1 [1252].Host CCSID to EBDIC – U.S./Canada[37] and check the
options “Process binary as character”. The following is an example of such a
configuration:

Step 3
Add a
transform data task and correctly specify the source and destination. Create a
new transformation based on the “Convert Arabic Strings”. As shown in the
following figure.

Select
the proper the columns mapping and you should be aware of the constrains,
mentioned earlier in the
Prerequisites and Constraints section.
Set the
source and destination codepages, from the transformation properties.
.

The Codepage DTS object supports the conversion from any
Windows supported code page to any Windows supported code page including
Unicode. The following are the main codepages:
Code pages
|
CharsetFriendlyName |
CodePage |
|
Arabic (ASMO 708) |
708 |
|
Arabic (DOS) |
720 |
|
Arabic (ISO) |
28596 |
|
Arabic (Mac) |
10004 |
|
Arabic (Windows) |
1256 |
|
Arabic (Nafitha) - Source CodePage ONLY |
711 |
|
Baltic (DOS) |
775 |
|
Baltic (ISO) |
28594 |
|
Baltic (Windows) |
1257 |
|
Central European (DOS) |
852 |
|
Central European (ISO) |
28592 |
|
Central European (Mac) |
10029 |
|
Central European (Windows) |
1250 |
|
Chinese Simplified (EUC) |
51936 |
|
Chinese Simplified (GB2312) |
936 |
|
Chinese Simplified (HZ) |
52936 |
|
Chinese Simplified (Mac) |
10008 |
|
Chinese Traditional (Big5) |
950 |
|
Chinese Traditional (CNS) |
20000 |
|
Chinese Traditional (Eten) |
20002 |
|
Chinese Traditional (Mac) |
10002 |
|
Cyrillic (DOS) |
866 |
|
Cyrillic (ISO) |
28595 |
|
Cyrillic (KOI8-R) |
20866 |
|
Cyrillic (KOI8-U) |
21866 |
|
Cyrillic (Mac) |
10007 |
|
Cyrillic (Windows) |
1251 |
|
German (IA5) |
20106 |
|
Greek (DOS) |
737 |
|
Greek (ISO) |
28597 |
|
Greek (Mac) |
10006 |
|
Greek (Windows) |
1253 |
|
Greek, Modern (DOS) |
869 |
|
Hebrew (DOS) |
862 |
|
Hebrew (ISO-Logical) |
38598 |
|
Hebrew (Mac) |
10005 |
|
Hebrew (Windows) |
1255 |
|
IBM EBCDIC (Arabic) |
420 |
|
IBM EBCDIC (Cyrillic Russian) |
20880 |
|
IBM EBCDIC (Cyrillic Serbian-Bulgarian) |
21025 |
|
IBM EBCDIC (Denmark-Norway) |
20277 |
|
IBM EBCDIC (Denmark-Norway-Euro) |
1142 |
|
IBM EBCDIC (Finland-Sweden) |
20278 |
|
IBM EBCDIC (Finland-Sweden-Euro) |
1143 |
|
IBM EBCDIC (Finland-Sweden-Euro) |
1143 |
|
IBM EBCDIC (France-Euro) |
1147 |
|
IBM EBCDIC (Germany) |
20273 |
|
IBM EBCDIC (Germany-Euro) |
1141 |
|
IBM EBCDIC (Greek Modern) |
875 |
|
IBM EBCDIC (Greek) |
20423 |
|
IBM EBCDIC (Hebrew) |
20424 |
|
IBM EBCDIC (Icelandic) |
20871 |
|
IBM EBCDIC (Icelandic-Euro) |
1149 |
|
IBM EBCDIC (International-Euro) |
1148 |
|
IBM EBCDIC (Italy) |
20280 |
|
IBM EBCDIC (Italy-Euro) |
1144 |
|
IBM EBCDIC (Japanese and Japanese Katakana) |
50930 |
|
IBM EBCDIC (Japanese and Japanese-Latin) |
50939 |
|
IBM EBCDIC (Japanese and US-Canada) |
50931 |
|
IBM EBCDIC (Japanese katakana) |
20290 |
|
IBM EBCDIC (Korean and Korean Extended) |
50933 |
|
IBM EBCDIC (Korean Extended) |
20833 |
|
IBM EBCDIC (Multilingual Latin-2) |
870 |
|
IBM EBCDIC (Simplified Chinese) |
50935 |
|
IBM EBCDIC (Spain) |
20284 |
|
IBM EBCDIC (Spain-Euro) |
1145 |
|
IBM EBCDIC (Thai) |
20838 |
|
IBM EBCDIC (Traditional Chinese) |
50937 |
|
IBM EBCDIC (Turkish Latin-5) |
1026 |
|
IBM EBCDIC (Turkish) |
20905 |
|
IBM EBCDIC (UK) |
20285 |
|
IBM EBCDIC (UK-Euro) |
1146 |
|
IBM EBCDIC (US-Canada) |
37 |
|
IBM EBCDIC (US-Canada-Euro) |
1140 |
|
Icelandic (DOS) |
861 |
|
Icelandic (Mac) |
10079 |
|
ISCII Assamese |
57006 |
|
ISCII Bengali |
57003 |
|
ISCII Devanagari |
57002 |
|
ISCII Gujarathi |
57010 |
|
ISCII Kannada |
57008 |
|
ISCII Malayalam |
57009 |
|
ISCII Oriya |
57007 |
|
ISCII Panjabi |
57011 |
|
ISCII Tamil |
57004 |
|
ISCII Telugu |
57005 |
|
Japanese (EUC) |
51932 |
|
Japanese (JIS) |
50220 |
|
Japanese (JIS-Allow 1 byte Kana - SO/SI) |
50222 |
|
Japanese (JIS-Allow 1 byte Kana) |
50221 |
|
Japanese (Mac) |
10001 |
|
Japanese (Shift-JIS) |
932 |
|
Korean |
949 |
|
Korean (EUC) |
51949 |
|
Korean (ISO) |
50225 |
|
Korean (Johab) |
1361 |
|
Korean (Mac) |
10003 |
|
Latin 3 (ISO) |
28593 |
|
Latin 9 (ISO) |
28605 |
|
Norwegian (IA5) |
20108 |
|
OEM United States |
437 |
|
Swedish (IA5) |
20107 |
|
Thai (Windows) |
874 |
|
Turkish (DOS) |
857 |
|
Turkish (ISO) |
28599 |
|
Turkish (Mac) |
10081 |
|
Turkish (Windows) |
1254 |
|
Unicode |
1200 |
|
US-ASCII |
20127 |
|
Vietnamese (Windows) |
1258 |
|
Western European (DOS) |
850 |
|
Western European (IA5) |
20105 |
|
Western European (ISO) |
28591 |
|
Western European (Mac) |
10000 |
|
Western European (Windows) |
1252 |

|