Uniscribe


Index

The Index is located at the bottom of this document.


Uniscribe build number

USPBUILD 0325


USP - Unicode Complex Script processor

Copyright (c) 1996-9, Microsoft Corporation. All rights reserved.


SCRIPT

The SCRIPT enum is an opaque type used internally to identify which shaping engine functions are used to process a given run.

#define SCRIPT_UNDEFINED 0


SCRIPT_UNDEFINED

This is the only public script ordinal. May be forced into the eScript field of a SCRIPT_ANALYSIS to disable shaping. SCRIPT_UNDEFINED is supported by all fonts - ScriptShape will display whatever glyph is defined in the font CMAP table, or, if none, the missing glyph.


USP Status Codes
#define USP_E_SCRIPT_NOT_IN_FONT \
    MAKE_HRESULT(SEVERITY_ERROR, FACILITY_ITF, 0x200) // Script doesn't exist in font


SCRIPT_CACHE

Many script APIs take a combination of HDC and SCRIPT_CACHE parameters.

A SCRIPT_CACHE is an opaque pointer to a Uniscribe font metric cache structure.

typedef void *SCRIPT_CACHE;

The client must allocate and retain one SCRIPT_CACHE variable for each character style used. It must be initialised by the client to NULL.

APIs are passed an HDC and the address of a SCRIPT_CACHE variable. Uniscribe will first attempt to access font data via the SCRIPT_CACHE and will only inspect the HDC if the required data is not already cached.

The HDC may be passed as NULL. If data required by Uniscribe is already cached, the HDC won't be accessed and operation continues normally.

If the HDC is passed as NULL, and Uniscribe needs to access it for any reason, Uniscribe will return E_PENDING.

E_PENDING is returned quickly, allowing the client to avoid time consuming SelectObject calls. The following example applies to all APIs that take a SCRIPT_CACHE and an optional HDC.

hr = ScriptShape(NULL, &sc, ...);
if (hr == E_PENDING) {
    ... select font into hdc ...
    hr = ScriptShape(hdc, &sc, ...);
}


ScriptFreeCache

The client may free a SCRIPT_CACHE at any time. Uniscribe maintains reference counts in its font and shaper caches, and frees font data only when all sizes of the font are freed, and shaper data only when all fonts it supports are freed.

The client should free the SCRIPT_CACHE for a style when it discards that style.

ScriptFreeCache always sets its parameter to NULL to help avoid mis-referencing.

HRESULT WINAPI ScriptFreeCache(
    SCRIPT_CACHE *psc);  // InOut  Cache handle


SCRIPT_CONTROL

The SCRIPT_CONTROL structure provides itemization control flags to the ScriptItemize function.

typedef struct tag_SCRIPT_CONTROL {
    DWORD uDefaultLanguage    :16; // For NADS, also default for context
    DWORD fContextDigits      :1;  // Means use previous script instead of uDefaultLanguage
    // The following flags provide legacy support for GetCharacterPlacement features
    DWORD fInvertPreBoundDir  :1;  // Reading order of virtual item immediately prior to string
    DWORD fInvertPostBoundDir :1;  // Reading order of virtual item immediately following string
    DWORD fLinkStringBefore   :1;  // Equivalent to presence of ZWJ before string
    DWORD fLinkStringAfter    :1;  // Equivalent to presence of ZWJ after string
    DWORD fNeutralOverride    :1;  // Causes all neutrals to be strong in the current embedding direction
    DWORD fNumericOverride    :1;  // Causes all numerals to be strong in the current embedding direction
    DWORD fLegacyBidiClass    :1;  // Causes plus and minus to be treated as neutrals, slash as a common separator
    DWORD fReserved           :8;
} SCRIPT_CONTROL;

uDefaultLanguage

Language to use when Unicode values are ambiguous. Used by numeric processing to select digit shape when fDigitSubstitute (see SCRIPT_STATE) is in force.

fContextDigits

Specifies that national digits are chosen according to the nearest previous strong text, rather than using uDefaultLanguage.

fInvertPreBoundDir

By default text at the start of the string is laid out as if it follows strong text of the same direction as the base embedding level. Set fInvertPreBoundDir to change the initial context to the opposite of the base embedding level. This flag is for GetCharacterPlacement legacy support.

fInvertPostBoundDir

By default text at the end of the string is laid out as if it precedes strong text of the same direction as the base embedding level. Set fInvertPostBoundDir to change the final context to the opposite of the base embedding level. This flag is for GetCharacterPlacement legacy support.

fLinkStringBefore

Causes the first character of the string to be shaped as if it were joined to a previous character.

fLinkStringAfter

Causes the last character of the string to be shaped as if it were joined to a following character.

fNeutralOverride

Causes all neutral characters in the string to be treated as if they were strong characters of their enclosing embedding level. This effectively locks neutrals in place, reordering occurring only between neutrals.

fNumericOverride

Causes all numeric characters in the string to be treated as if they were strong characters of their enclosing embedding level. This effectively locks numerics in place, reordering occurring only between numerics.

fReserved

Reserved. Always initialise to 0.


SCRIPT_STATE

The SCRIPT_STATE structure is used both as an input parameter to ScriptItemize, where it initialises the Unicode algorithm state, and as a component of each item analysis returned by ScriptItemize.

typedef struct tag_SCRIPT_STATE {
    WORD uBidiLevel         :5; // Unicode Bidi algorithm embedding level (0-16)
    WORD fOverrideDirection :1; // Set when in LRO/RLO embedding
    WORD fInhibitSymSwap    :1; // Set by U+206A (ISS), cleared by U+206B (ASS)
    WORD fCharShape         :1; // Set by U+206D (AAFS), cleared by U+206C (IAFS)
    WORD fDigitSubstitute   :1; // Set by U+206E (NADS), cleared by U+206F (NODS)
    WORD fInhibitLigate     :1; // Equiv !GCP_Ligate, no Unicode control chars yet
    WORD fDisplayZWG        :1; // Equiv GCP_DisplayZWG, no Unicode control characters yet
    WORD fArabicNumContext  :1; // For EN->AN Unicode rule
    WORD fGcpClusters       :1; // For Generating Backward Compatible GCP Clusters (legacy Apps)
    WORD fReserved          :1;
    WORD fEngineReserved    :2; // For use by shaping engine
} SCRIPT_STATE;

uBidiLevel

The embedding level associated with all characters in this run according to the Unicode bidi algorithm. When passed to ScriptItemize, should be initialised to 0 for an LTR base embedding level, or 1 for RTL.

fOverrideDirection

TRUE if this level is an override level (LRO/RLO). In an override level, characters are laid out purely left to right, or purely right to left. No reordering of digits or strong characters of opposing direction takes place. Note that this initial value is reset by LRE, RLE, LRO or RLO codes in the string.

fInhibitSymSwap

TRUE if the shaping engine is to bypass mirroring of Unicode Mirrored glyphs such as brackets. Set by Unicode character ISS, cleared by ASS.

fCharShape

TRUE if character codes in the Arabic Presentation Forms areas of Unicode should be shaped. (Not implemented).

fDigitSubstitute

TRUE if character codes U+0030 through U+0039 (European digits) are to be substituted by national digits. Set by Unicode NADS, cleared by NODS.

fInhibitLigate

TRUE if ligatures are not to be used in the shaping of Arabic or Hebrew characters.

fDisplayZWG

TRUE if control characters are to be shaped as representational glyphs. (Normally, control characters are shaped to the blank glyph and given a width of zero).

fArabicNumContext

TRUE indicates prior strong characters were Arabic for the purposes of rule P0 on page 3-19 of 'The Unicode Standard, version 2.0'. Should normally be set TRUE before itemizing an RTL paragraph in an Arabic language, FALSE otherwise.

fGcpClusters

For GetCharacterPlacement legacy support only. Initialise to TRUE to request ScriptShape to generate the LogClust array the same way as GetCharacterPlacement does in Arabic and Hebrew Windows95. Affects only Arabic and Hebrew items.

fReserved

Reserved. Always initialise to 0.

fEngineReserved

Reserved. Always initialise to 0.


SCRIPT_ANALYSIS

Each analysed item is described by a SCRIPT_ANALYSIS structure. It also includes a copy of the Unicode algorithm state (SCRIPT_STATE).

typedef struct tag_SCRIPT_ANALYSIS {
    WORD eScript       :10; // Shaping engine
    WORD fRTL          :1;  // Rendering direction
    WORD fLayoutRTL    :1;  // Set for GCP classes ARABIC/HEBREW and LOCALNUMBER
    WORD fLinkBefore   :1;  // Implies there was a ZWJ before this item
    WORD fLinkAfter    :1;  // Implies there is a ZWJ following this item.
    WORD fLogicalOrder :1;  // Set by client as input to ScriptShape/Place
    WORD fNoGlyphIndex :1;  // Generated by ScriptShape/Place - this item does not use glyph indices
    SCRIPT_STATE s;
} SCRIPT_ANALYSIS;

eScript

Opaque value identifying which engine Uniscribe will use to Shape, Place and TextOut this item. The value of eScript is undefined, and will change in future releases, but attributes of eScript may be obtained by calling ScriptGetProperties.

fRTL

Rendering direction. Normally identical to the parity of the Unicode embedding level, but may differ if overridden by GetCharacterPlacement legacy support.

fLayoutRTL

Logical direction - whether conceptually part of a left-to-right sequence or a right-to-left sequence. Although this is usually the same as fRTL, for a number in a right-to-left run, fRTL is False (because digits are always displayed LTR), but fLayoutRTL is True (because the number is read as part of the right-to-left sequence).

fLinkBefore

If set, the shaping engine will shape the first character of this item as if it were joining with a previous character. Set by ScriptItemize, may be overridden before calling ScriptShape.

fLinkAfter

If set, the shaping engine will shape the last character of this item as if it were joining with a subsequent character. Set by ScriptItemize, may be overridden before calling ScriptShape.

fLogicalOrder

If set, the shaping engine will generate all glyph related arrays in logical order. By default glyph related arrays are in visual order, the first array entry corresponding to the leftmost glyph. Set to FALSE by ScriptItemize, may be overridden before calling ScriptShape.

fNoGlyphIndex

May be set TRUE on input to ScriptShape to disable use of glyphs for this item. Additionally, ScriptShape will set it TRUE for hdcs containing symbolic, unrecognised and device fonts. Disabling glyphing disables complex script shaping. When set, shaping and placing for this item is implemented directly by calls to GetTextExtentExPoint and ExtTextOut.


SCRIPT_ITEM

The SCRIPT_ITEM structure includes a SCRIPT_ANALYSIS with the string offset of the first character of the item.

typedef struct tag_SCRIPT_ITEM {
    int             iCharPos; // Logical offset to first character in this item
    SCRIPT_ANALYSIS a;
} SCRIPT_ITEM;

iCharPos

Offset from beginning of itemised string to first character of this item, counted in Unicode codepoints (i.e. 16-bit WCHAR words).

a

Script analysis structure containing analysis specific to this item, to be passed to ScriptShape, ScriptPlace etc.


ScriptItemize - break text into items

Breaks a run of Unicode text into individually shapeable items. Items are delimited by

  • Change of shaping engine
  • Change of direction

The client may create multiple runs from each item returned by ScriptItemize, but should not combine multiple items into a single run.

Later the client will call ScriptShape for each run (when measuring or rendering), and must pass the SCRIPT_ANALYSIS that ScriptItemize returned.

HRESULT WINAPI ScriptItemize(
    const WCHAR          *pwcInChars,  // In   Unicode string to be itemized
    int                   cInChars,    // In   Codepoint count to itemize
    int                   cMaxItems,   // In   Max length of itemization array
    const SCRIPT_CONTROL *psControl,   // In   Analysis control (optional)
    const SCRIPT_STATE   *psState,     // In   Initial bidi algorithm state (optional)
    SCRIPT_ITEM          *pItems,      // Out  Array to receive itemization
    int                  *pcItems);    // Out  Count of items processed (optional)


Returns E_INVALIDARG if pwcInChars == NULL or cInChars == 0 or pItems == NULL or cMaxItems < 2.

Returns E_OUTOFMEMORY if the output buffer length (cMaxItems) is insufficient. Note that in this case, as in all error cases, no items have been fully processed so no part of the output array contains defined values.

If psControl and psState are NULL on entry, ScriptItemize breaks the Unicode string purely by character code. If they are both non-NULL, it performs a full Unicode bidi analysis.

ScriptItemize always adds a terminal item to the item analysis array (pItems) such that the length of an item at pItem is always available as:

pItem[1].iCharPos - pItem[0].iCharPos

For this reason, it is invalid to call ScriptItemize with a buffer of fewer than two SCRIPT_ITEM entries.
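
The terminal item makes per-item lengths a simple subtraction. The following is a minimal sketch in portable C, not Uniscribe code: DemoItem is a hypothetical stand-in for SCRIPT_ITEM that keeps only iCharPos, and the helper names are illustrative. It assumes the terminal entry's iCharPos equals the length of the itemized string, as the formula above implies.

```c
#include <stddef.h>

/* Hypothetical stand-in for SCRIPT_ITEM: only iCharPos matters here.
   The real structure also carries a SCRIPT_ANALYSIS. */
typedef struct {
    int iCharPos;
} DemoItem;

/* Length of item i, valid for 0 <= i < cItems. Reads items[i + 1], so the
   array must hold cItems + 1 entries (the last one is the terminal item). */
static int demo_item_length(const DemoItem *items, int i)
{
    return items[i + 1].iCharPos - items[i].iCharPos;
}

/* Fills lengths[0..cItems-1] and returns the total character count. */
static int demo_item_lengths(const DemoItem *items, int cItems, int *lengths)
{
    int total = 0;
    for (int i = 0; i < cItems; i++) {
        lengths[i] = demo_item_length(items, i);
        total += lengths[i];
    }
    return total;
}
```

With real SCRIPT_ITEM arrays the same subtraction applies to pItems[i + 1].iCharPos and pItems[i].iCharPos, which is why a buffer for N items needs room for N + 1 entries.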

To perform a correct Unicode Bidi analysis, the SCRIPT_STATE should be initialised according to the paragraph reading order at paragraph start, and ScriptItemize should be passed the whole paragraph.

fRTL and fNumeric together provide the same classification as the lpClass output from GetCharacterPlacement.

European digits U+0030 through U+0039 may be rendered as national digits as follows:

fDigitSubstitute   fContextDigits   Digits displayed for codes U+0030 through U+0039
False              Any              Western (European / American) digits
True               False            As specified in SCRIPT_CONTROL.uDefaultLanguage
True               True             As prior strong text, defaulting to SCRIPT_CONTROL.uDefaultLanguage
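
The table can be read as a two-flag decision. The sketch below restates it in portable C; the enum and function names are hypothetical and exist only for illustration, and the code does not perform any substitution itself.

```c
/* Hypothetical names describing where the digit shapes come from. */
typedef enum {
    DEMO_DIGITS_WESTERN,          /* leave U+0030..U+0039 unchanged */
    DEMO_DIGITS_DEFAULT_LANGUAGE, /* use SCRIPT_CONTROL.uDefaultLanguage */
    DEMO_DIGITS_CONTEXT           /* follow prior strong text, falling back
                                     to SCRIPT_CONTROL.uDefaultLanguage */
} DemoDigitSource;

/* Restates the table: fDigitSubstitute gates substitution, and
   fContextDigits chooses between the two substitution modes. */
static DemoDigitSource demo_digit_source(int fDigitSubstitute, int fContextDigits)
{
    if (!fDigitSubstitute)
        return DEMO_DIGITS_WESTERN;
    return fContextDigits ? DEMO_DIGITS_CONTEXT : DEMO_DIGITS_DEFAULT_LANGUAGE;
}
```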

For fContextDigits, any Western digits (U+0030 - U+0039) encountered before the first strongly directed character are substituted by the traditional digits of the SCRIPT_CONTROL.uDefaultLanguage when that language is written in the same direction as SCRIPT_STATE.uBidiLevel.

Thus, in a right-to-left string, if SCRIPT_CONTROL.uDefaultLanguage is 1 (LANG_ARABIC), then leading Western digits will be substituted by traditional Arabic digits.

However, also in a right-to-left string, if SCRIPT_CONTROL.uDefaultLanguage is 0x1e (LANG_THAI), then no substitution occurs on leading Western digits because the Thai language is written left-to-right.

Following strongly directed characters, digits are substituted by the traditional digits associated with the closest prior strongly directed character.

The left-to-right mark (LRM) and right-to-left mark (RLM) are strong characters whose language depends on the SCRIPT_CONTROL.uDefaultLanguage.

If SCRIPT_CONTROL.uDefaultLanguage is a left-to-right language, then LRM causes subsequent Western digits to be substituted by the traditional digits associated with that language, while Western digits following RLM are not substituted.

Conversely, if SCRIPT_CONTROL.uDefaultLanguage is a right-to-left language, then Western digits following LRM are not substituted, while Western digits following RLM are substituted by the traditional digits associated with that language.

Effect of Unicode control characters on SCRIPT_STATE:

SCRIPT_STATE flag   Set by   Cleared by
fDigitSubstitute    NADS     NODS
fInhibitSymSwap     ISS      ASS
fCharShape          AAFS     IAFS

SCRIPT_STATE.fArabicNumContext controls the Unicode EN->AN rule. It should normally be initialised to TRUE before itemizing an RTL paragraph in an Arabic language, FALSE otherwise.


ScriptLayout

The ScriptLayout function converts an array of run embedding levels to a map of visual to logical position, and/or logical to visual position.

pbLevel must contain the embedding levels for all runs on the line, ordered logically.

On output, piVisualToLogical[0] is the logical index of the run to display at the far left. Subsequent entries should be displayed progressing from left to right.

piLogicalToVisual[0] is the relative visual position where the first logical run should be displayed - the leftmost display position being zero.

The caller may request either piLogicalToVisual or piVisualToLogical or both.

Note: No other input is required since the embedding levels give all necessary information for layout.

HRESULT WINAPI ScriptLayout(
    int         cRuns,               // In   Number of runs to process
    const BYTE *pbLevel,             // In   Array of run embedding levels
    int        *piVisualToLogical,   // Out  List of run indices in visual order
    int        *piLogicalToVisual);  // Out  List of visual run positions
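
To illustrate why embedding levels alone are enough, here is a minimal sketch in portable C of the standard Unicode bidi reordering step (reverse every contiguous sequence of runs at a given level or higher, working from the highest level down to the lowest odd level). It is not the Uniscribe implementation; demo_layout and demo_reverse are hypothetical names, and for brevity this sketch requires the visual-to-logical array even though ScriptLayout lets the caller omit either output.

```c
#include <stddef.h>

/* Reverse order[lo..hi) in place. */
static void demo_reverse(int *order, int lo, int hi)
{
    while (lo < hi - 1) {
        int tmp = order[lo];
        order[lo] = order[hi - 1];
        order[hi - 1] = tmp;
        lo++;
        hi--;
    }
}

/* Builds visual<->logical run maps from embedding levels given in logical
   order. logicalToVisual may be NULL. Returns 0 on success, -1 on bad input. */
static int demo_layout(int cRuns, const unsigned char *levels,
                       int *visualToLogical, int *logicalToVisual)
{
    int maxLevel = 0, minOddLevel = 256;

    if (cRuns <= 0 || levels == NULL || visualToLogical == NULL)
        return -1;

    for (int i = 0; i < cRuns; i++) {
        visualToLogical[i] = i;
        if (levels[i] > maxLevel)
            maxLevel = levels[i];
        if ((levels[i] & 1) && levels[i] < minOddLevel)
            minOddLevel = levels[i];
    }

    /* From the highest level down to the lowest odd level, reverse each
       maximal sequence of runs whose level is at least the current level. */
    for (int level = maxLevel; level >= minOddLevel; level--) {
        int i = 0;
        while (i < cRuns) {
            int j = i;
            while (j < cRuns && levels[visualToLogical[j]] >= level)
                j++;
            if (j > i) {
                demo_reverse(visualToLogical, i, j);
                i = j;
            } else {
                i++;
            }
        }
    }

    if (logicalToVisual != NULL)
        for (int v = 0; v < cRuns; v++)
            logicalToVisual[visualToLogical[v]] = v;
    return 0;
}
```

For levels {1, 2, 2, 1} (a right-to-left line with two embedded left-to-right runs) the sketch places logical run 3 at the far left and keeps runs 1 and 2 in their left-to-right order between the two right-to-left runs.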


SCRIPT_JUSTIFY

The script justification enumeration provides the client with the glyph characteristic information it needs to implement justification.

typedef enum tag_SCRIPT_JUSTIFY {
    SCRIPT_JUSTIFY_NONE           = 0,  // Justification can't be applied at this glyph
    SCRIPT_JUSTIFY_ARABIC_BLANK   = 1,  // This glyph represents a blank in an Arabic run
    SCRIPT_JUSTIFY_CHARACTER      = 2,  // Inter-character justification point follows this glyph
    SCRIPT_JUSTIFY_RESERVED1      = 3,  // Reserved #1
    SCRIPT_JUSTIFY_BLANK          = 4,  // This glyph represents a blank outside an Arabic run
    SCRIPT_JUSTIFY_RESERVED2      = 5,  // Reserved #2
    SCRIPT_JUSTIFY_RESERVED3      = 6,  // Reserved #3
    SCRIPT_JUSTIFY_ARABIC_NORMAL  = 7,  // Normal Middle-Of-Word glyph that connects to the right (begin)
    SCRIPT_JUSTIFY_ARABIC_KASHIDA = 8,  // Kashida(U+640) in middle of word
    SCRIPT_JUSTIFY_ARABIC_ALEF    = 9,  // Final form of Alef-like (U+627, U+625, U+623, U+632)
    SCRIPT_JUSTIFY_ARABIC_HA      = 10, // Final form of Ha (U+647)
    SCRIPT_JUSTIFY_ARABIC_RA      = 11, // Final form of Ra (U+631)
    SCRIPT_JUSTIFY_ARABIC_BA      = 12, // Middle-Of-Word form of Ba (U+628)
    SCRIPT_JUSTIFY_ARABIC_BARA    = 13, // Ligature of alike (U+628,U+631)
    SCRIPT_JUSTIFY_ARABIC_SEEN    = 14, // Highest priority: Initial shape of Seen(U+633) (end)
    SCRIPT_JUSTIFY_RESERVED4      = 15, // Reserved #4
} SCRIPT_JUSTIFY;


SCRIPT_VISATTR

The visual (glyph) attribute buffer generated by ScriptShape identifies clusters and justification points:

typedef struct tag_SCRIPT_VISATTR {
    WORD uJustification :4; // Justification class
    WORD fClusterStart  :1; // First glyph of representation of cluster
    WORD fDiacritic     :1; // Diacritic
    WORD fZeroWidth     :1; // Blank, ZWJ, ZWNJ etc, with no width
    WORD fReserved      :1; // General reserved
    WORD fShapeReserved :8; // Reserved for use by shaping engines
} SCRIPT_VISATTR;

uJustification

Justification class for this glyph. See SCRIPT_JUSTIFY.

fClusterStart

Set for the logically first glyph in every cluster, even for clusters containing just one glyph.

fDiacritic

Set for glyphs that combine with base characters.

fZeroWidth

Set by the shaping engine for some, but not all, zero width characters.


ScriptShape

The ScriptShape function takes a Unicode run and generates glyphs and visual attributes.

The number of glyphs generated varies according to the script and the font. Only for simple scripts and fonts does each Unicode code point generate a single glyph.

There is no limit on the number of glyphs generated by a codepoint. For example, a sophisticated complex script font might choose to construct characters from components, and so generate many times as many glyphs as characters.

There are also special cases like invalid character representations, where extra glyphs are added to represent the invalid sequence.

A reasonable guess might be to provide a glyph buffer 1.5 times the length of the character buffer, plus a 16 glyph fixed addition for rare cases like invalid sequence representation.

If ScriptShape returns E_OUTOFMEMORY it will be necessary to call it again, possibly more than once, until a large enough buffer is found.
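
The sizing guess and retry loop might look like the following sketch in portable C. Nothing here calls Uniscribe: DemoShapeFn is a hypothetical callback standing in for a wrapper around ScriptShape, DEMO_OUT_OF_MEMORY stands in for E_OUTOFMEMORY, demo_toy_shape is a toy shaper included only to exercise the loop, and doubling the buffer on each retry is an assumption rather than a documented policy.

```c
#include <limits.h>
#include <stdlib.h>

#define DEMO_OK            0
#define DEMO_OUT_OF_MEMORY 1   /* stand-in for E_OUTOFMEMORY */

/* Hypothetical shaper callback: writes up to cMaxGlyphs glyphs and the count,
   or returns DEMO_OUT_OF_MEMORY when the buffer is too small. */
typedef int (*DemoShapeFn)(int cChars, int cMaxGlyphs,
                           unsigned short *glyphs, int *pcGlyphs);

/* Initial guess from the text: 1.5 times the character count, plus 16. */
static int demo_initial_glyph_capacity(int cChars)
{
    return cChars + cChars / 2 + 16;
}

/* Calls shape() with a growing buffer until it fits. Returns a malloc'd
   glyph buffer the caller must free, or NULL on failure. */
static unsigned short *demo_shape_with_retry(DemoShapeFn shape, int cChars,
                                             int *pcGlyphs)
{
    int capacity = demo_initial_glyph_capacity(cChars);
    unsigned short *glyphs = NULL;

    for (;;) {
        unsigned short *grown = realloc(glyphs, (size_t)capacity * sizeof *glyphs);
        if (grown == NULL)
            break;
        glyphs = grown;

        int hr = shape(cChars, capacity, glyphs, pcGlyphs);
        if (hr == DEMO_OK)
            return glyphs;
        if (hr != DEMO_OUT_OF_MEMORY || capacity > INT_MAX / 2)
            break;              /* other error, or cannot grow further */
        capacity *= 2;          /* assumption: double and try again */
    }
    free(glyphs);
    return NULL;
}

/* Toy shaper for demonstration only: emits four glyphs per character. */
static int demo_toy_shape(int cChars, int cMaxGlyphs,
                          unsigned short *glyphs, int *pcGlyphs)
{
    int needed = cChars * 4;
    if (needed > cMaxGlyphs)
        return DEMO_OUT_OF_MEMORY;
    for (int i = 0; i < needed; i++)
        glyphs[i] = (unsigned short)i;
    *pcGlyphs = needed;
    return DEMO_OK;
}
```

In a real client the callback would select the font as needed, call ScriptShape with the same run and SCRIPT_ANALYSIS each time, and grow the pwOutGlyphs and psva buffers together, since both are sized by cMaxGlyphs.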

HRESULT WINAPI ScriptShape(
    HDC              hdc,          // In     Optional (see under caching)
    SCRIPT_CACHE    *psc,          // InOut  Cache handle
    const WCHAR     *pwcChars,     // In     Logical unicode run
    int              cChars,       // In     Length of unicode run
    int              cMaxGlyphs,   // In     Max glyphs to generate
    SCRIPT_ANALYSIS *psa,          // InOut  Result of ScriptItemize (may have fNoGlyphIndex set)
    WORD            *pwOutGlyphs,  // Out    Output glyph buffer
    WORD            *pwLogClust,   // Out    Logical clusters
    SCRIPT_VISATTR  *psva,         // Out    Visual glyph attributes
    int             *pcGlyphs);    // Out    Count of glyphs generated


Returns E_OUTOFMEMORY if the output buffer length (cMaxGlyphs) is insufficient. Note that in this case, as in all error cases, the content of the output array is undefined.

Clusters are sequenced uniformly within the run, as are glyphs within the cluster - the fRTL item flag (from ScriptItemize) identifies whether left to right, or right to left.

ScriptShape may set the fNoGlyphIndex flag in psa if the font or OS cannot support glyph indices.

If fLogicalOrder is requested in psa, glyphs will always be generated in the same order as the original Unicode characters.

If fLogicalOrder is not set, right to left items are generated in reverse order, so ScriptTextOut does not need to reverse them before calling ExtTextOut.


ScriptPlace

The ScriptPlace function takes the output of a ScriptShape call and generates glyph advance width and 2D offset information.

The composite ABC width for the whole item identifies how much the glyphs overhang to the left of the start position and to the right of the length implied by the sum of the advance widths.

The total advance width of the line is exactly abcA + abcB + abcC.

abcA and abcC are maintained internally by Uniscribe as proportions of the cell height represented in 8 bits and are thus accurate to roughly +/- 1%. The total width returned (as the sum of piAdvance, and as the sum of abcA+abcB+abcC) is accurate to the resolution of the TrueType shaping engine.

All glyph related arrays are in visual order unless the fLogicalOrder flag is set in psa.

#ifndef LSDEFS_DEFINED
typedef struct tagGOFFSET {
    LONG du;
    LONG dv;
} GOFFSET;
#endif
HRESULT WINAPI ScriptPlace(
    HDC                   hdc,       // In     Optional (see under caching)
    SCRIPT_CACHE         *psc,       // InOut  Cache handle
    const WORD           *pwGlyphs,  // In     Glyph buffer from prior ScriptShape call
    int                   cGlyphs,   // In     Number of glyphs
    const SCRIPT_VISATTR *psva,      // In     Visual glyph attributes
    SCRIPT_ANALYSIS      *psa,       // InOut  Result of ScriptItemize (may have fNoGlyphIndex set)
    int                  *piAdvance, // Out    Advance widths
    GOFFSET              *pGoffset,  // Out    x,y offset for combining glyph
    ABC                  *pABC);     // Out    Composite ABC for the whole run (Optional)


ScriptTextOut

The ScriptTextOut function takes the output of both ScriptShape and ScriptPlace calls and calls the operating system ExtTextOut function appropriately.

All arrays are in visual order unless the fLogicalOrder flag is set in psa.

HRESULT WINAPI ScriptTextOut(
    const HDC              hdc,          // In     OS handle to device context (required)
    SCRIPT_CACHE          *psc,          // InOut  Cache handle
    int                    x,            // In     x,y position for first glyph
    int                    y,            // In
    UINT                   fuOptions,    // In     ExtTextOut options
    const RECT            *lprc,         // In     optional clipping/opaquing rectangle
    const SCRIPT_ANALYSIS *psa,          // In     Result of ScriptItemize
    const WCHAR           *pwcReserved,  // In     Reserved (requires NULL)
    int                    iReserved,    // In     Reserved (requires 0)
    const WORD            *pwGlyphs,     // In     Glyph buffer from prior ScriptShape call
    int                    cGlyphs,      // In     Number of glyphs
    const int             *piAdvance,    // In     Advance widths from ScriptPlace
    const int             *piJustify,    // In     Justified advance widths (optional)
    const GOFFSET         *pGoffset);    // In     x,y offset for combining glyph


The caller should normally use SetTextAlign(hdc, TA_RIGHT) before calling ScriptTextOut with an RTL item in logical order.

The piJustify array provides requested cell widths for each glyph. When the piJustify width of a glyph differs from the unjustified width (in piAdvance), space is added to or removed from the glyph cell at its trailing edge. The glyph is always aligned with the leading edge of its cell. (This rule applies even in visual order.)

When a glyph cell is extended the extra space is usually made up by the addition of white space; however, for Arabic scripts, the extra space is made up by one or more kashida glyphs, unless the extra space is insufficient for the shortest kashida glyph in the font. (The width of the shortest kashida is available by calling ScriptGetFontProperties.)

piJustify should only be passed if re-justification of the string is required. Normally pass NULL to this parameter.

fuOptions may contain ETO_CLIPPED or ETO_OPAQUE (or neither or both).

Do not use ScriptTextOut to write to a metafile unless you are sure that the metafile will eventually be played back without any font substitution. ScriptTextOut records glyph numbers in the metafile. Since glyph numbers vary considerably from one font to another, such a metafile is unlikely to play back correctly when different fonts are substituted.

For example, when a metafile is played back at a different scale, CreateFont requests recorded in the metafile may resolve to bitmap instead of TrueType fonts, or if the metafile is played back on a different machine, requested fonts may not be installed.

To write complex scripts in a metafile in a font independent manner, use ExtTextOut to write the logical characters directly, so that glyph generation and placement does not occur until the text is played back.


ScriptJustify

ScriptJustify provides a simple minded implementation of multilingual justification.

Sophisticated text formatters may prefer to generate their own delta dx array by combining their own features with the information returned by ScriptShape in the SCRIPT_VISATTR array.

ScriptJustify establishes how much adjustment to make at each glyph position on the line. It interprets the SCRIPT_VISATTR array generated by a call to ScriptShape, and gives top priority to kashida, then uses inter-word spacing if there are no kashida points, then uses inter-character spacing if there are no inter-word points.

The justified advance widths generated in ScriptJustify should be passed to ScriptTextOut in the piJustify parameter.

ScriptJustify creates a justify array containing updated advance widths for each glyph. Where a glyph's advance width is increased, it is expected that the extra width will be rendered to the right of the glyph, either as white space or, for Arabic text, as kashida.


HRESULT WINAPI ScriptJustify(
    const SCRIPT_VISATTR *psva,         // In   Collected visual attributes for entire line
    const int            *piAdvance,    // In   Advance widths from ScriptPlace
    int                   cGlyphs,      // In   Size of all arrays
    int                   iDx,          // In   Desired width change, either increase or decrease
    int                   iMinKashida,  // In   Minimum length of continuous kashida glyph to generate
    int                  *piJustify);   // Out  Updated advance widths to pass to ScriptTextOut
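
A client that builds its own justification array could follow the same priority order. The sketch below, in portable C, is a deliberately simplified model and not the ScriptJustify algorithm: it treats every class from ARABIC_NORMAL through ARABIC_SEEN as a kashida point (an assumption), ignores iMinKashida and the relative priorities within the Arabic classes, and spreads iDx evenly over the highest-priority tier present. All names beginning with demo_ or DEMO_ are hypothetical; the numeric values mirror the SCRIPT_JUSTIFY enumeration.

```c
/* Values mirror SCRIPT_JUSTIFY; only those needed for tiering are listed. */
enum {
    DEMO_J_ARABIC_BLANK  = 1,
    DEMO_J_CHARACTER     = 2,
    DEMO_J_BLANK         = 4,
    DEMO_J_ARABIC_NORMAL = 7,
    DEMO_J_ARABIC_SEEN   = 14
};

/* 3 = kashida point (assumed range), 2 = blank, 1 = inter-character, 0 = none. */
static int demo_tier(unsigned j)
{
    if (j >= DEMO_J_ARABIC_NORMAL && j <= DEMO_J_ARABIC_SEEN)
        return 3;
    if (j == DEMO_J_ARABIC_BLANK || j == DEMO_J_BLANK)
        return 2;
    if (j == DEMO_J_CHARACTER)
        return 1;
    return 0;
}

/* Copies advance[] to justify[] and spreads iDx over the glyphs in the
   highest tier present. Returns the number of glyphs adjusted (0 if the
   line has no justification points, in which case widths are unchanged). */
static int demo_justify(const unsigned char *just, const int *advance,
                        int cGlyphs, int iDx, int *justify)
{
    int best = 0, count = 0;

    for (int i = 0; i < cGlyphs; i++) {
        int t = demo_tier(just[i]);
        if (t > best) {
            best = t;
            count = 1;
        } else if (t == best && t > 0) {
            count++;
        }
        justify[i] = advance[i];
    }
    if (count == 0 || iDx == 0)
        return 0;

    int share = iDx / count;   /* even share, truncated toward zero */
    int extra = iDx % count;   /* leftover units, same sign as iDx */
    for (int i = 0; i < cGlyphs; i++) {
        if (demo_tier(just[i]) != best)
            continue;
        justify[i] += share;
        if (extra > 0) { justify[i]++; extra--; }
        else if (extra < 0) { justify[i]--; extra++; }
    }
    return count;
}
```

The just[] array here corresponds to the uJustification field of each SCRIPT_VISATTR entry, and the resulting justify[] array plays the role of piJustify.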


SCRIPT_LOGATTR

The SCRIPT_LOGATTR structure describes attributes of logical characters useful when editing and formatting text.

Note that for wordbreaking and linebreaking, if the first character of the run passed in is not whitespace, the client needs to check whether the last character of the previous run is whitespace to determine if the first character of this run is the start of a word.

typedef struct tag_SCRIPT_LOGATTR {
    BYTE fSoftBreak  :1; // Potential linebreak point
    BYTE fWhiteSpace :1; // A unicode whitespace character, except NBSP, ZWNBSP
    BYTE fCharStop   :1; // Valid cursor position (for left/right arrow)
    BYTE fWordStop   :1; // Valid cursor position (for ctrl + left/right arrow)
    BYTE fInvalid    :1; // Invalid character sequence
    BYTE fReserved   :3;
} SCRIPT_LOGATTR;

fSoftBreak

It would be valid to break the line in front of this character. This flag is set on the first character of South-East Asian words. Note that when linebreaking the client would usually also treat any nonblank following a blank as a softbreak position, by inspecting the fWhiteSpace flag below.

fWhiteSpace

This character is one of the many Unicode characters that are classified as breakable whitespace.

fCharStop

Valid cursor position. Set on most characters, but not on codepoints inside Indian and South East Asian character clusters. May be used to implement left and right arrow operation in editors.

fWordStop

Valid position following word advance/retire commonly implemented at ctrl/left-arrow and ctrl/right-arrow. May be used to implement ctrl+left and ctrl+right arrow operation in editors. As with fSoftBreak clients should normally also inspect the fWhiteSpace flag and treat the first character after a run of whitespace as the start of a word.

fInvalid

Marks characters which form an invalid or undisplayable combination. Scripts which can set this flag have the flag fInvalidLogAttr set in their SCRIPT_PROPERTIES.
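
The line-breaking rule described under fSoftBreak and fWhiteSpace can be sketched as follows in portable C. DemoLogAttr is a hypothetical stand-in for SCRIPT_LOGATTR that keeps only the two flags involved, and the prevRunEndsInWhiteSpace parameter is an illustrative way of carrying the previous run's last character across the run boundary.

```c
/* Hypothetical stand-in for SCRIPT_LOGATTR with only the flags used here. */
typedef struct {
    unsigned fSoftBreak  : 1;
    unsigned fWhiteSpace : 1;
} DemoLogAttr;

/* Nonzero if the line may be broken in front of character i of the run.
   A break is allowed at an fSoftBreak character, or at any non-whitespace
   character that follows whitespace. For i == 0 the preceding character
   belongs to the previous run, so the caller supplies that information. */
static int demo_can_break_before(const DemoLogAttr *attr, int i,
                                 int prevRunEndsInWhiteSpace)
{
    int prevWhite = (i == 0) ? prevRunEndsInWhiteSpace
                             : (int)attr[i - 1].fWhiteSpace;
    return attr[i].fSoftBreak || (prevWhite && !attr[i].fWhiteSpace);
}
```

The same shape of test applies to word starts, with fWordStop taking the place of fSoftBreak.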


ScriptBreak

The ScriptBreak function returns cursor movement and formatting break positions for an item as an array of SCRIPT_LOGATTRs. To support mixed formatting within a single word correctly, ScriptBreak should be passed whole items as returned by ScriptItemize.

ScriptBreak does not require an hdc and does not execute glyph shaping.

The fCharStop flag marks cluster boundaries for those scripts where it is conventional to restrict the caret from moving inside clusters. The same boundaries could also be inferred by inspecting the pwLogClust array returned by ScriptShape; however, ScriptBreak is considerably faster in implementation and does not require an hdc to be prepared.

The fWordStop, fSoftBreak and fWhiteSpace flags are only available through ScriptBreak.

Most shaping engines that identify invalid sequences do so by setting the fInvalid flag in ScriptBreak. The fInvalidLogAttr flag in SCRIPT_PROPERTIES identifies which scripts do this.

HRESULT WINAPI ScriptBreak(
    const WCHAR           *pwcChars,  // In   Logical unicode item
    int                    cChars,    // In   Length of unicode item
    const SCRIPT_ANALYSIS *psa,       // In   Result of earlier ScriptItemize call
    SCRIPT_LOGATTR        *psla);     // Out  Logical character attributes


ScriptCPtoX

The ScriptCPtoX function returns the x offset from the left end (!fLogicalOrder) or leading edge (fLogicalOrder) of a run to either the leading or the trailing edge of a logical character cluster.

iCP is the offset of any logical character in the cluster.

For scripts where the caret may conventionally be placed into the middle of clusters (e.g. Arabic, Hebrew), the returned X may be an interpolated position for any codepoint in the line.

For scripts where the caret is conventionally snapped to the boundaries of clusters, (e.g. Thai, Indian), the resulting X position will be snapped to the requested edge of the cluster containing CP.

HRESULT WINAPI ScriptCPtoX(
    int                    iCP,         // In   Logical character position in run
    BOOL                   fTrailing,   // In   Which edge (default - leading)
    int                    cChars,      // In   Count of logical codepoints in run
    int                    cGlyphs,     // In   Count of glyphs in run
    const WORD            *pwLogClust,  // In   Logical clusters
    const SCRIPT_VISATTR  *psva,        // In   Visual glyph attributes array
    const int             *piAdvance,   // In   Advance widths
    const SCRIPT_ANALYSIS *psa,         // In   Script analysis from item attributes
    int                   *piX);        // Out  Resulting X position


ScriptXtoCP

The ScriptXtoCP function converts an x offset from the left end (!fLogicalOrder) or leading edge (fLogicalOrder) of a run to a logical character position and a flag that indicates whether the X position fell in the leading or the trailing half of the character.

For scripts where the cursor may conventionally be placed into the middle of clusters (e.g. Arabic, Hebrew), the returned CP may be for any codepoint in the line, and fTrailing will be either zero or one.

For scripts where the cursor is conventionally snapped to the boundaries of a cluster, the returned CP is always the position of the logically first codepoint in a cluster, and fTrailing is either zero, or the number of codepoints in the cluster.

Thus the appropriate cursor position for a mouse hit is always the returned CP plus the value of fTrailing.

If the X position passed is not in the item at all, the resulting position will be the trailing edge of character -1 (for X positions before the item), or the leading edge of character 'cChars' (for X positions following the item).

HRESULT WINAPI ScriptXtoCP(
    int                    iX,           // In   X offset from left of run
    int                    cChars,       // In   Count of logical codepoints in run
    int                    cGlyphs,      // In   Count of glyphs in run
    const WORD            *pwLogClust,   // In   Logical clusters
    const SCRIPT_VISATTR  *psva,         // In   Visual glyph attributes
    const int             *piAdvance,    // In   Advance widths
    const SCRIPT_ANALYSIS *psa,          // In   Script analysis from item attributes
    int                   *piCP,         // Out  Resulting character position
    int                   *piTrailing);  // Out  Leading or trailing half flag
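
The relationship between the returned CP, fTrailing and the caret position can be illustrated with a much simplified hit test in portable C. This sketch models only a left-to-right run in which every character maps to exactly one glyph, so it needs neither pwLogClust nor psva; it is not the ScriptXtoCP implementation, and the demo_ names are hypothetical. The choice to treat the exact midpoint of a cell as the trailing half is an assumption.

```c
/* Simplified hit test: LTR run, one glyph per character.
   Out-of-range positions follow the convention described above:
   trailing edge of character -1, or leading edge of character cChars. */
static void demo_x_to_cp(int iX, int cChars, const int *advance,
                         int *piCP, int *piTrailing)
{
    int left = 0;

    if (iX < 0) {
        *piCP = -1;
        *piTrailing = 1;
        return;
    }
    for (int i = 0; i < cChars; i++) {
        int right = left + advance[i];
        if (iX < right) {
            *piCP = i;
            /* Trailing half when iX is at or past the middle of the cell. */
            *piTrailing = (2 * (iX - left) >= advance[i]) ? 1 : 0;
            return;
        }
        left = right;
    }
    *piCP = cChars;
    *piTrailing = 0;
}

/* Caret position for a mouse hit: the returned CP plus fTrailing. */
static int demo_caret_from_hit(int cp, int trailing)
{
    return cp + trailing;
}
```

Because the out-of-range results are expressed as an edge of character -1 or character cChars, adding the two values always yields a caret position between 0 and cChars inclusive.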


Relationship between caret positions, justifications points and clusters

Job                                Uniscribe support
Caret move by character cluster    LogClust or VISATTR.fClusterStart or LOGATTR.fCharStop
Line breaking between characters   LogClust or VISATTR.fClusterStart or LOGATTR.fCharStop
Caret move by word                 LOGATTR.fWordStop
Line breaking between words        LOGATTR.fWordStop
Justification                      VISATTR.uJustification


Character clusters

Character clusters are glyph sequences that cannot be split between lines.

Some languages (e.g. Thai, Indic) restrict caret placement to points between clusters. This applies both to keyboard initiated caret movement (e.g. cursor keys) and to pointing and clicking with the mouse (hit testing).

Uniscribe provides cluster information in both the visual and logical attributes. If you've called ScriptShape you'll find the cluster information represented both by sequences of the same value in the pwLogClust array, and by the fClusterStart flag in the psva SCRIPT_VISATTR array.

ScriptBreak also returns the fCharStop flag in the SCRIPT_LOGATTR array to identify cluster positions.
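
As a rough illustration of the cluster information described above, the following sketch finds valid caret stops by comparing adjacent entries of a logical cluster array. It is plain C with unsigned short standing in for WORD. The helper names is_cluster_boundary and next_caret_stop are hypothetical and are not Uniscribe APIs; the array is assumed to be one already filled in by ScriptShape.

```c
/* Sketch only: hypothetical helpers, not part of Uniscribe.
   pwLogClust is assumed to hold one entry per character, with all
   characters of a cluster sharing the same value. */

/* Returns 1 if the caret may stop before character icp. The two ends
   of the run (0 and cChars) are always treated as valid stops. */
int is_cluster_boundary(const unsigned short *pwLogClust, int cChars, int icp)
{
    if (icp <= 0 || icp >= cChars) {
        return 1;
    }
    /* A new cluster starts wherever the value changes. This holds for
       both left to right and right to left runs. */
    return pwLogClust[icp] != pwLogClust[icp - 1];
}

/* Returns the next valid caret stop after icp, or cChars at end of run. */
int next_caret_stop(const unsigned short *pwLogClust, int cChars, int icp)
{
    int i = icp + 1;
    while (i < cChars && !is_cluster_boundary(pwLogClust, cChars, i)) {
        i++;
    }
    return i < cChars ? i : cChars;
}
```

For a six character run whose cluster array is {0, 0, 1, 2, 2, 2}, the caret stops are 0, 2, 3 and 6.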


Word break points

Valid positions for moving the caret when moving in whole words are marked by the fWordStop flag returned by ScriptBreak.

Valid positions for breaking lines between words are marked by the fSoftBreak flag returned by ScriptBreak.


Justification

Justification space or kashida should be inserted where identified by the uJustification field of the SCRIPT_VISATTR.

When performing inter-character justification, insert extra space only after glyphs marked with uJustification == SCRIPT_JUSTIFY_CHARACTER.
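
The inter-character rule above can be sketched in portable C as follows. This is an illustration under stated assumptions, not Uniscribe's own justification code: the enum values are local stand-ins for the justification class of each glyph (they are not the real SCRIPT_JUSTIFY values), justify_characters is a hypothetical helper, and the choice to spread any rounding remainder towards later glyphs is arbitrary.

```c
/* Local stand-ins for a glyph's justification class; the numeric
   values are illustrative only. */
enum { SKETCH_JUSTIFY_NONE = 0, SKETCH_JUSTIFY_CHARACTER = 1 };

/* Copies piAdvance to piJustify and adds iExtra units in total, shared
   among the glyphs whose class is SKETCH_JUSTIFY_CHARACTER.
   Returns the number of glyphs that were widened. */
int justify_characters(const int *piAdvance, const unsigned char *pClass,
                       int cGlyphs, int iExtra, int *piJustify)
{
    int i, cSlots = 0, seen = 0, used = 0;

    for (i = 0; i < cGlyphs; i++) {
        piJustify[i] = piAdvance[i];
        if (pClass[i] == SKETCH_JUSTIFY_CHARACTER) {
            cSlots++;
        }
    }
    if (cSlots == 0 || iExtra <= 0) {
        return 0;
    }
    for (i = 0; i < cGlyphs; i++) {
        if (pClass[i] == SKETCH_JUSTIFY_CHARACTER) {
            /* Running target keeps the total exactly equal to iExtra. */
            int target;
            seen++;
            target = iExtra * seen / cSlots;
            piJustify[i] += target - used;
            used = target;
        }
    }
    return cSlots;
}
```

The resulting array has the same shape as the piJustify array that ScriptTextOut accepts.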


Script specific processing

Uniscribe provides information about special processing for each script in the SCRIPT_PROPERTIES array.

Use the following code during initialisation to get a pointer to the SCRIPT_PROPERTIES array:

const SCRIPT_PROPERTIES **g_ppScriptProperties; // Array of pointers to properties
int g_iMaxScript;                               // Number of scripts in the array
HRESULT hr;
hr = ScriptGetProperties(&g_ppScriptProperties, &g_iMaxScript);

Then inspect the properties of the script of an item 'iItem' as follows:

hr = ScriptItemize( ... , pItems, ... );
...
if (g_ppScriptProperties[pItems[iItem].a.eScript]->fNeedsCaretInfo)
{
	// Use ScriptBreak to restrict the caret from entering clusters (for example).
}

SCRIPT_PROPERTIES.fNeedsCaretInfo

Caret placement should be restricted to cluster edges for scripts such as Thai and Indian. The fNeedsCaretInfo flag in SCRIPT_PROPERTIES identifies such languages.

Note that ScriptXtoCP and ScriptCPtoX automatically apply caret placement restrictions.

SCRIPT_PROPERTIES.fNeedsWordBreaking

For most scripts, word break placement may be identified by scanning for characters marked as fWhiteSpace in SCRIPT_LOGATTR, or for glyphs marked as uJustification == SCRIPT_JUSTIFY_BLANK or SCRIPT_JUSTIFY_ARABIC_BLANK in SCRIPT_VISATTR.

For languages such as Thai, it is also necessary to call ScriptBreak, and include character positions marked as fWordStop in SCRIPT_LOGATTR. Such scripts are marked as fNeedsWordBreaking in SCRIPT_PROPERTIES.

SCRIPT_PROPERTIES.fNeedsCharacterJustify

Languages such as Thai also require inter-character spacing when justifying (where uJustification == SCRIPT_JUSTIFY_CHARACTER in the SCRIPT_VISATTR). Such languages are marked as fNeedsCharacterJustify in SCRIPT_PROPERTIES.

SCRIPT_PROPERTIES.fAmbiguousCharSet

Many Uniscribe scripts do not correspond directly to 8 bit character sets. For example, Unicode characters in the range U+0100 through U+024F represent extended Latin shapes used for many languages, including those supported by EASTEUROPE_CHARSET, TURKISH_CHARSET and VIETNAMESE_CHARSET. However, many of these characters are supported by more than one of these charsets. fAmbiguousCharSet is set for any script token which could contain characters from a number of these charsets. In these cases the bCharSet field may contain ANSI_CHARSET or DEFAULT_CHARSET. The Uniscribe client will generally need to apply further processing to determine which charset to use when requesting a font suitable for this run. For example, it may determine that the run consists of multiple languages and split it up to use a different font for each language.


Notes on ScriptXtoCP and ScriptCPtoX

Both functions work only within runs and require the results of a previous ScriptShape call.

The client must establish which run a given cursor offset or x position is within before passing it to ScriptCPtoX or ScriptXtoCP.

Cluster information in the logical cluster array is used to share the width of a cluster of glyphs equally among the logical characters they represent.

For example, the lam alif glyph is divided into four areas: the leading half of the lam, the trailing half of the lam, the leading half of the alif and the trailing half of the alif.
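
The equal sharing described above can be sketched as follows. This is a simplified illustration for a single cluster, not the ScriptXtoCP implementation: hit_in_cluster is a hypothetical helper, x is assumed to be measured from the leading edge of the cluster, and the cluster width is assumed to be positive.

```c
/* Sketch only: divides a cluster of 'width' units equally among its
   cChars characters, then reports which character and which half of
   that character contains offset x (measured from the leading edge). */
void hit_in_cluster(int x, int width, int cChars, int *piChar, int *piTrailing)
{
    int halves, slot;

    if (width <= 0 || cChars <= 0) {
        *piChar = 0;
        *piTrailing = 0;
        return;
    }
    /* Work in half-character slots so no fractions are needed. */
    halves = 2 * cChars;
    slot = (int)((long)x * halves / width);
    if (slot < 0) {
        slot = 0;
    }
    if (slot >= halves) {
        slot = halves - 1;
    }
    *piChar = slot / 2;      /* character index within the cluster */
    *piTrailing = slot % 2;  /* 0 = leading half, 1 = trailing half */
}
```

For a lam alif ligature 20 units wide (two characters), offsets 0-4 fall in the leading half of the lam, 5-9 in its trailing half, 10-14 in the leading half of the alif and 15-19 in its trailing half.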

ScriptXtoCP understands the caret position conventions of each script. For Indian and Thai, caret positions are snapped to cluster boundaries; for Arabic and Hebrew, caret positions are interpolated within clusters.


Translating mouse hit 'x' offset to caret position

Conventionally, caret position 'cp' may be selected by clicking either on the trailing half of character 'cp-1' or on the leading half of character 'cp'. This may easily be implemented as follows:

int iCharPos;
int iCaretPos;
int fTrailing;
ScriptXtoCP(iMouseX, ..., &iCharPos, &fTrailing);
iCaretPos = iCharPos + fTrailing;

For scripts that snap the caret to cluster boundaries, ScriptXtoCP returns fTrailing set to either 0, or the width of the cluster in codepoints. Thus the above code correctly returns only valid caret positions.
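
To make the snapping arithmetic concrete, here is a small sketch of the values such a hit test would produce for one cluster. The helpers snap_hit and caret_from_hit are hypothetical, x is assumed to be measured from the leading edge of the cluster, and the split at the half-way point of the cluster is an assumption for illustration.

```c
/* Sketch only: models the (CP, fTrailing) pair described for scripts
   that snap the caret to cluster boundaries. */
void snap_hit(int iClusterFirstCp, int cClusterChars, int x, int width,
              int *piCP, int *piTrailing)
{
    /* CP is always the logically first codepoint of the cluster. */
    *piCP = iClusterFirstCp;
    /* fTrailing is 0 in the leading half of the cluster, otherwise the
       number of codepoints in the cluster. */
    *piTrailing = (2 * x >= width) ? cClusterChars : 0;
}

/* The caret position is the returned CP plus fTrailing. */
int caret_from_hit(int iCP, int iTrailing)
{
    return iCP + iTrailing;
}
```

For a three codepoint cluster starting at character 4, the only caret positions this can yield are 4 and 7, so the caret never lands inside the cluster.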


Displaying the caret in bidi strings

In unidirectional text, the leading edge of a character is at the same place as the trailing edge of the previous character, so there is no ambiguity in placing the caret between characters.

In bidirectional text, the caret position between runs of opposing direction may be ambiguous.

For example, in the left to right paragraph 'helloMAALAS', the last letter of 'hello' immediately precedes the first letter of 'salaam'. The best position to display the caret depends on whether it is considered to follow the 'o' of 'hello', or to precede the 's' of 'salaam'.
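
The ambiguity can be shown numerically with a simplified model in which every character is the same width and each run is laid out on its own. The helper edge_x is hypothetical and only mimics the kind of result ScriptCPtoX gives for simple runs; it ignores clusters and real glyph widths.

```c
/* Sketch only: x coordinate of the leading or trailing edge of
   character icp in a run of cChars fixed-width characters whose left
   side is at runLeftX. fRTL selects a right to left run. */
int edge_x(int runLeftX, int cChars, int fRTL, int charWidth,
           int icp, int fTrailing)
{
    /* Count edges from the run's leading end: the leading edge of
       character icp is edge icp, its trailing edge is edge icp + 1. */
    int logicalEdge = icp + (fTrailing ? 1 : 0);

    if (fRTL) {
        /* The leading end of a right to left run is on the right. */
        return runLeftX + (cChars - logicalEdge) * charWidth;
    }
    return runLeftX + logicalEdge * charWidth;
}
```

With 10 unit wide characters, 'hello' occupies x = 0..50 and 'MAALAS' occupies x = 50..110. The trailing edge of the 'o' is at x = 50, while the leading edge of the 's' is at x = 110, so the single logical caret position 5 has two candidate visual positions.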


Commonly used caret positioning conventions

Situation          Visual caret placement
Typing             Trailing edge of last character typed
Pasting            Trailing edge of last character pasted
Caret advancing    Trailing edge of last character passed over
Caret retiring     Leading edge of last character passed over
Home               Leading edge of line
End                Trailing edge of line
The caret may be positioned as follows:

if (advancing) {
    ScriptCPtoX(iCharPos-1, TRUE, ..., &iCaretX);
} else {
    ScriptCPtoX(iCharPos, FALSE, ..., &iCaretX);
}

Or, more simply, given an fAdvancing BOOL restricted to TRUE or FALSE:

ScriptCPtoX(iCharPos-fAdvancing, fAdvancing, ..., &iCaretX);

ScriptCPtoX handles out of range positions logically: it returns the leading edge of the run for iCharPos <0, and the trailing edge of the run for iCharPos >=length.


ScriptGetLogicalWidths

Converts visual widths in piAdvance into logical widths, one per original character, in logical order.

Ligature glyph widths are divided evenly amongst the characters they represent.

HRESULT WINAPI ScriptGetLogicalWidths(
    const SCRIPT_ANALYSIS *psa,           // In   Script analysis from item attributes
    int                    cChars,        // In   Count of logical codepoints in run
    int                    cGlyphs,       // In   Count of glyphs in run
    const int             *piGlyphWidth,  // In   Advance widths
    const WORD            *pwLogClust,    // In   Logical clusters
    const SCRIPT_VISATTR  *psva,          // In   Visual glyph attributes
    int                   *piDx);         // Out  Logical widths


ScriptGetLogicalWidths is useful for recording widths in a font independent manner. By passing the recorded logical widths to ScriptApplyLogicalWidth, a block of text can be replayed in the same boundaries with acceptable loss of quality even when the original font is not available.
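
The even division of ligature widths can be sketched as below. This is not the ScriptGetLogicalWidths implementation: logical_widths_ltr is a hypothetical helper that handles left to right runs only, it assumes each pwLogClust entry is the index of the first glyph of that character's cluster, and handing any remainder to the earliest characters of a cluster is an arbitrary choice.

```c
/* Sketch only: left to right runs, unsigned short standing in for WORD.
   Writes one logical width per character into piDx. */
void logical_widths_ltr(const int *piAdvance, int cGlyphs,
                        const unsigned short *pwLogClust, int cChars,
                        int *piDx)
{
    int i = 0;

    while (i < cChars) {
        int j = i + 1;
        int g, k, n, endGlyph, total = 0;

        /* Characters i..j-1 share one cluster. */
        while (j < cChars && pwLogClust[j] == pwLogClust[i]) {
            j++;
        }
        /* The cluster's glyphs run up to the next cluster's first glyph. */
        endGlyph = (j < cChars) ? pwLogClust[j] : cGlyphs;
        for (g = pwLogClust[i]; g < endGlyph; g++) {
            total += piAdvance[g];
        }
        /* Share the cluster width evenly among its characters. */
        n = j - i;
        for (k = 0; k < n; k++) {
            piDx[i + k] = total / n + (k < total % n ? 1 : 0);
        }
        i = j;
    }
}
```

For the three characters 'f', 'i', 'x' shaped as an 'fi' ligature of width 13 followed by an 'x' of width 8 (cluster array {0, 0, 1}), the logical widths come out as 7, 6 and 8.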


ScriptApplyLogicalWidth

Accepts an array of advance widths in logical order, corresponding one to one with codepoints, and generates an array of glyph widths suitable for passing to the piJustify parameter of ScriptTextOut.

ScriptApplyLogicalWidth may be used to reapply logical widths obtained with ScriptGetLogicalWidths. It may be useful in situations such as metafiling, where it is necessary to record and reapply advance width information in a font independent manner.

HRESULT WINAPI ScriptApplyLogicalWidth(
    const int             *piDx,         // In     Logical dx array to apply
    int                    cChars,       // In     Count of logical codepoints in run
    int                    cGlyphs,      // In     Glyph count
    const WORD            *pwLogClust,   // In     Logical clusters
    const SCRIPT_VISATTR  *psva,         // In     Visual attributes from ScriptShape/Place
    const int             *piAdvance,    // In     Glyph advance widths from ScriptPlace
    const SCRIPT_ANALYSIS *psa,          // In     Script analysis from item attributes
    ABC                   *pABC,         // InOut  Updated item ABC width (optional)
    int                   *piJustify);   // Out    Resulting glyph advance widths for ScriptTextOut


piDx

Pointer to an array of dx widths in logical order, one per codepoint.

cChars

Count of the logical codepoints in the run.

cGlyphs

Glyph count.

pwLogClust

Pointer to an array of logical clusters from ScriptShape.

psva

Pointer to an array of visual attributes from ScriptShape and updated by ScriptPlace.

piAdvance

Pointer to an array of glyph advance widths from ScriptPlace.

psa

Pointer to a SCRIPT_ANALYSIS structure from ScriptItemize and updated by ScriptShape and ScriptPlace.

pABC

Pointer to the run overall ABC width (optional). If present, when the function is called, it should contain the run ABC width returned by ScriptPlace; when the function returns, the ABC width has been updated to match the new widths.

piJustify

Pointer to an array of the resulting glyph advance widths. This is suitable for passing to the piJustify parameter of ScriptTextOut.


ScriptGetCMap

ScriptGetCMap may be used to determine which characters in a run are supported by the selected font.

It returns glyph indices of Unicode characters according to the TrueType cmap table, or the standard cmap implemented for old style fonts. The glyph indices are returned in the same order as the input string.

The caller may scan the returned glyph buffer looking for the default glyph to determine which characters are not available. (The default glyph index for the selected font should be determined by calling ScriptGetFontProperties).

The return value indicates the presence of any missing glyphs.

#define SGCM_RTL  0x00000001  // Return mirrored glyph for mirrorable Unicode codepoints

HRESULT WINAPI ScriptGetCMap(
    HDC           hdc,           // In     Optional (see notes on caching)
    SCRIPT_CACHE *psc,           // InOut  Address of Cache handle
    const WCHAR  *pwcInChars,    // In     Unicode codepoint(s) to look up
    int           cChars,        // In     Number of characters
    DWORD         dwFlags,       // In     Flags such as SGCM_RTL
    WORD         *pwOutGlyphs);  // Out    Array of glyphs, one per input character


returns

S_OK - All Unicode codepoints were present in the font.
S_FALSE - Some of the Unicode codepoints were mapped to the default glyph.
E_HANDLE - Font or system does not support glyph indices.
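
Scanning the returned glyph buffer for the default glyph, as described above, might look like the sketch below. count_unsupported is a hypothetical helper written in portable C (unsigned short standing in for WORD); it assumes the buffer was filled by ScriptGetCMap and that wgDefault was read from the SCRIPT_FONTPROPERTIES returned by ScriptGetFontProperties.

```c
/* Sketch only: counts entries equal to the font's default glyph.
   *piFirstMissing receives the index of the first unsupported
   character, or -1 if every character is supported. */
int count_unsupported(const unsigned short *pwGlyphs, int cChars,
                      unsigned short wgDefault, int *piFirstMissing)
{
    int i, cMissing = 0;

    *piFirstMissing = -1;
    for (i = 0; i < cChars; i++) {
        if (pwGlyphs[i] == wgDefault) {
            if (cMissing == 0) {
                *piFirstMissing = i;
            }
            cMissing++;
        }
    }
    return cMissing;
}
```

A client could use the reported index to decide where to split a run and try a fallback font.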


ScriptGetGlyphABCWidth

Returns ABC width of a given glyph. May be useful for drawing glyph charts. Should not be used for run of the mill complex script text formatting.

HRESULT WINAPI ScriptGetGlyphABCWidth(
    HDC           hdc,     // In     Optional (see notes on caching)
    SCRIPT_CACHE *psc,     // InOut  Address of Cache handle
    WORD          wGlyph,  // In     Glyph
    ABC          *pABC);   // Out    ABC width


returns

S_OK - Glyph width returned.
E_HANDLE - Font or system does not support glyph indices.


SCRIPT_PROPERTIES

typedef struct {
    DWORD langid                 :16;  // Primary and sublanguage associated with script
    DWORD fNumeric               :1;
    DWORD fComplex               :1;   // Script requires special shaping or layout
    DWORD fNeedsWordBreaking     :1;   // Requires ScriptBreak for word breaking information
    DWORD fNeedsCaretInfo        :1;   // Requires caret restriction to cluster boundaries
    DWORD bCharSet               :8;   // Charset to use when creating font
    DWORD fControl               :1;   // Contains only control characters
    DWORD fPrivateUseArea        :1;   // This item is from the Unicode range U+E000 through U+F8FF
    DWORD fNeedsCharacterJustify :1;   // Requires inter-character justification
    DWORD fInvalidGlyph          :1;   // Invalid combinations generate glyph wgInvalid in the glyph buffer
    DWORD fInvalidLogAttr        :1;   // Invalid combinations are marked by fInvalid in the logical attributes
    DWORD fCDM                   :1;   // Contains Combining Diacritical Marks
    DWORD fAmbiguousCharSet      :1;   // Script does not correspond 1:1 with a charset
    DWORD fClusterSizeVaries     :1;   // Measured cluster width depends on adjacent clusters
    DWORD fRejectInvalid         :1;   // Invalid combinations should be rejected
} SCRIPT_PROPERTIES;

langid

Language associated with this script. When a script is used for many languages, langid represents a default language. For example, Western script is represented by LANG_ENGLISH although it is also used for French, German, Spanish etc.

fNumeric

Script contains numerics and characters used in conjunction with numerics by the rules of the Unicode bidirectional algorithm. For example dollar sign and period are classified as numeric when adjacent to or in between digits.

fComplex

Indicates a script that requires complex script handling. If fComplex is false the script contains no combining characters and requires no contextual shaping or reordering.

fNeedsWordBreaking

A script, such as Thai, which requires algorithmic word breaking. Use ScriptBreak to obtain word break points using the standard system word breaker.

fNeedsCaretInfo

A script, such as Thai and Indian, where the caret may not be placed inside a cluster. To determine valid caret positions inspect the fCharStop flag in the logical attributes returned by ScriptBreak, or compare adjacent values in the pwLogClust array returned by ScriptShape.

bCharSet

Nominal charset associated with script. May be used in a logfont when creating a font suitable for displaying this script. Note that for new scripts where there is no charset defined, bCharSet may be inappropriate and DEFAULT_CHARSET should be used instead - see the description of fAmbiguousCharSet below.

fControl

Script contains only control characters.

fPrivateUseArea

The Unicode range U+E000 through U+F8FF.

fNeedsCharacterJustify

A script, such as Thai, where justification is conventionally achieved by increasing the space between all letters, not just between words.

fInvalidGlyph

A script for which ScriptShape generates an invalid glyph to represent invalid sequences. The glyph index of the invalid glyph for a particular font may be obtained by calling ScriptGetFontProperties.

fInvalidLogAttr

A script for which ScriptBreak sets the fInvalid flag in the logical attributes to mark invalid sequences.

fCDM

Implies that an item analysed by ScriptItemize included combining diacritical marks (U+0300 through U+036F).

fAmbiguousCharSet

No single legacy charset supports this script. For example, the Latin Extended-A Unicode range includes characters from the EASTEUROPE_CHARSET, the TURKISH_CHARSET and the BALTIC_CHARSET. It also contains characters that are not available in any legacy charset. Use DEFAULT_CHARSET when creating fonts to display parts of this run.

fClusterSizeVaries

A script, such as Arabic, where contextual shaping may cause a string to increase in size when removing characters.

fRejectInvalid

A script, such as Thai, where invalid sequences conventionally cause an editor such as notepad to beep, and ignore keypresses.


ScriptGetProperties

ScriptGetProperties returns the address of a table that maps a script in a SCRIPT_ANALYSIS eScript field to properties including the primary language associated with that script, whether it's numeric and whether it's complex.

HRESULT WINAPI ScriptGetProperties(
    const SCRIPT_PROPERTIES ***ppSp,           // Out  Receives pointer to table of pointers to properties indexed by script
    int                       *piNumScripts);  // Out  Receives number of scripts (valid values are 0 through NumScripts-1)


SCRIPT_FONTPROPERTIES

typedef struct {
    int  cBytes;         // Structure length
    WORD wgBlank;        // Blank glyph
    WORD wgDefault;      // Glyph used for Unicode values not present in the font
    WORD wgInvalid;      // Glyph used for invalid character combinations (especially in Thai)
    WORD wgKashida;      // Shortest continuous kashida glyph in the font, -1 if it doesn't exist
    int  iKashidaWidth;  // Width of shortest continuous kashida glyph in the font
} SCRIPT_FONTPROPERTIES;


ScriptGetFontProperties

Returns information from the font cache.

HRESULT WINAPI ScriptGetFontProperties(
    HDC                    hdc,   // In     Optional (see notes on caching)
    SCRIPT_CACHE          *psc,   // InOut  Address of Cache handle
    SCRIPT_FONTPROPERTIES *sfp);  // Out    Receives properties for this font


ScriptCacheGetHeight

HRESULT WINAPI ScriptCacheGetHeight(
    HDC           hdc,        // In     Optional (see notes on caching)
    SCRIPT_CACHE *psc,        // InOut  Address of Cache handle
    long         *tmHeight);  // Out    Receives font height in pixels


ScriptStringAnalyse

#define SSA_PASSWORD         0x00000001  // Input string contains a single character to be duplicated iLength times
#define SSA_TAB              0x00000002  // Expand tabs
#define SSA_CLIP             0x00000004  // Clip string at iReqWidth
#define SSA_FIT              0x00000008  // Justify string to iReqWidth
#define SSA_DZWG             0x00000010  // Provide representation glyphs for control characters
#define SSA_FALLBACK         0x00000020  // Use fallback fonts
#define SSA_BREAK            0x00000040  // Return break flags (character and word stops)
#define SSA_GLYPHS           0x00000080  // Generate glyphs, positions and attributes
#define SSA_RTL              0x00000100  // Base embedding level 1
#define SSA_GCP              0x00000200  // Return missing glyphs and LogClust with GetCharacterPlacement conventions
#define SSA_HOTKEY           0x00000400  // Replace '&' with underline on subsequent codepoint
#define SSA_METAFILE         0x00000800  // Write items with ExtTextOutW Unicode calls, not glyphs
#define SSA_LINK             0x00001000  // Apply FE font linking/association to non-complex text
#define SSA_HIDEHOTKEY       0x00002000  // Remove first '&' from displayed string
#define SSA_HOTKEYONLY       0x00002400  // Display underline only.
#define SSA_FULLMEASURE      0x04000000  // Internal - calculate full width and output the number of chars that can fit in iReqWidth.
#define SSA_LPKANSIFALLBACK  0x08000000  // Internal - enable FallBack for all LPK Ansi calls except BiDi hDC calls
#define SSA_PIDX             0x10000000  // Internal
#define SSA_LAYOUTRTL        0x20000000  // Internal - Used when DC is mirrored
#define SSA_DONTGLYPH        0x40000000  // Internal - Used only by GDI during metafiling - Use ExtTextOutA for positioning
#define SSA_NOKASHIDA        0x80000000  // Internal - Used by GCP to justify the non Arabic glyphs only.

SSA_HOTKEY

Note that SSA_HOTKEY and SSA_HIDEHOTKEY remove the hotkey '&' character from further processing, so functions such as ScriptString_pLogAttr return arrays based on a string which excludes the '&'.


SCRIPT_TABDEF

Defines tabstop positions for ScriptStringAnalyse (ignored unless SSA_TAB is passed).

typedef struct tag_SCRIPT_TABDEF {
    int  cTabStops;   // Number of entries in pTabStops array
    int  iScale;      // Scale factor for pTabStops (see below)
    int *pTabStops;   // Pointer to array of one or more tab stops
    int  iTabOrigin;  // Initial offset for tab stops (logical units)
} SCRIPT_TABDEF;

cTabStops

Number of entries in the pTabStops array. If zero, tabstops are every 8 average character widths. If one, all tabstops are the length of the first entry in pTabStops. If more than one, the first cTabStops are as specified in the pTabStops array, subsequent tabstops are every 8 average characters from the last tabstop in the array.

iScale

Scale factor for iTabOrigin and pTabStops entries. Values are converted to device coordinates by multiplying by iScale then dividing by 4. If values are already in device units, set iScale to 4. If values are in dialog units, set iScale to the average char width of the dialog font. If values are multiples of the average character width for the selected font, set iScale to 0.

pTabStops

Array of cTabStops entries. Each entry specifies a tabstop position. Positive values give nearedge alignment, negative values give faredge alignment.

iTabOrigin

Tabs are considered to start iTabOrigin before the beginning of the string. Helps with multiple tabbed outputs on the same line.
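
One possible reading of the tab stop rules above is sketched here in portable C. It is an interpretation for illustration, not Uniscribe's tab expansion code: TabDefSketch mirrors the fields of SCRIPT_TABDEF, next_tab_stop and tab_to_device are hypothetical helpers, negative (faredge) entries are treated by magnitude only, and x and the scaled origin are assumed to be non-negative device units.

```c
#include <stdlib.h>

/* Local mirror of the SCRIPT_TABDEF fields, for illustration. */
typedef struct {
    int        cTabStops;
    int        iScale;
    const int *pTabStops;
    int        iTabOrigin;
} TabDefSketch;

/* Multiply by iScale and divide by 4; iScale 0 means the value is a
   multiple of the average character width. */
static int tab_to_device(int value, int iScale, int aveCharWidth)
{
    return iScale == 0 ? value * aveCharWidth : value * iScale / 4;
}

/* Returns the x offset (from the start of the string) of the first tab
   stop strictly beyond x. */
int next_tab_stop(const TabDefSketch *td, int x, int aveCharWidth)
{
    int step8 = 8 * aveCharWidth;  /* default: every 8 average widths */
    int origin = tab_to_device(td->iTabOrigin, td->iScale, aveCharWidth);
    int pos = x + origin;          /* position relative to tab origin */
    int i, last = 0;

    if (td->cTabStops == 1) {
        /* One entry: every tab stop is that length apart. */
        int step = abs(tab_to_device(td->pTabStops[0], td->iScale, aveCharWidth));
        if (step > 0) {
            return (pos / step + 1) * step - origin;
        }
    }
    if (td->cTabStops > 1) {
        /* Several entries: use them in order while they last. */
        for (i = 0; i < td->cTabStops; i++) {
            int stop = abs(tab_to_device(td->pTabStops[i], td->iScale, aveCharWidth));
            if (stop > pos) {
                return stop - origin;
            }
            last = stop;
        }
    }
    if (step8 <= 0) {
        return x;
    }
    /* No entries, or past the last one: default spacing from 'last'. */
    return last + ((pos - last) / step8 + 1) * step8 - origin;
}
```

With an average character width of 10 and stops {30, 100} already in device units (iScale 4), a position of 40 advances to 100, and a position of 100 advances to 180, the first default stop after the array.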


ScriptStringAnalyse

cString - Input string must contain at least one character

hdc - Required if SSA_GLYPHS is requested. Optional for SSA_BREAK. If present, the current font in the hdc is inspected and, if it is a symbolic font, the character string is treated as a single neutral SCRIPT_UNDEFINED item.

Note that the uBidiLevel field in the initial SCRIPT_STATE value is ignored - the uBidiLevel used is derived from the SSA_RTL flag in combination with the layout of the hdc.

typedef void* SCRIPT_STRING_ANALYSIS;

HRESULT WINAPI ScriptStringAnalyse(
    HDC                     hdc,        // In   Device context (required)
    const void             *pString,    // In   String in 8 or 16 bit characters
    int                     cString,    // In   Length in characters (Must be at least 1)
    int                     cGlyphs,    // In   Required glyph buffer size (default cString*1.5 + 16)
    int                     iCharset,   // In   Charset if an ANSI string, -1 for a Unicode string
    DWORD                   dwFlags,    // In   Analysis required
    int                     iReqWidth,  // In   Required width for fit and/or clip
    SCRIPT_CONTROL         *psControl,  // In   Analysis control (optional)
    SCRIPT_STATE           *psState,    // In   Analysis initial state (optional)
    const int              *piDx,       // In   Requested logical dx array
    SCRIPT_TABDEF          *pTabdef,    // In   Tab positions (optional)
    const BYTE             *pbInClass,  // In   Legacy GetCharacterPlacement character classifications (deprecated)
    SCRIPT_STRING_ANALYSIS *pssa);      // Out  Analysis of string
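
The default glyph buffer size quoted for cGlyphs can be computed with integer arithmetic as in this small sketch. default_glyph_buffer_size is a hypothetical helper, and rounding the half down for odd lengths is an assumption; the source gives only the formula cString*1.5 + 16.

```c
/* Sketch only: cString * 1.5 + 16 without floating point.
   cString is assumed to be at least 1. */
int default_glyph_buffer_size(int cString)
{
    return cString + cString / 2 + 16;
}
```

A ten character string therefore gets a default buffer of 31 glyphs.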


ScriptStringFree - free a string analysis

HRESULT WINAPI ScriptStringFree(
    SCRIPT_STRING_ANALYSIS *pssa);  // InOut  Address of pointer to analysis


ScriptStringSize

Returns a pointer to the size (width and height) of an analysed string.

Note that the SIZE pointer remains valid only until the SCRIPT_STRING_ANALYSIS is passed to ScriptStringFree.

const SIZE* WINAPI ScriptString_pSize(
    SCRIPT_STRING_ANALYSIS ssa);


ScriptString_pcOutChars

Returns a pointer to the length of the string after clipping (requires SSA_CLIP set).

Note that the int pointer remains valid only until the SCRIPT_STRING_ANALYSIS is passed to ScriptStringFree.

const int* WINAPI ScriptString_pcOutChars(
    SCRIPT_STRING_ANALYSIS ssa);


ScriptString_pLogAttr

Returns a pointer to the logical attributes buffer in a SCRIPT_STRING_ANALYSIS.

Note that the buffer pointer remains valid only until the SCRIPT_STRING_ANALYSIS is passed to ScriptStringFree.

The logical attribute array contains *ScriptString_pcOutChars(ssa) entries.

const SCRIPT_LOGATTR* WINAPI ScriptString_pLogAttr(
    SCRIPT_STRING_ANALYSIS ssa);


ScriptStringGetOrder

Creates an array mapping original character position to glyph position.

Treats clusters as they were in legacy systems: unless a cluster contains more glyphs than codepoints, each glyph is referenced at least once from the puOrder array.

Requires SSA_GLYPHS requested in original ScriptStringAnalyse call.

The puOrder parameter should address a buffer containing room for at least *ScriptString_pcOutChars(ssa) ints.

HRESULT WINAPI ScriptStringGetOrder(
    SCRIPT_STRING_ANALYSIS  ssa,
    UINT                   *puOrder);


ScriptStringCPtoX

Return x coordinate for leading or trailing edge of character icp.

HRESULT WINAPI ScriptStringCPtoX(
    SCRIPT_STRING_ANALYSIS  ssa,        // In   String analysis
    int                     icp,        // In   Caret character position
    BOOL                    fTrailing,  // In   Which edge of icp
    int                    *pX);        // Out  Corresponding x offset


ScriptStringXtoCP

HRESULT WINAPI ScriptStringXtoCP(
    SCRIPT_STRING_ANALYSIS  ssa,          // In
    int                     iX,           // In
    int                    *piCh,         // Out
    int                    *piTrailing);  // Out


ScriptStringGetLogicalWidths

Converts visual widths in psa->piAdvance into logical widths, one per original character, in logical order.

Requires SSA_GLYPHS requested in original ScriptStringAnalyse call.

The piDx parameter should address a buffer containing room for at least *ScriptString_pcOutChars(ssa) ints.

HRESULT WINAPI ScriptStringGetLogicalWidths(
    SCRIPT_STRING_ANALYSIS  ssa,
    int                    *piDx);


ScriptStringValidate

Scans the string analysis for invalid glyphs.

Only glyphs generated by scripts that can generate invalid glyphs are scanned.

returns

S_OK - No invalid glyphs are present.
S_FALSE - One or more invalid glyphs are present.

HRESULT WINAPI ScriptStringValidate(
    SCRIPT_STRING_ANALYSIS ssa);


ScriptStringOut

Displays the string generated by a prior ScriptStringAnalyse call, then optionally adds highlighting corresponding to a logical selection.

Requires SSA_GLYPHS requested in original ScriptStringAnalyse call.

HRESULT WINAPI ScriptStringOut(
    SCRIPT_STRING_ANALYSIS  ssa,         // In  Analysis with glyphs
    int                     iX,          // In
    int                     iY,          // In
    UINT                    uOptions,    // In  ExtTextOut options
    const RECT             *prc,         // In  Clipping rectangle (iff ETO_CLIPPED)
    int                     iMinSel,     // In  Logical selection. Set iMinSel>=iMaxSel for no selection
    int                     iMaxSel,     // In
    BOOL                    fDisabled);  // In  If disabled, only the background is highlighted.


uOptions may include only ETO_CLIPPED or ETO_OPAQUE.


ScriptIsComplex

Determines whether a Unicode string requires complex script processing

The dwFlags parameter may include the following requests

#define SIC_COMPLEX     1  // Treat complex script letters as complex
#define SIC_ASCIIDIGIT  2  // Treat digits U+0030 through U+0039 as complex
#define SIC_NEUTRAL     4  // Treat neutrals as complex

SIC_COMPLEX: Should normally be set. Causes complex script letters to be treated as complex.

SIC_ASCIIDIGIT: Set this flag if the string would be displayed with digit substitution enabled. If you are following the user's NLS settings using the ScriptRecordDigitSubstitution API, set the flag when scriptDigitSubstitute.DigitSubstitute != SCRIPT_DIGITSUBSTITUTE_NONE.

SIC_NEUTRAL: Set this flag if you may be displaying the string with right-to-left reading order. When this flag is set, neutral characters are considered as complex.

Returns S_OK if string requires complex script processing, S_FALSE if string contains only characters laid out side by side from left to right.

HRESULT WINAPI ScriptIsComplex(
    const WCHAR *pwcInChars,  // In  String to be tested
    int          cInChars,    // In  Length in characters
    DWORD        dwFlags);    // In  Flags (see above)
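
A caller might assemble dwFlags from the three rules above roughly as follows. This is a portable sketch: sic_flags is a hypothetical helper, the constants are repeated from this document so the block stands alone, and the two inputs (the recorded DigitSubstitute value and whether right-to-left display is possible) are assumed to be known to the caller.

```c
/* Values repeated from the definitions in this document. */
#define SIC_COMPLEX                  1
#define SIC_ASCIIDIGIT               2
#define SIC_NEUTRAL                  4
#define SCRIPT_DIGITSUBSTITUTE_NONE  1

/* Sketch only: builds the dwFlags value for ScriptIsComplex. */
unsigned long sic_flags(int digitSubstitute, int fMayBeRTL)
{
    unsigned long flags = SIC_COMPLEX;  /* should normally be set */

    if (digitSubstitute != SCRIPT_DIGITSUBSTITUTE_NONE) {
        flags |= SIC_ASCIIDIGIT;        /* digit substitution enabled */
    }
    if (fMayBeRTL) {
        flags |= SIC_NEUTRAL;           /* neutrals matter in RTL order */
    }
    return flags;
}
```

The result would then be passed as the dwFlags argument of ScriptIsComplex.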


ScriptRecordDigitSubstitution

Reads NLS native digit and digit substitution settings and records them in the SCRIPT_DIGITSUBSTITUTE structure.

typedef struct tag_SCRIPT_DIGITSUBSTITUTE {
    DWORD NationalDigitLanguage    :16;  // Language for native substitution
    DWORD TraditionalDigitLanguage :16;  // Language for traditional substitution
    DWORD DigitSubstitute          :8;   // Substitution type
    DWORD dwReserved;                    // Reserved
} SCRIPT_DIGITSUBSTITUTE;

NationalDigitLanguage

Standard digits for the selected locale as defined by the country's standard-setting authority.

TraditionalDigitLanguage

Digits originally used with the locale's script.

DigitSubstitute

Selects between None, Context, National and Traditional. See ScriptApplyDigitSubstitution below for constant definitions.

Although most complex scripts have their own associated digits, many countries using those scripts use western (so called 'Arabic') digits as their standard. NationalDigitLanguage reflects the digits used as standard, and is set from the NLS data for the locale. On Windows 2000 the national digit language can be adjusted to any digit script with the control panel/regional options/numbers/Standard digits listbox.

The TraditionalDigitLanguage for a locale is derived directly from the script used by that locale.

HRESULT WINAPI ScriptRecordDigitSubstitution(
    LCID                    Locale,  // In   LOCALE_USER_DEFAULT or desired locale
    SCRIPT_DIGITSUBSTITUTE *psds);   // Out  Digit substitution settings


Locale

NLS locale to be queried. Should usually be set to LOCALE_USER_DEFAULT. Alternatively may be passed as a locale combined with LOCALE_NOUSEROVERRIDE to obtain default settings for a given locale. Note that context digit substitution is supported only in ARABIC and FARSI locales. In other locales, context digit is mapped to no substitution.

psds

Pointer to SCRIPT_DIGITSUBSTITUTE. This structure may be passed later to ScriptApplyDigitSubstitution.

returns

E_INVALIDARG if Locale is invalid or not installed. E_POINTER if psds is NULL. Otherwise S_OK.

For performance reasons, you should not call ScriptRecordDigitSubstitution frequently. In particular it would be a considerable overhead to call it every time you call ScriptItemize or ScriptStringAnalyse.

Instead, you may choose to save the SCRIPT_DIGITSUBSTITUTE structure, and update it only when you receive a WM_SETTINGCHANGE message or when a RegNotifyChangeKeyValue call in a dedicated thread indicates a change in the registry under HKCU\Control Panel\International.

The normal way to call this function is simply

SCRIPT_DIGITSUBSTITUTE sds;
ScriptRecordDigitSubstitution(LOCALE_USER_DEFAULT, &sds);

Then every time you itemize, you'd use the results like this:

SCRIPT_CONTROL  sc = {0};
SCRIPT_STATE    ss = {0};
ScriptApplyDigitSubstitution(&sds, &sc, &ss);


ScriptApplyDigitSubstitution

Applies the digit substitution settings recorded in a SCRIPT_DIGITSUBSTITUTE structure to the SCRIPT_CONTROL and SCRIPT_STATE structures.

The DigitSubstitute field of the SCRIPT_DIGITSUBSTITUTE structure is normally set by ScriptRecordDigitSubstitution, however it may be replaced by any one of the following values:

#define SCRIPT_DIGITSUBSTITUTE_CONTEXT      0  // Substitute to match preceding letters
#define SCRIPT_DIGITSUBSTITUTE_NONE         1  // No substitution
#define SCRIPT_DIGITSUBSTITUTE_NATIONAL     2  // Substitute with official national digits
#define SCRIPT_DIGITSUBSTITUTE_TRADITIONAL  3  // Substitute with traditional digits of the locale

SCRIPT_DIGITSUBSTITUTE_CONTEXT

Digits U+0030 - U+0039 will be substituted according to the language of prior letters. Before any letters, digits will be substituted according to the TraditionalDigitLanguage field of the SCRIPT_DIGITSUBSTITUTE structure. This field is normally set to the primary language of the Locale passed to ScriptRecordDigitSubstitution.

SCRIPT_DIGITSUBSTITUTE_NONE

Digits will not be substituted. Unicode values U+0030 to U+0039 will be displayed with Arabic (i.e. Western) numerals.

SCRIPT_DIGITSUBSTITUTE_NATIONAL

Digits U+0030 - U+0039 will be substituted according to the NationalDigitLanguage field of the SCRIPT_DIGITSUBSTITUTE structure. This field is normally set to the national digits returned for the NLS LCTYPE LOCALE_SNATIVEDIGITS by ScriptRecordDigitSubstitution.

SCRIPT_DIGITSUBSTITUTE_TRADITIONAL

Digits U+0030 - U+0039 will be substituted according to the TraditionalDigitLanguage field of the SCRIPT_DIGITSUBSTITUTE structure. This field is normally set to the primary language of the Locale passed to ScriptRecordDigitSubstitution.
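
The four modes above amount to a choice of which language's digits to use. The sketch below models that choice in portable C; it is not how Uniscribe implements substitution. pick_digit_language and SKETCH_NO_SUBSTITUTION are hypothetical, the SCRIPT_DIGITSUBSTITUTE constants are repeated from this document, and a priorLetterLang of 0 is assumed to mean that no letters have been seen yet.

```c
/* Values repeated from the definitions in this document. */
#define SCRIPT_DIGITSUBSTITUTE_CONTEXT      0
#define SCRIPT_DIGITSUBSTITUTE_NONE         1
#define SCRIPT_DIGITSUBSTITUTE_NATIONAL     2
#define SCRIPT_DIGITSUBSTITUTE_TRADITIONAL  3

/* Hypothetical sentinel meaning "leave U+0030 - U+0039 unchanged". */
#define SKETCH_NO_SUBSTITUTION 0

/* Sketch only: returns the language whose digits should be used. */
unsigned pick_digit_language(unsigned mode, unsigned national,
                             unsigned traditional, unsigned priorLetterLang)
{
    switch (mode) {
    case SCRIPT_DIGITSUBSTITUTE_CONTEXT:
        /* Follow prior letters; before any letters, use traditional. */
        return priorLetterLang ? priorLetterLang : traditional;
    case SCRIPT_DIGITSUBSTITUTE_NATIONAL:
        return national;
    case SCRIPT_DIGITSUBSTITUTE_TRADITIONAL:
        return traditional;
    case SCRIPT_DIGITSUBSTITUTE_NONE:
    default:
        return SKETCH_NO_SUBSTITUTION;
    }
}
```

The language arguments here are plain numbers; in a real client they would be the language fields of a recorded SCRIPT_DIGITSUBSTITUTE.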

HRESULT WINAPI ScriptApplyDigitSubstitution(
    const SCRIPT_DIGITSUBSTITUTE *psds,  // In   Digit substitution settings
    SCRIPT_CONTROL               *psc,   // Out  Script control structure
    SCRIPT_STATE                 *pss);  // Out  Script state structure


psds

Pointer to SCRIPT_DIGITSUBSTITUTE structure recorded earlier. If NULL, ScriptApplyDigitSubstitution calls ScriptRecordDigitSubstitution with LOCALE_USER_DEFAULT.

psc

SCRIPT_CONTROL structure. The fContextDigits and uDefaultLanguage fields will be updated.

pss

SCRIPT_STATE structure. The fDigitSubstitute field will be updated.

returns

E_INVALIDARG if the DigitSubstitute field of the SCRIPT_DIGITSUBSTITUTE structure is unrecognised. Otherwise S_OK.


Index

  • Character clusters
  • Commonly used caret positioning conventions
  • Displaying the caret in bidi strings
  • Justification
  • Notes on ScriptXtoCP and ScriptCPtoX
  • Relationship between caret positions, justification points and clusters
  • SCRIPT
  • SCRIPT_ANALYSIS
  • SCRIPT_CACHE
  • SCRIPT_CONTROL
  • SCRIPT_FONTPROPERTIES
  • SCRIPT_ITEM
  • SCRIPT_JUSTIFY
  • SCRIPT_LOGATTR
  • SCRIPT_PROPERTIES
  • SCRIPT_STATE
  • SCRIPT_TABDEF
  • SCRIPT_VISATTR
  • ScriptApplyDigitSubstitution
  • ScriptApplyLogicalWidth
  • ScriptBreak
  • ScriptCPtoX
  • ScriptCacheGetHeight
  • ScriptFreeCache
  • ScriptGetCMap
  • ScriptGetFontProperties
  • ScriptGetGlyphABCWidth
  • ScriptGetLogicalWidths
  • ScriptGetProperties
  • ScriptIsComplex
  • ScriptItemize - break text into items
  • ScriptJustify
  • ScriptLayout
  • ScriptPlace
  • ScriptRecordDigitSubstitution
  • ScriptShape
  • ScriptStringAnalyse
  • ScriptStringCPtoX
  • ScriptStringFree - free a string analysis
  • ScriptStringGetLogicalWidths
  • ScriptStringGetOrder
  • ScriptStringOut
  • ScriptStringSize
  • ScriptStringValidate
  • ScriptStringXtoCP
  • ScriptString_pLogAttr
  • ScriptString_pcOutChars
  • ScriptTextOut
  • ScriptXtoCP
  • Script specific processing
  • Translating mouse hit 'x' offset to caret position
  • USP - Unicode Complex Script processor
  • USP Status Codes
  • Uniscribe build number
  • Word break points


  • this page was last updated 8 November 1999
    © 1999 Microsoft Corporation. All rights reserved. Terms of use.