Microsoft Typography | Developer | Uniscribe
Introduction | Uniscribe APIs
The Index is located at the bottom of this document.
USPBUILD 0325
Copyright (c) 1996-9, Microsoft Corporation. All rights reserved.
The SCRIPT enum is an opaque type used internally to identify which shaping engine functions are used to process a given run.
    #define SCRIPT_UNDEFINED 0
SCRIPT_UNDEFINED is the only public script ordinal. It may be forced into the eScript field of a SCRIPT_ANALYSIS to disable shaping. SCRIPT_UNDEFINED is supported by all fonts: ScriptShape will display whatever glyph is defined in the font CMAP table or, if none, the missing glyph.
Many script APIs take a combination of HDC and SCRIPT_CACHE parameters. A SCRIPT_CACHE is an opaque pointer to a Uniscribe font metric cache structure.
The client must allocate and retain one SCRIPT_CACHE variable for each character style used, and must initialise it to NULL. APIs are passed an HDC and the address of a SCRIPT_CACHE variable. Uniscribe first attempts to access font data via the SCRIPT_CACHE and inspects the HDC only if the required data is not already cached.
The HDC may be passed as NULL. If the data required by Uniscribe is already cached, the HDC is not accessed and operation continues normally. If the HDC is passed as NULL and Uniscribe needs to access it for any reason, Uniscribe returns E_PENDING. E_PENDING is returned quickly, allowing the client to avoid time-consuming SelectObject calls. The following example applies to all APIs that take a SCRIPT_CACHE and an optional HDC.
    hr = ScriptShape(NULL, &sc, ...);
    if (hr == E_PENDING) {
        ... select font into hdc ...
        hr = ScriptShape(hdc, &sc, ...);
    }
The client may free a SCRIPT_CACHE at any time. Uniscribe maintains reference counts in its font and shaper caches, and frees font data only when all sizes of the font are freed, and shaper data only when all fonts it supports are freed. The client should free the SCRIPT_CACHE for a style when it discards that style. ScriptFreeCache always sets its parameter to NULL to help avoid mis-referencing.
The SCRIPT_CONTROL structure provides itemization control flags to the ScriptItemize function.
uDefaultLanguage - Language to use when Unicode values are ambiguous. Used by numeric processing to select digit shape when fDigitSubstitute (see SCRIPT_STATE) is in force.
fContextDigits - Specifies that national digits are chosen according to the nearest previous strong text, rather than using uDefaultLanguage.
fInvertPreBoundDir - By default, text at the start of the string is laid out as if it follows strong text of the same direction as the base embedding level. Set fInvertPreBoundDir to change the initial context to the opposite of the base embedding level. This flag is for GetCharacterPlacement legacy support.
fInvertPostBoundDir - By default, text at the end of the string is laid out as if it precedes strong text of the same direction as the base embedding level. Set fInvertPostBoundDir to change the final context to the opposite of the base embedding level. This flag is for GetCharacterPlacement legacy support.
fLinkStringBefore - Causes the first character of the string to be shaped as if it were joined to a previous character.
fLinkStringAfter - Causes the last character of the string to be shaped as if it were joined to a following character.
fNeutralOverride - Causes all neutral characters in the string to be treated as if they were strong characters of their enclosing embedding level. This effectively locks neutrals in place, reordering occurring only between neutrals.
fNumericOverride - Causes all numeric characters in the string to be treated as if they were strong characters of their enclosing embedding level. This effectively locks numerics in place, reordering occurring only between numerics.
fReserved - Reserved. Always initialise to 0.
The SCRIPT_STATE structure is used both to initialise the Unicode algorithm state as an input parameter to ScriptItemize, and as a component of each item analysis returned by ScriptItemize.
uBidiLevel - The embedding level associated with all characters in this run according to the Unicode bidi algorithm. When passed to ScriptItemize, should be initialised to 0 for an LTR base embedding level, or 1 for RTL.
fOverrideDirection - TRUE if this level is an override level (LRO/RLO). In an override level, characters are laid out purely left to right, or purely right to left. No reordering of digits or strong characters of opposing direction takes place. Note that this initial value is reset by LRE, RLE, LRO or RLO codes in the string.
fInhibitSymSwap - TRUE if the shaping engine is to bypass mirroring of Unicode Mirrored glyphs such as brackets. Set by Unicode character ISS, cleared by ASS.
fCharShape - TRUE if character codes in the Arabic Presentation Forms areas of Unicode should be shaped. (Not implemented.)
fDigitSubstitute - TRUE if character codes U+0030 through U+0039 (European digits) are to be substituted by national digits. Set by Unicode NADS, cleared by NODS.
fInhibitLigate - TRUE if ligatures are not to be used in the shaping of Arabic or Hebrew characters.
fDisplayZWG - TRUE if control characters are to be shaped as representational glyphs. (Normally, control characters are shaped to the blank glyph and given a width of zero.)
fArabicNumContext - TRUE indicates prior strong characters were Arabic for the purposes of rule P0 on page 3-19 of 'The Unicode Standard, version 2.0'. Should normally be set TRUE before itemizing an RTL paragraph in an Arabic language, FALSE otherwise.
fGcpClusters - For GetCharacterPlacement legacy support only. Initialise to TRUE to request ScriptShape to generate the LogClust array the same way as GetCharacterPlacement does in Arabic and Hebrew Windows 95. Affects only Arabic and Hebrew items.
fReserved - Reserved. Always initialise to 0.
fEngineReserved - Reserved. Always initialise to 0.
Each analysed item is described by a SCRIPT_ANALYSIS structure. It also includes a copy of the Unicode algorithm state (SCRIPT_STATE).
eScript - Opaque value identifying which engine Uniscribe will use to Shape, Place and TextOut this item. The value of eScript is undefined and will change in future releases, but attributes of eScript may be obtained by calling ScriptGetProperties.
fRTL - Rendering direction. Normally identical to the parity of the Unicode embedding level, but may differ if overridden by GetCharacterPlacement legacy support.
fLayoutRTL - Logical direction: whether conceptually part of a left-to-right sequence or a right-to-left sequence. Although this is usually the same as fRTL, for a number in a right-to-left run, fRTL is FALSE (because digits are always displayed LTR), but fLayoutRTL is TRUE (because the number is read as part of the right-to-left sequence).
fLinkBefore - If set, the shaping engine will shape the first character of this item as if it were joining with a previous character. Set by ScriptItemize, may be overridden before calling ScriptShape.
fLinkAfter - If set, the shaping engine will shape the last character of this item as if it were joining with a subsequent character. Set by ScriptItemize, may be overridden before calling ScriptShape.
fLogicalOrder - If set, the shaping engine will generate all glyph related arrays in logical order. By default glyph related arrays are in visual order, the first array entry corresponding to the leftmost glyph. Set to FALSE by ScriptItemize, may be overridden before calling ScriptShape.
fNoGlyphIndex - May be set TRUE on input to ScriptShape to disable use of glyphs for this item. Additionally, ScriptShape will set it TRUE for HDCs containing symbolic, unrecognised and device fonts. Disabling glyphing disables complex script shaping. When set, shaping and placing for this item is implemented directly by calls to GetTextExtentExPoint and ExtTextOut.
The SCRIPT_ITEM structure includes a SCRIPT_ANALYSIS with the string offset of the first character of the item.
iCharPos - Offset from beginning of itemised string to first character of this item, counted in Unicode codepoints (i.e. 16-bit words).
a - Script analysis structure containing analysis specific to this item, to be passed to ScriptShape, ScriptPlace etc.
ScriptItemize breaks a run of Unicode into individually shapeable items. Items are delimited by
The client may create multiple runs from each item returned by ScriptItemize, but should not combine multiple items into a single run. Later the client will call ScriptShape for each run (when measuring or rendering), and must pass the SCRIPT_ANALYSIS that ScriptItemize returned.
Returns E_INVALIDARG if pwcInChars == NULL or cInChars == 0 or pItems == NULL or cMaxItems < 2.
Returns E_OUTOFMEMORY if the output buffer length (cMaxItems) is insufficient. Note that in this case, as in all error cases, no items have been fully processed so no part of the output array contains defined values.
If psControl and psState are NULL on entry, ScriptItemize breaks the Unicode string purely by character code. If both are non-NULL, it performs a full Unicode bidi analysis.
ScriptItemize always adds a terminal item to the item analysis array (pItems) such that the length of an item at pItem is always available as:
    pItem[1].iCharPos - pItem[0].iCharPos
For this reason, it is invalid to call ScriptItemize with a buffer of fewer than two SCRIPT_ITEM entries.
To perform a correct Unicode bidi analysis, the SCRIPT_STATE should be initialised according to the paragraph reading order at paragraph start, and ScriptItemize should be passed the whole paragraph.
fRTL and fNumeric together provide the same classification as the lpClass output from GetCharacterPlacement.
European digits U+0030 through U+0039 may be rendered as national digits as follows:
For fContextDigits, any Western digits (U+0030 - U+0039) encountered before the first strongly directed character are substituted by the traditional digits of the SCRIPT_CONTROL.uDefaultLanguage when that language is written in the same direction as SCRIPT_STATE.uBidiLevel. Thus, in a right-to-left string, if SCRIPT_CONTROL.uDefaultLanguage is 1 (LANG_ARABIC), then leading Western digits will be substituted by traditional Arabic digits. However, also in a right-to-left string, if SCRIPT_CONTROL.uDefaultLanguage is 0x1e (LANG_THAI), then no substitution occurs on leading Western digits because the Thai language is written left-to-right.
Following strongly directed characters, digits are substituted by the traditional digits associated with the closest prior strongly directed character.
The left-to-right mark (LRM) and right-to-left mark (RLM) are strong characters whose language depends on the SCRIPT_CONTROL.uDefaultLanguage. If SCRIPT_CONTROL.uDefaultLanguage is a left-to-right language, then LRM causes subsequent Western digits to be substituted by the traditional digits associated with that language, while Western digits following RLM are not substituted. Conversely, if SCRIPT_CONTROL.uDefaultLanguage is a right-to-left language, then Western digits following LRM are not substituted, while Western digits following RLM are substituted by the traditional digits associated with that language.
Effect of Unicode control characters on SCRIPT_STATE:
SCRIPT_STATE.fArabicNumContext controls the Unicode EN->AN rule. It should normally be initialised to TRUE before itemizing an RTL paragraph in an Arabic language, FALSE otherwise.
The ScriptLayout function converts an array of run embedding levels to a map of visual to logical position, and/or logical to visual position.
pbLevel must contain the embedding levels for all runs on the line, ordered logically.
On output, piVisualToLogical[0] is the logical index of the run to display at the far left. Subsequent entries should be displayed progressing from left to right.
piLogicalToVisual[0] is the relative visual position where the first logical run should be displayed, the leftmost display position being zero.
The caller may request either piLogicalToVisual or piVisualToLogical or both.
Note: No other input is required since the embedding levels give all necessary information for layout.
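The mapping that ScriptLayout produces can be illustrated with a small standalone sketch. This is not Uniscribe's implementation: it simply applies the reversal rule of the Unicode bidirectional algorithm (reverse every contiguous sequence of runs at or above each level, working from the highest level down to the lowest odd level). The name layout_runs and its parameter names are hypothetical, and the sketch assumes the caller always supplies the visual-to-logical buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ScriptLayout, for illustration only.
   levels[i] is the embedding level of logical run i.
   visToLog is required; logToVis may be NULL if not wanted. */
void layout_runs(const unsigned char *levels, int cRuns,
                 int *visToLog, int *logToVis)
{
    int i, level, maxLevel = 0, minOdd = 255;

    assert(levels != NULL && visToLog != NULL && cRuns >= 0);

    /* Start in logical order and find the range of levels to process. */
    for (i = 0; i < cRuns; i++) {
        visToLog[i] = i;
        if (levels[i] > maxLevel) maxLevel = levels[i];
        if ((levels[i] & 1) && levels[i] < minOdd) minOdd = levels[i];
    }

    /* From the highest level down to the lowest odd level, reverse each
       contiguous sequence of runs whose level is >= the current level. */
    for (level = maxLevel; level >= minOdd; level--) {
        i = 0;
        while (i < cRuns) {
            if (levels[visToLog[i]] >= level) {
                int lo = i, hi;
                while (i < cRuns && levels[visToLog[i]] >= level) i++;
                for (hi = i - 1; lo < hi; lo++, hi--) {
                    int tmp = visToLog[lo];
                    visToLog[lo] = visToLog[hi];
                    visToLog[hi] = tmp;
                }
            } else {
                i++;
            }
        }
    }

    /* The logical-to-visual map is the inverse permutation. */
    if (logToVis != NULL)
        for (i = 0; i < cRuns; i++)
            logToVis[visToLog[i]] = i;
}
```

For four runs with levels 0, 1, 1, 0 the two right-to-left runs swap places, so the visual order is logical run 0, 2, 1, 3. When every level is even the loop body never runs and both maps are the identity.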
The script justification enumeration provides the client with the glyph characteristic information it needs to implement justification.
The visual (glyph) attribute buffer generated by ScriptShape identifies clusters and justification points:
uJustification - Justification class for this glyph. See SCRIPT_JUSTIFY.
fClusterStart - Set for the logically first glyph in every cluster, even for clusters containing just one glyph.
fDiacritic - Set for glyphs that combine with base characters.
fZeroWidth - Set by the shaping engine for some, but not all, zero width characters.
The ScriptShape function takes a Unicode run and generates glyphs and visual attributes.
The number of glyphs generated varies according to the script and the font. Only for simple scripts and fonts does each Unicode code point generate a single glyph.
There is no limit on the number of glyphs generated by a codepoint. For example, a sophisticated complex script font might choose to construct characters from components, and so generate many times as many glyphs as characters.
There are also special cases like invalid character representations, where extra glyphs are added to represent the invalid sequence.
A reasonable guess might be to provide a glyph buffer 1.5 times the length of the character buffer, plus a 16 glyph fixed addition for rare cases like invalid sequence representation.
If ScriptShape returns E_OUTOFMEMORY it will be necessary to call it again with a larger buffer, possibly more than once, until a large enough buffer is found.
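The buffer sizing guess above amounts to simple arithmetic, sketched here in standalone C. Both function names are hypothetical. The initial guess follows the text (1.5 times the character count plus 16); the doubling used for retries is only an assumption, since the text says no more than that a larger buffer must be tried.

```c
#include <assert.h>

/* Initial glyph buffer size suggested in the text:
   1.5 x character count, plus 16 for rare cases. */
int initial_glyph_capacity(int cChars)
{
    assert(cChars >= 0);
    return cChars + cChars / 2 + 16;
}

/* Hypothetical growth policy for a retry after E_OUTOFMEMORY.
   Doubling is an assumption, not something the API prescribes. */
int next_glyph_capacity(int cMaxGlyphs)
{
    assert(cMaxGlyphs > 0);
    return cMaxGlyphs * 2;
}
```

A client would allocate initial_glyph_capacity(cChars) glyph entries, call ScriptShape, and on E_OUTOFMEMORY reallocate to next_glyph_capacity of the previous size and call again.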
Returns E_OUTOFMEMORY if the output buffer length (cMaxGlyphs) is insufficient. Note that in this case, as in all error cases, the content of the output array is undefined.
Clusters are sequenced uniformly within the run, as are glyphs within the cluster. The fRTL item flag (from ScriptItemize) identifies whether the sequence is left to right or right to left.
ScriptShape may set the fNoGlyphIndex flag in psa if the font or OS cannot support glyph indices.
If fLogicalOrder is requested in psa, glyphs will always be generated in the same order as the original Unicode characters.
If fLogicalOrder is not set, right to left items are generated in reverse order, so ScriptTextOut does not need to reverse them before calling ExtTextOut.
The ScriptPlace function takes the output of a ScriptShape call and generates glyph advance width and 2D offset information.
The composite ABC width for the whole item identifies how much the glyphs overhang to the left of the start position and to the right of the length implied by the sum of the advance widths.
The total advance width of the line is exactly abcA + abcB + abcC.
abcA and abcC are maintained internally by Uniscribe as proportions of the cell height represented in 8 bits, and are thus accurate to roughly +/- 1%.
The total width returned (as the sum of piAdvance, and as the sum of abcA+abcB+abcC) is accurate to the resolution of the TrueType shaping engine.
All glyph related arrays are in visual order unless the fLogicalOrder flag is set in psa.
The ScriptTextOut function takes the output of both ScriptShape and ScriptPlace calls and calls the operating system ExtTextOut function appropriately. All arrays are in visual order unless the fLogicalOrder flag is set in psa.
The caller should normally use SetTextAlign(hdc, TA_RIGHT) before calling ScriptTextOut with an RTL item in logical order.
The piJustify array provides requested cell widths for each glyph. When the piJustify width of a glyph differs from the unjustified width (in piAdvance), space is added to or removed from the glyph cell at its trailing edge. The glyph is always aligned with the leading edge of its cell. (This rule applies even in visual order.)
When a glyph cell is extended, the extra space is usually made up by the addition of white space. For Arabic scripts, however, the extra space is made up by one or more kashida glyphs, unless the extra space is insufficient for the shortest kashida glyph in the font. (The width of the shortest kashida is available by calling ScriptGetFontProperties.)
piJustify should only be passed if re-justification of the string is required. Normally pass NULL to this parameter.
fuOptions may contain ETO_CLIPPED or ETO_OPAQUE (or neither or both).
Do not use ScriptTextOut to write to a metafile unless you are sure that the metafile will eventually be played back without any font substitution. ScriptTextOut records glyph numbers in the metafile. Since glyph numbers vary considerably from one font to another, such a metafile is unlikely to play back correctly when different fonts are substituted. For example, when a metafile is played back at a different scale, CreateFont requests recorded in the metafile may resolve to bitmap instead of TrueType fonts, or if the metafile is played back on a different machine, requested fonts may not be installed.
To write complex scripts in a metafile in a font independent manner, use ExtTextOut to write the logical characters directly, so that glyph generation and placement do not occur until the text is played back.
ScriptJustify provides a simple-minded implementation of multilingual justification.
Sophisticated text formatters may prefer to generate their own delta dx array by combining their own features with the information returned by ScriptShape in the SCRIPT_VISATTR array.
ScriptJustify establishes how much adjustment to make at each glyph position on the line. It interprets the SCRIPT_VISATTR array generated by a call to ScriptShape, and gives top priority to kashida, then uses inter-word spacing if there are no kashida points, then uses inter-character spacing if there are no inter-word points.
The justified advance widths generated in ScriptJustify should be passed to ScriptTextOut in the piJustify parameter.
ScriptJustify creates a justify array containing updated advance widths for each glyph. Where a glyph's advance width is increased, it is expected that the extra width will be rendered to the right of the glyph, as white space or, for Arabic text, as kashida.
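The priority order described for ScriptJustify (kashida first, then inter-word points, then inter-character points) can be sketched in standalone C. This is only an illustration of the priority rule, not ScriptJustify itself: the function name, the JUST_* constants (which are not the SCRIPT_JUSTIFY values) and the even split of the extra width with the remainder going to the earliest glyphs are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical justification classes, ordered by priority.
   These are stand-ins, not the real SCRIPT_JUSTIFY constants. */
enum { JUST_NONE = 0, JUST_CHARACTER = 1, JUST_BLANK = 2, JUST_KASHIDA = 3 };

/* Copies piAdvance to piJustify, then spreads iExtra over the glyphs of
   the highest-priority class present. Returns the class that was used,
   or JUST_NONE if no glyph offered a justification opportunity. */
int justify_sketch(const int *piClass, const int *piAdvance, int cGlyphs,
                   int iExtra, int *piJustify)
{
    int i, best = JUST_NONE, count = 0, share, rem;

    assert(piClass != NULL && piAdvance != NULL && piJustify != NULL);
    assert(cGlyphs >= 0 && iExtra >= 0);

    /* Find the highest-priority class and how many glyphs carry it. */
    for (i = 0; i < cGlyphs; i++) {
        piJustify[i] = piAdvance[i];
        if (piClass[i] > best) {
            best = piClass[i];
            count = 1;
        } else if (piClass[i] == best && best != JUST_NONE) {
            count++;
        }
    }
    if (best == JUST_NONE)
        return JUST_NONE;

    /* Split the extra width evenly; earliest glyphs absorb the remainder. */
    share = iExtra / count;
    rem = iExtra % count;
    for (i = 0; i < cGlyphs; i++) {
        if (piClass[i] == best) {
            piJustify[i] += share + (rem > 0 ? 1 : 0);
            if (rem > 0) rem--;
        }
    }
    return best;
}
```

With inter-word points present, inter-character glyphs keep their original widths; they are widened only when no higher-priority class appears on the line.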
The SCRIPT_LOGATTR structure describes attributes of logical characters useful when editing and formatting text. Note that for wordbreaking and linebreaking, if the first character of the run passed in is not whitespace, the client needs to check whether the last character of the previous run is whitespace to determine if the first character of this run is the start of a word.
fSoftBreak - It would be valid to break the line in front of this character. This flag is set on the first character of South-East Asian words. Note that when linebreaking the client would usually also treat any nonblank following a blank as a softbreak position, by inspecting the fWhiteSpace flag below.
fWhiteSpace - This character is one of the many Unicode characters that are classified as breakable whitespace.
fCharStop - Valid cursor position. Set on most characters, but not on codepoints inside Indian and South East Asian character clusters. May be used to implement left and right arrow operation in editors.
fWordStop - Valid position following word advance/retire commonly implemented at ctrl/left-arrow and ctrl/right-arrow. May be used to implement ctrl+left and ctrl+right arrow operation in editors. As with fSoftBreak, clients should normally also inspect the fWhiteSpace flag and treat the first character after a run of whitespace as the start of a word.
fInvalid - Marks characters which form an invalid or undisplayable combination. Scripts which can set this flag have the flag fInvalidLogAttr set in their SCRIPT_PROPERTIES.
The ScriptBreak function returns cursor movement and formatting break positions for an item as an array of SCRIPT_LOGATTRs. To support mixed formatting within a single word correctly, ScriptBreak should be passed whole items as returned by ScriptItemize.
ScriptBreak does not require an hdc and does not execute glyph shaping.
The fCharStop flag marks cluster boundaries for those scripts where it is conventional to restrict the caret from moving inside clusters. The same boundaries could also be inferred by inspecting the pwLogClust array returned by ScriptShape; however, ScriptBreak is considerably faster in implementation and does not require an hdc to be prepared.
The fWordStop, fSoftBreak and fWhiteSpace flags are only available through ScriptBreak.
Most shaping engines that identify invalid sequences do so by setting the fInvalid flag in ScriptBreak. The fInvalidLogAttr flag in ScriptProperties identifies which scripts do this.
The ScriptCPtoX function returns the x offset from the left end (!fLogical) or leading edge (fLogical) of a run to either the leading or the trailing edge of a logical character cluster. iCP is the offset of any logical character in the cluster. For scripts where the caret may conventionally be placed into the middle of clusters (e.g. Arabic, Hebrew), the returned X may be an interpolated position for any codepoint in the line. For scripts where the caret is conventionally snapped to the boundaries of clusters, (e.g. Thai, Indian), the resulting X position will be snapped to the requested edge of the cluster containing CP.
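For the simplest case, a left-to-right run in which every character maps to exactly one glyph, the character-to-x conversion reduces to summing advance widths. The sketch below is a hypothetical simplification (cp_to_x_simple is not a Uniscribe function) that ignores clusters, right-to-left runs and interpolation; it clamps out-of-range positions to the edges of the run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified analogue of ScriptCPtoX.
   Assumes an LTR run with one glyph per character, so piAdvance[i]
   is the width of character i. Returns the x offset of the leading
   (fTrailing == 0) or trailing (fTrailing != 0) edge of character iCP. */
int cp_to_x_simple(int iCP, int fTrailing, const int *piAdvance, int cChars)
{
    int i, x = 0, end;

    assert(piAdvance != NULL && cChars >= 0);

    /* Number of whole characters lying to the left of the requested edge. */
    end = iCP + (fTrailing ? 1 : 0);

    /* Clamp out-of-range positions to the start or end of the run. */
    if (end < 0) end = 0;
    if (end > cChars) end = cChars;

    for (i = 0; i < end; i++)
        x += piAdvance[i];
    return x;
}
```

In this simplified model the trailing edge of character i and the leading edge of character i+1 coincide, which is the unidirectional case; the real function additionally handles clusters and right-to-left items.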
The ScriptXtoCP function converts an x offset from the left end (!fLogical) or leading edge (fLogical) of a run to a logical character position and a flag that indicates whether the X position fell in the leading or the trailing half of the character. For scripts where the cursor may conventionally be placed into the middle of clusters (e.g. Arabic, Hebrew), the returned CP may be for any codepoint in the line, and fTrailing will be either zero or one. For scripts where the cursor is conventionally snapped to the boundaries of a cluster, the returned CP is always the position of the logically first codepoint in a cluster, and fTrailing is either zero, or the number of codepoints in the cluster. Thus the appropriate cursor position for a mouse hit is always the returned CP plus the value of fTrailing. If the X position passed is not in the item at all, the resulting position will be the trailing edge of character -1 (for X positions before the item), or the leading edge of character 'cChars' (for X positions following the item).
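The hit-testing rule can likewise be sketched for a left-to-right run with one glyph per character. The function name x_to_cp_simple is hypothetical, clusters and right-to-left runs are ignored, and treating a hit exactly on a character's midpoint as the trailing half is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified analogue of ScriptXtoCP.
   Assumes an LTR run with one glyph per character.
   On return, *piCP + *piTrailing is the caret position for the hit. */
void x_to_cp_simple(int iX, const int *piAdvance, int cChars,
                    int *piCP, int *piTrailing)
{
    int i, left = 0;

    assert(piAdvance != NULL && piCP != NULL && piTrailing != NULL);
    assert(cChars >= 0);

    /* Before the item: trailing edge of character -1. */
    if (iX < 0) {
        *piCP = -1;
        *piTrailing = 1;
        return;
    }

    for (i = 0; i < cChars; i++) {
        int right = left + piAdvance[i];
        if (iX < right) {
            *piCP = i;
            /* Assumption: an exact midpoint hit counts as the trailing half. */
            *piTrailing = ((iX - left) * 2 >= piAdvance[i]) ? 1 : 0;
            return;
        }
        left = right;
    }

    /* After the item: leading edge of character cChars. */
    *piCP = cChars;
    *piTrailing = 0;
}
```

Adding the two outputs gives a caret position between 0 and cChars inclusive, including for hits outside the run.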
Character clusters are glyph sequences that cannot be split between lines. Some languages (e.g. Thai, Indic) restrict caret placement to points between clusters. This applies both to keyboard initiated caret movement (e.g. cursor keys) and pointing and clicking with the mouse (hit testing). Uniscribe provides cluster information in both the visual and logical attributes. If you've called ScriptShape you'll find the cluster information represented both by sequences of the same value in the pwLogClust array, and by the fClusterStart flag in the psva SCRIPT_VISATTR array. ScriptBreak also returns the fCharStop flag in the SCRIPT_LOGATTR array to identify cluster positions.
Valid positions for moving the caret when moving in whole words are marked by the fWordStop flag returned by ScriptBreak. Valid positions for breaking lines between words are marked by the fSoftBreak flag returned by ScriptBreak.
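The word-start rule described for fWordStop and fWhiteSpace (a position marked fWordStop, or the first non-whitespace character after whitespace, including whitespace at the end of the previous run) can be sketched as follows. LogAttrSketch is a hypothetical stand-in that only mirrors the flag names of SCRIPT_LOGATTR, and is_word_start is not a Uniscribe function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in mirroring the SCRIPT_LOGATTR flag names;
   it is not the real Windows type. */
typedef struct {
    unsigned fSoftBreak  : 1;
    unsigned fWhiteSpace : 1;
    unsigned fCharStop   : 1;
    unsigned fWordStop   : 1;
    unsigned fInvalid    : 1;
} LogAttrSketch;

/* Returns 1 if character i starts a word.
   fPrevRunEndsInSpace tells whether the last character of the
   previous run was whitespace (used only when i == 0). */
int is_word_start(const LogAttrSketch *pla, int cChars, int i,
                  int fPrevRunEndsInSpace)
{
    int prevIsSpace;

    assert(pla != NULL && i >= 0 && i < cChars);

    /* Positions the word breaker marked explicitly (e.g. Thai). */
    if (pla[i].fWordStop)
        return 1;

    /* Whitespace itself never starts a word. */
    if (pla[i].fWhiteSpace)
        return 0;

    /* Otherwise a word starts at the first nonblank after a blank. */
    prevIsSpace = (i > 0) ? (int)pla[i - 1].fWhiteSpace : fPrevRunEndsInSpace;
    return prevIsSpace ? 1 : 0;
}
```

The same test with fSoftBreak in place of fWordStop gives candidate line break positions.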
Justification space or kashida should be inserted where identified by the uJustification field of the SCRIPT_VISATTR. When performing inter-character justification, insert extra space only after glyphs marked with uJustification == SCRIPT_JUSTIFY_CHARACTER.
Uniscribe provides information about special processing for each script in the SCRIPT_PROPERTIES array. Use the following code during initialisation to get a pointer to the SCRIPT_PROPERTIES array:
    const SCRIPT_PROPERTIES **g_ppScriptProperties; // Array of pointers to properties
    int g_iMaxScript;
    HRESULT hr;
    hr = ScriptGetProperties(&g_ppScriptProperties, &g_iMaxScript);
Then inspect the properties of the script of an item 'iItem' as follows:
    hr = ScriptItemize( ... , pItems, ... );
    ...
    if (g_ppScriptProperties[pItems[iItem].a.eScript]->fNeedsCaretInfo) {
        // Use ScriptBreak to restrict the caret from entering clusters (for example).
    }
SCRIPT_PROPERTIES.fNeedsCaretInfo
Caret placement should be restricted to cluster edges for scripts such as Thai and Indian. The fNeedsCaretInfo flag in SCRIPT_PROPERTIES identifies such languages. Note that ScriptXtoCP and ScriptCPtoX automatically apply caret placement restrictions.
SCRIPT_PROPERTIES.fNeedsWordBreaking
For most scripts, word break placement may be identified by scanning for characters marked as fWhiteSpace in SCRIPT_LOGATTR, or for glyphs marked as uJustification == SCRIPT_JUSTIFY_BLANK or SCRIPT_JUSTIFY_ARABIC_BLANK in SCRIPT_VISATTR. For languages such as Thai, it is also necessary to call ScriptBreak, and include character positions marked as fWordStop in SCRIPT_LOGATTR. Such scripts are marked as fNeedsWordBreaking in SCRIPT_PROPERTIES.
SCRIPT_PROPERTIES.fNeedsCharacterJustify
Languages such as Thai also require inter-character spacing when justifying (where uJustification == SCRIPT_JUSTIFY_CHARACTER in the SCRIPT_VISATTR). Such languages are marked as fNeedsCharacterJustify in SCRIPT_PROPERTIES.
SCRIPT_PROPERTIES.fAmbiguousCharSet
Many Uniscribe scripts do not correspond directly to 8 bit character sets. For example, Unicode characters in the range U+0100 through U+024F represent extended Latin shapes used for many languages, including those supported by EASTEUROPE_CHARSET, TURKISH_CHARSET and VIETNAMESE_CHARSET. However, many of these characters are supported by more than one of these charsets. fAmbiguousCharSet is set for any script token which could contain characters from a number of these charsets. In these cases the bCharSet field may contain ANSI_CHARSET or DEFAULT_CHARSET. The Uniscribe client will generally need to apply further processing to determine which charset to use when requesting a font suitable for this run. For example, it may determine that the run consists of multiple languages and split it up to use a different font for each language.
ScriptCPtoX and ScriptXtoCP work only within runs and require the results of a previous ScriptShape call. The client must establish which run a given cursor offset or x position is within before passing it to ScriptCPtoX or ScriptXtoCP.
Cluster information in the logical cluster array is used to share the width of a cluster of glyphs equally among the logical characters they represent. For example, the lam alif glyph is divided into four areas: the leading half of the lam, the trailing half of the lam, the leading half of the alif and the trailing half of the alif.
ScriptXtoCP understands the caret position conventions of each script. For Indian and Thai, caret positions are snapped to cluster boundaries; for Arabic and Hebrew, caret positions are interpolated within clusters.
Translating mouse hit 'x' offset to caret position
Conventionally, caret position 'cp' may be selected by clicking either on the trailing half of character 'cp-1' or on the leading half of character 'cp'. This may easily be implemented as follows:
    int iCharPos;
    int iCaretPos;
    int fTrailing;
    ScriptXtoCP(iMouseX, ..., &iCharPos, &fTrailing);
    iCaretPos = iCharPos + fTrailing;
For scripts that snap the caret to cluster boundaries, ScriptXtoCP returns fTrailing set to either 0 or the width of the cluster in codepoints. Thus the above code correctly returns only valid caret positions.
Displaying the caret in bidi strings
In unidirectional text, the leading edge of a character is at the same place as the trailing edge of the previous character, so there is no ambiguity in placing the caret between characters. In bidirectional text, the caret position between runs of opposing direction may be ambiguous. For example, in the left to right paragraph 'helloMAALAS', the last letter of 'hello' immediately precedes the first letter of 'salaam'. The best position to display the caret depends on whether it is considered to follow the 'o' of 'hello', or to precede the 's' of 'salaam'.
Commonly used caret positioning conventions
The caret may be positioned as follows:
    if (advancing) {
        ScriptCPtoX(iCharPos-1, TRUE, ..., &iCaretX);
    } else {
        ScriptCPtoX(iCharPos, FALSE, ..., &iCaretX);
    }
Or, more simply, given an fAdvancing BOOL restricted to TRUE or FALSE:
    ScriptCPtoX(iCharPos-fAdvancing, fAdvancing, ..., &iCaretX);
ScriptCPtoX handles out of range positions logically: it returns the leading edge of the run for iCharPos < 0, and the trailing edge of the run for iCharPos >= length.
ScriptGetLogicalWidths converts visual widths in piAdvance into logical widths, one per original character, in logical order. Ligature glyph widths are divided evenly amongst the characters they represent.
ScriptGetLogicalWidths is useful for recording widths in a font independent manner. By passing the recorded logical widths to ScriptApplyLogicalWidth, a block of text can be replayed in the same boundaries with acceptable loss of quality even when the original font is not available.
ScriptApplyLogicalWidth accepts an array of advance widths in logical order, corresponding one to one with codepoints, and generates an array of glyph widths suitable for passing to the piJustify parameter of ScriptTextOut. ScriptApplyLogicalWidth may be used to reapply logical widths obtained with ScriptGetLogicalWidths. It may be useful in situations such as metafiling, where it is necessary to record and reapply advance width information in a font independent manner.
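The even division of a cluster's width among its characters, as described for ScriptGetLogicalWidths, can be sketched for a left-to-right run. The function below is a hypothetical illustration, not the Uniscribe implementation: it assumes pwLogClust[i] holds the index of the first glyph of the cluster containing character i, that widths are non-negative, and that any remainder pixels go to the first characters of the cluster.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of converting glyph advance widths to
   per-character logical widths for an LTR run.
   pwLogClust[i] = index of the first glyph of character i's cluster. */
void logical_widths_ltr(const unsigned short *pwLogClust, int cChars,
                        const int *piAdvance, int cGlyphs, int *piDx)
{
    int cs = 0;

    assert(pwLogClust != NULL && piAdvance != NULL && piDx != NULL);
    assert(cChars >= 0 && cGlyphs >= 0);

    while (cs < cChars) {
        int ce = cs + 1, g, gEnd, width = 0, n, k;

        /* Characters cs..ce-1 share one cluster. */
        while (ce < cChars && pwLogClust[ce] == pwLogClust[cs])
            ce++;

        /* The cluster's glyphs run up to the next cluster's first glyph. */
        gEnd = (ce < cChars) ? pwLogClust[ce] : cGlyphs;
        for (g = pwLogClust[cs]; g < gEnd; g++)
            width += piAdvance[g];

        /* Share the width evenly; the first characters take the remainder. */
        n = ce - cs;
        for (k = 0; k < n; k++)
            piDx[cs + k] = width / n + (k < width % n ? 1 : 0);

        cs = ce;
    }
}
```

A two-character ligature of width 10 followed by a single glyph of width 6 yields logical widths 5, 5 and 6, and the per-character widths always sum to the total glyph advance of the run.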
piDx - Pointer to an array of dx widths in logical order, one per codepoint.
cChars - Count of the logical codepoints in the run.
cGlyphs - Glyph count.
pwLogClust - Pointer to an array of logical clusters from ScriptShape.
psva - Pointer to an array of visual attributes from ScriptShape and updated by ScriptPlace.
piAdvance - Pointer to an array of glyph advance widths from ScriptPlace.
psa - Pointer to a SCRIPT_ANALYSIS structure from ScriptItemize and updated by ScriptShape and ScriptPlace.
pABC - Pointer to the run overall ABC width (optional). If present, when the function is called, it should contain the run ABC width returned by ScriptPlace; when the function returns, the ABC width has been updated to match the new widths.
piJustify - Pointer to an array of the resulting glyph advance widths. This is suitable for passing to the piJustify parameter of ScriptTextOut.
ScriptGetCMap may be used to determine which characters in a run are supported by the selected font. It returns glyph indices of Unicode characters according to the TrueType cmap table, or the standard cmap implemented for old style fonts. The glyph indices are returned in the same order as the input string.
The caller may scan the returned glyph buffer looking for the default glyph to determine which characters are not available. (The default glyph index for the selected font should be determined by calling ScriptGetFontProperties.)
The return value indicates the presence of any missing glyphs.
Returns:
S_OK - All Unicode codepoints were present in the font.
S_FALSE - Some of the Unicode codepoints were mapped to the default glyph.
E_HANDLE - The font or system does not support glyph indices.
Returns the ABC width of a given glyph. May be useful for drawing glyph charts. Should not be used for run-of-the-mill complex script text formatting.
Returns:
S_OK - Glyph width returned.
E_HANDLE - The font or system does not support glyph indices.
Language associated with this script. When a script is used for many languages, langid represents a default language. For example, Western script is represented by LANG_ENGLISH although it is also used for French, German, Spanish, etc.
Script contains numerics and characters used in conjunction with numerics by the rules of the Unicode bidirectional algorithm. For example, the dollar sign and period are classified as numeric when adjacent to or in between digits.
Indicates a script that requires complex script handling. If fComplex is false, the script contains no combining characters and requires no contextual shaping or reordering.
A script, such as Thai, which requires algorithmic wordbreaking. Use ScriptBreak to obtain wordbreak points using the standard system wordbreaker.
A script, such as Thai or an Indic script, where the caret may not be placed inside a cluster. To determine valid caret positions, inspect the fCharStop flag in the logical attributes returned by ScriptBreak, or compare adjacent values in the pwLogClust array returned by ScriptShape.
Nominal charset associated with the script. May be used in a logfont when creating a font suitable for displaying this script. Note that for new scripts where there is no charset defined, bCharSet may be inappropriate and DEFAULT_CHARSET should be used instead - see the description of fAmbiguousCharSet below.
Script contains control characters.
The Unicode range U+E000 through U+F8FF.
A script, such as Thai, where justification is conventionally achieved by increasing the space between all letters, not just between words.
A script for which ScriptShape generates an invalid glyph to represent invalid sequences. The glyph index of the invalid glyph for a particular font may be obtained by calling ScriptGetFontProperties.
A script for which ScriptBreak sets the fInvalid flag in the logical attributes to mark invalid sequences.
Implies that an item analysed by ScriptItemize included combining diacritical marks (U+0300 through U+036F).
No single legacy charset supports this script. For example, the Latin Extended-A Unicode range includes characters from the EASTEUROPE_CHARSET, the TURKISH_CHARSET and the BALTIC_CHARSET. It also contains characters that are not available in any legacy charset. Use DEFAULT_CHARSET when creating fonts to display parts of this run.
A script, such as Arabic, where contextual shaping may cause a string to increase in size when characters are removed.
A script, such as Thai, where invalid sequences conventionally cause an editor such as Notepad to beep and ignore keypresses.
ScriptGetProperties returns the address of a table that maps a script in a SCRIPT_ANALYSIS uScript field to properties including the primary language associated with that script, whether it is numeric and whether it is complex.
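For scripts where the caret may not be placed inside a cluster, the properties above suggest comparing adjacent values in the pwLogClust array returned by ScriptShape. The following is a minimal sketch of that comparison. It assumes that all characters of one cluster carry the same pwLogClust value, so a change in value marks a cluster boundary; find_caret_stops is a hypothetical helper name, and uint16_t stands in for WORD so the sketch builds without the Windows headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: marks the character positions at which a caret may
 * be placed, given the logical cluster array produced by ScriptShape.
 * A character starts a new cluster when its cluster value differs from
 * that of the previous character; the first character always starts one.
 * is_stop[i] is set to 1 for a valid caret position, otherwise 0.
 * Returns the number of valid caret positions found.
 */
size_t find_caret_stops(const uint16_t *log_clust, size_t n_chars,
                        unsigned char *is_stop)
{
    size_t n_stops = 0;

    assert(n_chars == 0 || (log_clust != NULL && is_stop != NULL));

    for (size_t i = 0; i < n_chars; i++) {
        is_stop[i] = (unsigned char)(i == 0 || log_clust[i] != log_clust[i - 1]);
        n_stops += is_stop[i];
    }
    return n_stops;
}
```

The test for inequality rather than for an increase is deliberate: it does not depend on whether the cluster values ascend or descend along the run.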
Returns information from the font cache
Note that SSA_HOTKEY and SSA_HIDEHOTKEY remove the hotkey '&' character from further processing, so functions such as ScriptString_pLogAttr return arrays based on a string which excludes the '&'.
Defines tabstop positions for ScriptStringAnalyse (ignored unless SSA_TAB is passed).
Number of entries in the pTabStops array. If zero, tabstops are every 8 average character widths. If one, all tabstops are the length of the first entry in pTabStops. If more than one, the first cTabStops are as specified in the pTabStops array, and subsequent tabstops are every 8 average character widths from the last tabstop in the array.
Scale factor for iTabOrigin and pTabStops entries. Values are converted to device coordinates by multiplying by iScale, then dividing by 4. If values are already in device units, set iScale to 4. If values are in dialog units, set iScale to the average character width of the dialog font. If values are multiples of the average character width for the selected font, set iScale to 0.
Array of cTabStops entries. Each entry specifies a tabstop position. Positive values give near-edge alignment, negative values give far-edge alignment.
Tabs are considered to start iTabOrigin before the beginning of the string. Helps with multiple tabbed outputs on the same line.
cString - The input string must contain at least one character.
hdc - Required if SSA_GLYPHS is requested. Optional for SSA_BREAK. If present, the current font in the hdc is inspected, and if it is a symbolic font the character string is treated as a single neutral SCRIPT_UNDEFINED item.
Note that the uBidiLevel field in the initial SCRIPT_STATE value is ignored - the uBidiLevel used is derived from the SSA_RTL flag in combination with the layout of the hdc.
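The tabstop rules above can be illustrated with a small calculation. This is only a sketch of one reading of those rules, not the system's implementation: tab_def_t is a cut-down stand-in for the tab definition (it omits iTabOrigin), next_tab_stop is a hypothetical helper, alignment is ignored by taking the magnitude of each entry, and the entries in the array are assumed to be in ascending order.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for the tab definition; iTabOrigin is left out. */
typedef struct {
    int count;          /* cTabStops: number of entries in stops */
    int scale;          /* iScale: 4 = device units, 0 = average char widths */
    const int *stops;   /* pTabStops: tabstop positions */
} tab_def_t;

/* Converts one entry to device units: value * iScale / 4, or a multiple
 * of the average character width when iScale is 0. The sign (alignment)
 * is dropped in this sketch. */
static int to_device(int value, int scale, int avg_char_width)
{
    int magnitude = value < 0 ? -value : value;
    return scale == 0 ? magnitude * avg_char_width : magnitude * scale / 4;
}

/* Returns the first tabstop position, in device units, that lies strictly
 * to the right of x. */
int next_tab_stop(const tab_def_t *def, int x, int avg_char_width)
{
    int default_interval = 8 * avg_char_width;
    int last = 0;

    assert(def != NULL && x >= 0 && avg_char_width > 0);

    if (def->count == 1) {
        /* One entry: every tabstop is a multiple of that entry. */
        int interval = to_device(def->stops[0], def->scale, avg_char_width);
        assert(interval > 0);
        return (x / interval + 1) * interval;
    }

    /* Several entries: use the explicit positions first. */
    for (int i = 0; i < def->count; i++) {
        last = to_device(def->stops[i], def->scale, avg_char_width);
        if (last > x)
            return last;
    }

    /* No entries, or past the last one: every 8 average character widths,
     * counted from the last explicit tabstop (or from 0 if there is none). */
    return last + ((x - last) / default_interval + 1) * default_interval;
}
```

For example, with an average character width of 10 and explicit tabstops at 30, 70 and 100 device units, a position of 100 advances to 180 under this reading, since the default interval of 80 is counted from the last explicit tabstop.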
Returns a pointer to the size (width and height) of an analysed string. Note that the SIZE pointer remains valid only until the SCRIPT_STRING_ANALYSIS is passed to ScriptStringFree.
Returns a pointer to the length of the string after clipping (requires SSA_CLIP set). Note that the int pointer remains valid only until the SCRIPT_STRING_ANALYSIS is passed to ScriptStringFree.
Returns a pointer to the logical attributes buffer in a SCRIPT_STRING_ANALYSIS. Note that the buffer pointer remains valid only until the SCRIPT_STRING_ANALYSIS is passed to ScriptStringFree. The logical attribute array contains *ScriptString_pcOutChars(ssa) entries.
Creates an array mapping original character position to glyph position. Treats clusters as they were in legacy systems: unless a cluster contains more glyphs than codepoints, each glyph is referenced at least once from the puOrder array. Requires SSA_GLYPHS to have been requested in the original ScriptStringAnalyse call. The puOrder parameter should address a buffer containing room for at least *ScriptString_pcOutChars(ssa) ints.
Returns the x coordinate for the leading or trailing edge of character icp.
Converts visual widths in psa->piAdvance into logical widths, one per original character, in logical order. Requires SSA_GLYPHS to have been requested in the original ScriptStringAnalyse call. The piDx parameter should address a buffer containing room for at least *ScriptString_pcOutChars(ssa) ints.
Scans the string analysis for invalid glyphs. Only glyphs generated by scripts that can generate invalid glyphs are scanned.
Returns:
S_OK - No invalid glyphs are present.
S_FALSE - One or more invalid glyphs are present.
Displays the string generated by a prior ScriptStringAnalyse call, then optionally adds highlighting corresponding to a logical selection. Requires SSA_GLYPHS to have been requested in the original ScriptStringAnalyse call.
uOptions may include only ETO_CLIPPED or ETO_OPAQUE.
Determines whether a Unicode string requires complex script processing. The dwFlags parameter may include the following requests:
SIC_COMPLEX: Should normally be set. Causes complex script letters to be treated as complex.
SIC_ASCIIDIGIT: Set this flag if the string would be displayed with digit substitution enabled. If you are following the user's NLS settings using the ScriptRecordDigitSubstitution API, you can pass scriptDigitSubstitute.DigitSubstitute != SCRIPT_DIGITSUBSTITUTE_NONE.
SIC_NEUTRAL: Set this flag if you may be displaying the string with right-to-left reading order. When this flag is set, neutral characters are considered as complex.
Returns S_OK if the string requires complex script processing, S_FALSE if the string contains only characters laid out side by side from left to right.
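The flag choices described above can be combined as in the following sketch. It is written to build without the Windows headers, so the enum values are local stand-ins that are assumed to mirror the constants in usp10.h; real code should include that header instead of redefining them. build_sic_flags is a hypothetical helper name, and its digit_substitute argument is assumed to hold the DigitSubstitute value recorded by ScriptRecordDigitSubstitution.

```c
#include <assert.h>

/* Local stand-ins, assumed to match usp10.h; do not redefine in real code. */
enum { SIC_COMPLEX = 1, SIC_ASCIIDIGIT = 2, SIC_NEUTRAL = 4 };
enum {
    SCRIPT_DIGITSUBSTITUTE_CONTEXT = 0,
    SCRIPT_DIGITSUBSTITUTE_NONE = 1,
    SCRIPT_DIGITSUBSTITUTE_NATIONAL = 2,
    SCRIPT_DIGITSUBSTITUTE_TRADITIONAL = 3
};

/*
 * Hypothetical helper: builds the dwFlags value for ScriptIsComplex.
 * digit_substitute - the DigitSubstitute setting recorded from NLS.
 * may_be_rtl       - nonzero if the string may be displayed with
 *                    right-to-left reading order.
 */
unsigned build_sic_flags(unsigned digit_substitute, int may_be_rtl)
{
    unsigned flags = SIC_COMPLEX;   /* should normally be set */

    assert(digit_substitute <= SCRIPT_DIGITSUBSTITUTE_TRADITIONAL);

    /* Digits count as complex whenever any substitution is enabled. */
    if (digit_substitute != SCRIPT_DIGITSUBSTITUTE_NONE)
        flags |= SIC_ASCIIDIGIT;

    /* Neutral characters count as complex under right-to-left order. */
    if (may_be_rtl)
        flags |= SIC_NEUTRAL;

    return flags;
}
```

When checking the result of the call itself, compare against S_OK explicitly, since S_FALSE is also a success code.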
Reads NLS native digit and digit substitution settings and records them in the SCRIPT_DIGITSUBSTITUTE structure.
Standard digits for the selected locale as defined by the country's standard-setting authority.
Digits originally used with the locale's script.
Selects between None, Context, National and Traditional. See ScriptApplyDigitSubstitution below for constant definitions.
Although most complex scripts have their own associated digits, many countries using those scripts use western (so-called 'Arabic') digits as their standard. NationalDigitLanguage reflects the digits used as standard, and is set from the NLS data for the locale. On Windows 2000 the national digit language can be adjusted to any digit script with the control panel/regional options/numbers/Standard digits listbox. The TraditionalDigitLanguage for a locale is derived directly from the script used by that locale.