HOW TO: Formatting Control Characters
Introduction
Arabic script is complex because of its
bidirectional (Bidi) layout requirements: words are written right-to-left,
while numerals and Latin words are displayed left-to-right. Microsoft uses
the Unicode Bidirectional Algorithm to resolve the proper directionality of
text. This algorithm consists of an implicit part, based on character
properties, and an explicit part that uses control characters for embeddings and
overrides.
This article demonstrates how to use formatting control characters to display
Bidi text properly. It also lists the most common control characters and their
expected behavior.

Problem
In the case of Arabic and other Bidi
text, there are situations (especially when weak or neutral characters, such as
punctuation and brackets, are involved) where the implicit bidirectional ordering
is insufficient to produce understandable text. To deal with these cases, Unicode
defines a set of directional formatting control characters that control the
ordering of characters when they are rendered.
Examples
| Incorrect Order | Correct Order | Comment |
| (image) | (image) | The closing bracket at the end of the sentence is displayed right-to-left. |
| (image) | (image) | The full stop at the end of the sentence appears at the beginning. |

Control Characters
| Control Character | Meaning | Unicode | Usage |
| RLM | Right-to-Left Mark | U+200F | Acts as an Arabic (strong right-to-left) character. |
| LRM | Left-to-Right Mark | U+200E | Acts as a Latin (strong left-to-right) character. |
| RLE | Right-to-Left Embedding | U+202B | Treats the following text as embedded right-to-left text. |
| LRE | Left-to-Right Embedding | U+202A | Treats the following text as embedded left-to-right text. |
| RLO | Right-to-Left Override | U+202E | Forces the following characters to be treated as strong right-to-left characters. |
| LRO | Left-to-Right Override | U+202D | Forces the following characters to be treated as strong left-to-right characters. |
| PDF | Pop Directional Formatting | U+202C | Restores the bidirectional state to what it was before the last LRE, RLE, LRO, or RLO. |
| ZWJ | Zero Width Joiner | U+200D | Causes the adjacent characters to be displayed in their joined forms, if possible. |
| ZWNJ | Zero Width Non-Joiner | U+200C | Prevents the adjacent characters from being joined. |
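The table can be cross-checked against the Unicode Character Database. The following Python sketch (the dictionary name BIDI_CONTROLS is a hypothetical choice) prints each character's code point, official name, and bidirectional class. RLM reports class R and LRM reports class L, which is why they behave like an Arabic and a Latin character respectively; ZWJ and ZWNJ report BN (boundary neutral), so they do not influence direction.

```python
import unicodedata

# The control characters from the table above, keyed by the article's
# abbreviations (the dictionary itself is only an illustration).
BIDI_CONTROLS = {
    "RLM": "\u200f",
    "LRM": "\u200e",
    "RLE": "\u202b",
    "LRE": "\u202a",
    "RLO": "\u202e",
    "LRO": "\u202d",
    "PDF": "\u202c",
    "ZWJ": "\u200d",
    "ZWNJ": "\u200c",
}

if __name__ == "__main__":
    for abbr, ch in BIDI_CONTROLS.items():
        # Code point, official Unicode name, and the bidirectional class
        # reported by Python's copy of the Unicode Character Database.
        print(f"{abbr:5} U+{ord(ch):04X} {unicodedata.name(ch):30} "
              f"{unicodedata.bidirectional(ch)}")
```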
Note
The formatting control characters are used only to influence the
display of text. Otherwise they should be ignored; that is, they
should not affect text comparison, parsing, or numeric analysis.
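One way to honor this rule in application code is to remove the formatting controls before comparing strings. The sketch below is a minimal illustration under that assumption; the function names are hypothetical, and it covers only the nine characters listed in the table above.

```python
# Code points of the formatting control characters from the table above.
FORMAT_CONTROLS = frozenset(
    "\u200e\u200f\u202a\u202b\u202c\u202d\u202e\u200c\u200d"
)


def strip_format_controls(text: str) -> str:
    """Return text with the formatting control characters removed."""
    return "".join(ch for ch in text if ch not in FORMAT_CONTROLS)


def equal_ignoring_controls(a: str, b: str) -> bool:
    """Compare two strings as if the formatting controls were not present."""
    return strip_format_controls(a) == strip_format_controls(b)


if __name__ == "__main__":
    # A string with a trailing RLM compares equal to the same string without it.
    print(equal_ignoring_controls("(x)\u200f", "(x)"))
```

Removing ZWJ and ZWNJ for comparison follows the note above; an application that treats joining behavior as meaningful may choose to keep them.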

Samples
The following are samples for the formatting
control characters. For more information and a deeper understanding, try
Notepad.exe: if you right-click in the editing area, you can insert and view
a list of Unicode control characters. Check them out!


RLM
Scenario: Adham is creating an English
document that contains some Arabic words, but he has a problem with how a
bracket is displayed. Examine his display:

To fix this problem, he added an RLM
control character after the closing bracket. The RLM character acts as another
Arabic character, so the bracket (which is a neutral character) changes its
direction and moves to its correct place. Examine the correct display after
adding the RLM.

Note: You would not need to add
the RLM if the document direction were right-to-left.
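In stored text, this fix is a single character appended after the bracket. The Python sketch below uses a hypothetical sentence ("كتاب" means "book") and prints the bidirectional classes of the last three characters: the Arabic letter (AL), the closing bracket (ON, a neutral), and the RLM (R). It only shows where the RLM goes; it does not render text, and the final display depends on the Bidi implementation of the program showing it.

```python
import unicodedata

RLM = "\u200f"  # U+200F RIGHT-TO-LEFT MARK

# Hypothetical line from an English (left-to-right) document that ends
# with an Arabic word in brackets.
original = "The Arabic word for book is (كتاب)"

# The fix described above: insert an RLM right after the closing bracket.
fixed = original + RLM

if __name__ == "__main__":
    # The closing bracket now sits between an Arabic letter and the RLM.
    for ch in fixed[-3:]:
        print(f"U+{ord(ch):04X} {unicodedata.bidirectional(ch)}")
```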

LRM
Scenario: Samer is creating an Arabic
document that contains some English words, but he has a problem with the dot
before the English text: it is displayed at the end of the text instead of
the beginning. Examine his display:

To fix this problem, he added an LRM
control character before the dot. The LRM character acts as another Latin
character, so the dot (which is a weak character) changes its place.
Examine the correct display after adding the LRM.

Note: You would not need to add
the LRM before the dot if the document direction were left-to-right.
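A term such as ".NET" is a typical case of a dot before English text. The sketch below assumes a hypothetical Arabic line ("تقنية" means "technology") and inserts an LRM directly before the dot, so the dot (class CS, a weak type) is preceded by a strong left-to-right character as well as followed by one.

```python
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK

# Hypothetical line from an Arabic (right-to-left) document containing ".NET".
original = "تقنية .NET"

# The fix described above: insert an LRM immediately before the dot.
dot = original.index(".")
fixed = original[:dot] + LRM + original[dot:]

if __name__ == "__main__":
    # Classes of the LRM, the dot, and the first Latin letter after it.
    print([unicodedata.bidirectional(c) for c in fixed[dot:dot + 3]])
```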

RLE
Scenario: Adham is creating an English
document that contains a record. This record contains an Arabic word, but the
record's order is not correct. Examine his display:

To fix this problem, he added an RLE
control character at the beginning of the text. The RLE character
changes the reading order of the following text to right-to-left. Examine the
correct display after adding the RLE.

Note: You would not need to add
the RLE if the document direction were right-to-left.
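An embedding is normally closed with PDF, which restores the previous direction as described in the table above; the scenario itself mentions only the RLE. This Python sketch wraps a hypothetical record (the name Samer, an age, and the city Amman written in Arabic) in RLE ... PDF. The helper name embed_rtl is an illustration, not an existing API.

```python
RLE = "\u202b"  # U+202B RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # U+202C POP DIRECTIONAL FORMATTING


def embed_rtl(text: str) -> str:
    """Wrap text in RLE ... PDF so it is laid out as a right-to-left embedding."""
    return RLE + text + PDF


# Hypothetical record whose fields should read right-to-left.
record = "سامر, 42, عمان"
fixed = embed_rtl(record)

if __name__ == "__main__":
    print([f"U+{ord(c):04X}" for c in (fixed[0], fixed[-1])])
```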

LRE
Scenario: Adham is preparing an Arabic
document that contains a file path, but he has a problem with the display.
Examine his display:

To fix this problem, he added an LRE
control character at the beginning of the folder name. The LRE character
changes the reading order of the following text to left-to-right.
Examine the correct display after adding the LRE.

Note: You would not need to add
the LRE if the document direction were left-to-right.
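The LRE case mirrors the RLE one. This sketch embeds a hypothetical Windows path in an Arabic sentence ("المسار" means "the path"), opening with LRE before the path and, as an assumption beyond the scenario, closing with PDF after it. The helper name embed_ltr is hypothetical.

```python
LRE = "\u202a"  # U+202A LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # U+202C POP DIRECTIONAL FORMATTING


def embed_ltr(text: str) -> str:
    """Wrap text in LRE ... PDF so it is laid out as a left-to-right embedding."""
    return LRE + text + PDF


# Hypothetical Arabic sentence containing a file path.
path = "C:\\Reports\\Sales"
sentence = "المسار: " + embed_ltr(path)

if __name__ == "__main__":
    # List the control characters present in the stored sentence.
    print([f"U+{ord(c):04X}" for c in sentence if c in (LRE, PDF)])
```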

RLO
Scenario: Adham is creating a document
and would like to force even Latin blocks to be displayed right-to-left. He
added an RLO character at the beginning of the text.
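RLO forces the following characters to be treated as strong right-to-left characters. The sketch below applies it to the Latin word "Hello" and closes the override with PDF. A renderer that implements the Unicode Bidirectional Algorithm should display these five letters in reverse order ("olleH"), while the stored text keeps its original order. The helper name force_rtl is hypothetical.

```python
RLO = "\u202e"  # U+202E RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # U+202C POP DIRECTIONAL FORMATTING


def force_rtl(text: str) -> str:
    """Treat every character of text as strong right-to-left, then restore the previous state."""
    return RLO + text + PDF


forced = force_rtl("Hello")

if __name__ == "__main__":
    # Only the display order changes; the logical order is still "Hello".
    print(forced[1:-1])
```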


LRO
Scenario: Sanaa is creating a document
and would like to force even Arabic blocks to be displayed left-to-right. She
added an LRO character at the beginning of the text. Examine her display:
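LRO does the opposite: the following characters are treated as strong left-to-right characters, so Arabic letters are laid out left-to-right in their stored order. This sketch applies a hypothetical force_ltr helper to a short list of Arabic words ("سلام" means "peace", "كتاب" means "book"), closing each override with PDF.

```python
LRO = "\u202d"  # U+202D LEFT-TO-RIGHT OVERRIDE
PDF = "\u202c"  # U+202C POP DIRECTIONAL FORMATTING


def force_ltr(text: str) -> str:
    """Treat every character of text as strong left-to-right, then restore the previous state."""
    return LRO + text + PDF


# Hypothetical Arabic words to be forced left-to-right.
words = ["سلام", "كتاب"]
forced = [force_ltr(w) for w in words]

if __name__ == "__main__":
    for item in forced:
        print([f"U+{ord(c):04X}" for c in item])
```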


ZWJ
Scenario: Sanaa is creating a document
to show the different character shapes in Arabic. She wants to display all the
different shapes of the letter Ain (ع). Examine her display:
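A ZWJ on one side of a letter asks the renderer to shape it as if it were joined to a neighbor on that side. The sketch below builds the four contextual forms of Ain in logical order; the dictionary layout is illustrative, and whether each shape actually appears depends on the font and shaping engine.

```python
ZWJ = "\u200d"  # U+200D ZERO WIDTH JOINER
AIN = "\u0639"  # U+0639 ARABIC LETTER AIN (ع)

# In logical order, a ZWJ after the letter joins it to the following side,
# and a ZWJ before the letter joins it to the preceding side.
forms = {
    "isolated": AIN,
    "initial": AIN + ZWJ,
    "medial": ZWJ + AIN + ZWJ,
    "final": ZWJ + AIN,
}

if __name__ == "__main__":
    for name, text in forms.items():
        print(name, [f"U+{ord(c):04X}" for c in text])
```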


ZWNJ
Scenario: Ahmed is creating a report that contains some Arabic
part numbers. In this case, he needs to show Arabic characters next to each
other without joining them. Examine his display:

To fix this problem, he added a ZWNJ
control character between each pair of adjacent Arabic characters. The ZWNJ
character prevents the characters on either side of it from being joined.
Examine the correct display after adding the ZWNJ.
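Inserting a ZWNJ between every pair of adjacent characters is a one-line string join. The sketch below assumes a hypothetical part number made of the letters Ba, Ta, and Tha; the helper name separate_letters is illustrative only.

```python
ZWNJ = "\u200c"  # U+200C ZERO WIDTH NON-JOINER


def separate_letters(part_number: str) -> str:
    """Insert a ZWNJ between each pair of adjacent characters."""
    return ZWNJ.join(part_number)


# Hypothetical Arabic part number whose letters must not be joined.
part = "بتث"
display = separate_letters(part)

if __name__ == "__main__":
    print([f"U+{ord(c):04X}" for c in display])
```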

