
HOW TO: Formatting Control Characters

Introduction

Arabic script is complex because of its bidirectional (Bidi) layout requirements: words are written right-to-left, while numerals and Latin words are displayed left-to-right. Microsoft uses the Unicode Bidirectional Algorithm to resolve the directionality of text. The algorithm consists of an implicit part based on character properties, plus explicit control characters for embeddings and overrides.
This article demonstrates how to use formatting control characters to display Bidi text properly, and describes the most common control characters and their expected behavior.

 


Problem

In Arabic and any other Bidi text, there are situations (especially with weak or neutral characters such as punctuation) where implicit bidirectional ordering is insufficient to produce understandable text. To deal with these cases, a set of directional formatting control characters is defined to control the ordering of characters when rendered.

Examples

Incorrect Order    Correct Order    Comment
[image]            [image]          The closing bracket at the end of the sentence is displayed as right-to-left.
[image]            [image]          The full stop at the end of the sentence appears at the beginning.



Control Characters

Control Character   Meaning                    Unicode   Usage
RLM                 Right-to-Left Mark         U+200F    Zero-width mark that acts as a strong right-to-left character (like an Arabic letter).
LRM                 Left-to-Right Mark         U+200E    Zero-width mark that acts as a strong left-to-right character (like a Latin letter).
RLE                 Right-to-Left Embedding    U+202B    Treat the following text as embedded right-to-left.
LRE                 Left-to-Right Embedding    U+202A    Treat the following text as embedded left-to-right.
RLO                 Right-to-Left Override     U+202E    Force the following characters to be treated as strong right-to-left characters.
LRO                 Left-to-Right Override     U+202D    Force the following characters to be treated as strong left-to-right characters.
PDF                 Pop Directional Format     U+202C    Restore the bidirectional state to what it was before the last LRE, RLE, RLO, or LRO.
ZWJ                 Zero Width Joiner          U+200D    Forces the preceding and following characters to be in joined form, if possible.
ZWNJ                Zero Width Non-Joiner      U+200C    Forces the preceding and following characters not to be joined.

Note: The formatting control characters are used only to influence the display of text. They should otherwise be ignored; that is, they should not be stored and therefore should not affect text comparison, parsing, or numeric analysis.
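
As a rough illustration of the note above, the following Python sketch removes the control characters before comparing two strings. The code points come from the table; the names `FORMAT_CONTROLS`, `strip_format_controls`, and `equal_ignoring_controls` are hypothetical helpers invented for this example, not part of any Microsoft API.

```python
# Code points taken from the table above; the dictionary keys are just labels.
FORMAT_CONTROLS = {
    "RLM": "\u200f",
    "LRM": "\u200e",
    "RLE": "\u202b",
    "LRE": "\u202a",
    "RLO": "\u202e",
    "LRO": "\u202d",
    "PDF": "\u202c",
    "ZWJ": "\u200d",
    "ZWNJ": "\u200c",
}

# Translation table that deletes every control character listed above.
_STRIP_TABLE = {ord(ch): None for ch in FORMAT_CONTROLS.values()}


def strip_format_controls(text: str) -> str:
    """Return text with the formatting control characters removed."""
    return text.translate(_STRIP_TABLE)


def equal_ignoring_controls(a: str, b: str) -> bool:
    """Compare two strings as if the control characters were not present."""
    return strip_format_controls(a) == strip_format_controls(b)
```

With these helpers, a string that carries an RLM or an RLE/PDF pair compares equal to the same string without them.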



Samples

The following samples show each formatting control character in use. To experiment further, open Notepad.exe and right-click in the editing area: the context menu lets you insert and view a long list of control characters.

 


RLM

Scenario: Adham is creating an English document that contains some Arabic words, but the brackets are displayed incorrectly. Examine his display:

To fix this problem, he added an RLM control character after the closing bracket. The RLM acts as another Arabic character, so the bracket (a neutral character, which takes its direction from its neighbors) changes direction and moves to its correct place. Examine the correct display after adding the RLM:

Note: You would not need to add the RLM if the document direction were right-to-left.
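
The original screenshots are not available, so here is a minimal Python sketch of the same idea. The sample string is a hypothetical stand-in (an Arabic word followed by a closing bracket), and `bidi_classes` and `add_rlm_after_closing_bracket` are illustrative names. The standard `unicodedata.bidirectional` function is used only to show the Bidi class of each character; it does not render anything.

```python
import unicodedata

RLM = "\u200f"  # Right-to-Left Mark

# Hypothetical sample: the Arabic letters ain, reh, beh, yeh, then ")".
SAMPLE = "\u0639\u0631\u0628\u064a)"


def bidi_classes(text):
    """Return the Unicode Bidi class of every character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


def add_rlm_after_closing_bracket(text):
    """Append an RLM after a trailing ')' so the bracket sits between
    two right-to-left characters instead of at the edge of the run."""
    if text.endswith(")"):
        return text + RLM
    return text


fixed = add_rlm_after_closing_bracket(SAMPLE)
print(bidi_classes(SAMPLE))  # Arabic letters are "AL", the bracket is "ON"
print(bidi_classes(fixed))   # same, with a final "R" for the RLM
```

The bracket itself keeps its neutral class; what changes is its context, since it is now followed by a strong right-to-left character.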



LRM

Scenario: Samer is creating an Arabic document that contains some English words, but the dot before the English text is displayed at the end of the text instead of the beginning. Examine his display:

To fix this problem, he added an LRM control character before the dot. The LRM acts as another Latin character, so the dot (which is a weak character) changes its place. Examine the correct display after adding the LRM:

Note: You would not need to add the LRM before the dot if the document direction were left-to-right.
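
A small Python sketch of this fix, assuming the English text is a token such as ".NET" (a hypothetical example; the original screenshot is not available). The function name `add_lrm_before_leading_dot` and the regular expression are illustrative choices, not a prescribed method.

```python
import re

LRM = "\u200e"  # Left-to-Right Mark

# A dot that begins a Latin token: at the start of the string or after
# whitespace, and immediately followed by a Latin letter (e.g. ".NET").
_LEADING_DOT = re.compile(r"(?<!\S)\.(?=[A-Za-z])")


def add_lrm_before_leading_dot(text):
    """Insert an LRM before each dot that starts a Latin token."""
    return _LEADING_DOT.sub(LRM + ".", text)


# Hypothetical Arabic word followed by ".NET".
sample = "\u0639\u0631\u0628\u064a .NET"
print(repr(add_lrm_before_leading_dot(sample)))
```

Because the inserted LRM is not whitespace, running the function a second time leaves the text unchanged, and dots inside a word (as in "a.b") are not touched.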



RLE        

Scenario: Adham is creating an English document that contains a record. The record contains an Arabic word, but the record order is not correct. Examine his display:

To fix this problem, he added an RLE control character at the beginning of the text. The RLE changes the reading order of the following text to right-to-left. Examine the correct display after adding the RLE:

Note: You would not need to add the RLE if the document direction were right-to-left.



LRE           

Scenario: Adham is preparing an Arabic document that contains a file path, but the path is displayed incorrectly. Examine his display:

To fix this problem, he added an LRE control character at the beginning of the folder name. The LRE changes the reading order of the following text to left-to-right. Examine the correct display after adding the LRE:

Note: You would not need to add the LRE if the document direction were left-to-right.
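
The RLE and LRE samples can both be sketched in Python with one pair of helpers. The scenarios only mention adding the embedding character at the start; this sketch also appends a PDF, on the assumption that you want the embedding to end with the wrapped text rather than run on to the end of the paragraph. The helper names and the sample path are hypothetical.

```python
LRE = "\u202a"  # Left-to-Right Embedding
RLE = "\u202b"  # Right-to-Left Embedding
PDF = "\u202c"  # Pop Directional Format


def embed_ltr(text):
    """Wrap text so it is laid out as an embedded left-to-right run."""
    return LRE + text + PDF


def embed_rtl(text):
    """Wrap text so it is laid out as an embedded right-to-left run."""
    return RLE + text + PDF


# Hypothetical path whose last folder name is an Arabic word.
path = "C:\\Docs\\" + "\u0645\u0644\u0641"
print(repr(embed_ltr(path)))
```

Wrapping the whole path, rather than inserting a mark next to each separator, keeps the backslashes and folder names in one left-to-right sequence.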



RLO          

Scenario: Adham is creating a document and would like to force even Latin blocks to be displayed right-to-left. He added an RLO character at the beginning of the text.



LRO       

Scenario: Sanaa is creating a document and would like to force even Arabic blocks to be displayed left-to-right. She added an LRO character at the beginning of the text. Examine her display:
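
Per the Control Characters table, RLO forces right-to-left and LRO forces left-to-right, and either stays in effect until a PDF. The Python sketch below wraps text in an override and adds a simple check for overrides or embeddings that were never closed. All function names are hypothetical, and the check is a plain counter, not an implementation of the Bidirectional Algorithm.

```python
RLO = "\u202e"  # Right-to-Left Override
LRO = "\u202d"  # Left-to-Right Override
PDF = "\u202c"  # Pop Directional Format

# Characters that open an embedding or override: LRE, RLE, LRO, RLO.
OPENERS = {"\u202a", "\u202b", LRO, RLO}


def force_rtl(text):
    """Force every character in text to be laid out right-to-left."""
    return RLO + text + PDF


def force_ltr(text):
    """Force every character in text to be laid out left-to-right."""
    return LRO + text + PDF


def unclosed_controls(text):
    """Count embeddings/overrides still open at the end of text.
    A PDF with nothing to close is ignored."""
    depth = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF and depth > 0:
            depth -= 1
    return depth
```

A result of 0 from `unclosed_controls` means every override or embedding in the string was closed by a PDF, which is a cheap sanity check before the text is passed on for display.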



ZWJ

Scenario: Sanaa is creating a document to show the different character shapes in Arabic. She wants to display all the different shapes of "3enn". Examine her display:
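
One common way to show each contextual shape of a single letter is to place a ZWJ on the side where a join is wanted. The Python sketch below assumes that "3enn" refers to the Arabic letter ain (U+0639); that reading, and the name `contextual_forms`, are assumptions made for this example. How the shapes finally look depends on the font and the rendering engine.

```python
ZWJ = "\u200d"  # Zero Width Joiner
AIN = "\u0639"  # Arabic letter ain (assumed meaning of "3enn")


def contextual_forms(letter):
    """Return strings that request each contextual shape of a letter.
    In logical order, a ZWJ after the letter asks it to join to the
    next character, and a ZWJ before it asks it to join to the previous."""
    return {
        "isolated": letter,
        "initial": letter + ZWJ,
        "medial": ZWJ + letter + ZWJ,
        "final": ZWJ + letter,
    }


for name, text in contextual_forms(AIN).items():
    print(name, repr(text))
```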



ZWNJ

Scenario: Ahmed is creating a report that contains some Arabic part numbers. In this case he needs to show Arabic characters next to each other, but they should not be joined. Examine his display:

To fix this problem, he added a ZWNJ control character between each pair of adjacent Arabic characters. The ZWNJ prevents the preceding and following characters from joining. Examine the correct display after adding the ZWNJ:
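
In Python, placing a ZWNJ between every pair of adjacent characters is a one-line join. The helper name `unjoined` and the sample letters are hypothetical.

```python
ZWNJ = "\u200c"  # Zero Width Non-Joiner


def unjoined(letters):
    """Put a ZWNJ between each pair of adjacent characters so that
    none of them is rendered in a joined form."""
    return ZWNJ.join(letters)


# Hypothetical part code made of the Arabic letters beh and jeem.
print(repr(unjoined("\u0628\u062c")))
```

Because `str.join` only inserts the separator between characters, nothing is added before the first letter or after the last.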







©2014 Microsoft Corporation. All rights reserved.