
Arabic Proofing Tools in Office 2003
White Paper
Abstract
This paper describes the new features and enhancements in the Arabic proofing tools in Microsoft Office 2003.
Index
1. Introduction
2. List of Arabic proofing tools
3. Common Features
4. Speller Features
4.1 Custom Dictionaries
4.2 Suggestion
5. Grammar Features
6. Arabic Thesaurus
7. Translation
8. Conclusion
Introduction
The proofing tools have been improved in Office 2003. The improvements cover most languages, but Arabic received the largest share: Office 2003 adds several new tools for Arabic and improves the existing ones.

List of Arabic proofing tools
- Spell Checking/Correction
- Grammar Checking/Correction
- Arabic Thesaurus
- Arabic-English Bi-Directional Dictionary
- Arabic-French Bi-Directional Dictionary
These tools share common features, and some of them also have features of their own. The following sections describe these features in more detail.

Common Features
All the Arabic proofing tools in Microsoft Office 2003 are based on the same Arabic lexicon, which ensures consistency and integrity across the tools. The lexicon is stored using an advanced technique that uses AI (Artificial Intelligence) data structures to store more than 15,500,000 words in a file of less than 1.3 MB. The storage technique combines root-based, derivation, and affixation approaches. Combining all these techniques in one lexicon guarantees full coverage of the Arabic language. The lexicon mainly covers the modern language, but it also includes all the words of the Qur'an. It was collected from almost all the traditional lexicons (Lissan Al-Arab, Al-Qamous Al-Moheet, Mukhtar Al-Sahah, etc.) plus a huge corpus of modern language. The main lexicon also includes a large number of the most common proper nouns.
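At the figures quoted above, 1.3 MB is roughly 10 to 11 million bits for 15.5 million words, that is, less than one bit per word, so the word forms cannot be stored as a flat list and must be generated from smaller building blocks. The following Python sketch illustrates the general idea of combining roots, derivation patterns, and affixes. It is not the Office implementation: the names `ROOTS`, `PATTERNS`, `PREFIXES`, `SUFFIXES`, `apply_pattern`, and `expand` are hypothetical, the data is a toy sample, and the sketch over-generates (for example, it attaches a pronoun suffix to a word that already carries the definite article), which a real lexicon would have to rule out.

```python
from itertools import product

# Toy data, assumed for illustration only.
ROOTS = ["كتب"]                            # three-letter root k-t-b
PATTERNS = ["123", "1ا23", "م12و3", "12ا3"]  # digits mark root-letter slots
PREFIXES = ["", "ال", "و", "ب"]
SUFFIXES = ["", "ه", "ها", "هم"]


def apply_pattern(root: str, pattern: str) -> str:
    """Fill the numbered slots of a derivation pattern with the root letters."""
    return "".join(root[int(ch) - 1] if ch in "123" else ch for ch in pattern)


def expand(roots, patterns, prefixes, suffixes) -> set:
    """Generate every prefix + derived stem + suffix combination."""
    forms = set()
    for root, pattern, prefix, suffix in product(roots, patterns, prefixes, suffixes):
        forms.add(prefix + apply_pattern(root, pattern) + suffix)
    return forms


forms = expand(ROOTS, PATTERNS, PREFIXES, SUFFIXES)
# 1 root, 4 patterns, 4 prefixes and 4 suffixes yield 64 distinct surface forms.
print(len(forms))
```

The point of the sketch is the ratio: the number of stored items grows by addition (roots plus patterns plus affixes), while the number of recognizable words grows by multiplication.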
The algorithm used for lexicon processing is one of the most advanced algorithms used in NLP (Natural Language Processing). This technique gives the Arabic lexicon a level of performance that is unique among the supported languages. Tests on a PIII 1.7 machine show that the Arabic speller can check more than 3,000,000 words per minute (about 50,000 words per second).

Speller Features
The spell checker and corrector inherits all the features of the lexicon storage and processing. In addition, it has two features that improve the performance of the Arabic spell checker and corrector shipped with Microsoft Office 2003:
- Custom Dictionaries
- Suggestion
Custom Dictionaries
Custom dictionaries in MS Office 2003 are of two types: language-independent and language-dependent. The user can add, delete, or edit a dictionary, and can activate or deactivate any dictionary at any time. By default, a new custom dictionary is language-independent, meaning that it is used when checking the spelling of text in any language. However, a custom dictionary can be associated with a particular language so that Word uses it only when checking the spelling of text in that language. Custom dictionary editing is integrated within Office and does not need an external editor. Each Arabic custom dictionary can store up to 2,000 words, and any number of Arabic custom dictionaries can be used at the same time.
To add a new custom dictionary to Microsoft Word:
- From the Tools menu, select Options
- Click the Spelling and Grammar tab
- Click the Custom Dictionaries button
- In the dialog box, click the New button to create a new custom dictionary
To associate a custom dictionary with a particular language:
- From the Tools menu, select Options
- Click the Spelling and Grammar tab
- Click the Custom Dictionaries button
- In the dialog box, select the custom dictionary you want, click the Modify button, and then choose the language to associate with it
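The behavior described in this section (active or inactive dictionaries, language-independent by default, optional language association, and the 2,000-word limit for Arabic dictionaries) can be modeled with a short Python sketch. This is not the Office object model: `CustomDictionary`, `applicable`, `is_accepted`, and the `"ar"` language tag are hypothetical names chosen for illustration, and treating "Arabic custom dictionary" as "dictionary associated with Arabic" is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional

ARABIC_WORD_LIMIT = 2000  # per-dictionary limit quoted in this paper


@dataclass
class CustomDictionary:
    name: str
    language: Optional[str] = None  # None means language-independent (the default)
    active: bool = True
    words: set = field(default_factory=set)

    def add(self, word: str) -> None:
        """Add a word, enforcing the limit for Arabic dictionaries (assumed tag "ar")."""
        is_new = word not in self.words
        if self.language == "ar" and is_new and len(self.words) >= ARABIC_WORD_LIMIT:
            raise ValueError(f"{self.name} already holds {ARABIC_WORD_LIMIT} words")
        self.words.add(word)


def applicable(dictionaries, text_language: str) -> list:
    """Active dictionaries that are language-independent or match the text language."""
    return [d for d in dictionaries if d.active and d.language in (None, text_language)]


def is_accepted(word: str, dictionaries, text_language: str) -> bool:
    """True if any applicable custom dictionary contains the word."""
    return any(word in d.words for d in applicable(dictionaries, text_language))
```

In this model, a word added to a dictionary associated with Arabic is accepted only while Arabic text is being checked, whereas a word in a language-independent dictionary is accepted for text in any language.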
Suggestion
The most important improvement in the Arabic speller of Office 2003 is the suggestion mechanism. Suggestion in this version is based on new AI (Artificial Intelligence) techniques. When the user types a misspelled Arabic word, the speller aims to suggest the word that this particular user intended. Speller suggestions cover keyboard mistyping correction, hamza correction, missing-dots correction, Soundex (sound-alike) correction, pronoun corrections, morphological corrections, grammatical corrections, and corrections of letters with similar shapes. The technique uses a user-specific memory to remember the user's common mistakes as well as the corrections the user chose. The suggestions are ranked using many factors to guarantee that the user finds the desired word among the first five suggestions.
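Two of the correction types listed above, hamza correction and missing-dots correction, together with ranking by user memory, can be sketched in Python as follows. This is a simplified illustration, not the Office algorithm: the confusable-letter groups, the single-substitution candidate generation, the frequency values, and the names `suggest`, `candidates`, and `build_alternatives` are all assumptions made for the example.

```python
# Letters that differ only in hamza or dots (assumed, simplified groups).
CONFUSABLE_GROUPS = [
    "اأإآ",    # alef with and without hamza or madda
    "بتثني",   # same base shape, different dots
    "جحخ", "دذ", "رز", "سش", "صض", "طظ", "عغ", "فق",
]


def build_alternatives(groups) -> dict:
    """Map each letter to the other letters it is commonly confused with."""
    alternatives = {}
    for group in groups:
        for ch in group:
            alternatives.setdefault(ch, set()).update(c for c in group if c != ch)
    return alternatives


def candidates(word: str, lexicon: dict, alternatives: dict) -> set:
    """Lexicon words reachable by replacing one letter with a confusable one."""
    found = set()
    for i, ch in enumerate(word):
        for replacement in alternatives.get(ch, ()):
            candidate = word[:i] + replacement + word[i + 1:]
            if candidate in lexicon:
                found.add(candidate)
    return found


def suggest(word: str, lexicon: dict, user_memory: dict, limit: int = 5) -> list:
    """Rank candidates: the user's past corrections first, then lexicon frequency.

    lexicon maps word -> frequency; user_memory maps (typed, chosen) -> count.
    """
    found = candidates(word, lexicon, build_alternatives(CONFUSABLE_GROUPS))
    ranked = sorted(
        found,
        key=lambda c: (-user_memory.get((word, c), 0), -lexicon[c], c),
    )
    return ranked[:limit]


toy_lexicon = {"أحمد": 50, "باب": 90, "تاب": 20, "ناب": 10}  # toy frequencies
print(suggest("احمد", toy_lexicon, {}))  # alef typed without its hamza
```

With an empty user memory the candidates are ordered by frequency alone. Once `user_memory` records that this user previously replaced a given mistyped form with a particular word, that word moves to the top of the list for the same mistake.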

Grammar Features
The Arabic grammar checker shipped with Microsoft Office 2003 is the only Arabic grammar checker on the market. The version shipped in Office 2003 is based on the core of the version shipped in Office XP. It includes many enhancements: faster response, more grammar rules, punctuation correction, and more accurate suggestions. The grammar checker supports checking and correction of simple Arabic sentences. It is integrated with the spell checker, and both are based on the same lexicon to ensure consistency. The theory behind this grammar checker is entirely new in linguistic theory: it combines classical Arabic grammar rules with modern linguistic theories. This combination leads to a very fast grammar checker and corrector that verifies all the grammar rules of the Arabic language. The grammar checker also has a unique feature that allows it to correct errors iteratively, so it can correct multiple grammar errors in the same sentence. Punctuation correction in the Office 2003 grammar checker is an entirely new feature for the Arabic language. It takes care of spaces, commas, and question marks. The grammar checker detects the type of the sentence and recommends the best punctuation for the sentence ending.
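The iterative correction described above can be pictured as a loop that fixes one error, re-checks the sentence, and repeats until no rule fires. The Python sketch below shows that loop using only the punctuation checks mentioned in this section (spaces, commas, question marks). It is an illustration under assumptions, not the Office rule set: the rule functions, the `QUESTION_WORDS` list, and the `correct` driver are hypothetical, and real grammar rules such as agreement are not modeled.

```python
import re

# Assumed, incomplete list of words that open an interrogative sentence.
QUESTION_WORDS = ("هل", "ماذا", "متى", "أين", "كيف", "لماذا")


def collapse_spaces(s: str) -> str:
    return re.sub(r" {2,}", " ", s)


def no_space_before_punctuation(s: str) -> str:
    return re.sub(r"\s+([،؟.])", r"\1", s)


def space_after_comma(s: str) -> str:
    return re.sub(r"،(?=\S)", "، ", s)


def fix_ending(s: str) -> str:
    """End questions with the Arabic question mark, other sentences with a period."""
    body = s.rstrip()
    words = body.split()
    if not words:
        return s
    if words[0] in QUESTION_WORDS:
        return body.rstrip(".!؟") + "؟"
    return body if body[-1] in ".!؟" else body + "."


RULES = [
    ("collapse_spaces", collapse_spaces),
    ("no_space_before_punctuation", no_space_before_punctuation),
    ("space_after_comma", space_after_comma),
    ("fix_ending", fix_ending),
]


def correct(sentence: str, rules=RULES, max_passes: int = 10):
    """Apply one fix per pass and re-check, until no rule changes the sentence."""
    applied = []
    for _ in range(max_passes):
        for name, rule in rules:
            fixed = rule(sentence)
            if fixed != sentence:
                applied.append(name)
                sentence = fixed
                break  # restart the check on the corrected sentence
        else:
            break  # a full pass found nothing left to fix
    return sentence, applied


# The sample is built from pieces so that each spacing error is visible.
sample = "هل" + "  " + "قرأت الكتاب" + " ،" + "ثم نمت" + "."
print(correct(sample))
```

Restarting the check after every fix is what lets several errors in one sentence be corrected in turn, since an earlier correction can expose or change a later one.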

Arabic Thesaurus
The Arabic Thesaurus in Microsoft Office 2003 is based on the first Arabic thesaurus to appear on the market, in MS Office XP. This thesaurus is the result of a long period of data preparation drawing on almost all the classical Arabic lexicons, in addition to a few modern books on synonyms. The Arabic Thesaurus in MS Office 2003 is the first product that links a word with its synonyms, antonyms, and related words. It not only finds the synonyms, antonyms, and related words of a word, but also finds the synonyms of all the derivatives of that word. For example, to find the synonyms of the word بسط, you can use the words البسط, بسطه, بسطكم, etc. The Arabic Thesaurus also has a unique feature: it formats each synonym in the same form as the query word. This feature is specific to the Arabic language and is not implemented for any other language.
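The two behaviors described above, looking up a derivative through its base word and returning synonyms in the same form as the query, can be sketched in Python. This is a toy model and not the Office implementation: the affix lists, the function names, and the synonym entries for بسط are illustrative assumptions, and real Arabic morphology needs far more than prefix and suffix stripping.

```python
# Assumed toy data for illustration only.
PREFIXES = ["ال", "و", "ف", "ب"]
SUFFIXES = ["كم", "هم", "ها", "ه"]
THESAURUS = {"بسط": ["نشر", "مد"]}  # illustrative synonym entries


def split_affixes(word: str, stems):
    """Return (prefix, stem, suffix) for the first split whose stem is known."""
    for prefix in [""] + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in [""] + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[: len(rest) - len(suffix)] if suffix else rest
            if stem in stems:
                return prefix, stem, suffix
    return None


def synonyms(word: str) -> list:
    """Look up the stem, then re-attach the query's affixes to each synonym."""
    parts = split_affixes(word, THESAURUS)
    if parts is None:
        return []
    prefix, stem, suffix = parts
    return [prefix + synonym + suffix for synonym in THESAURUS[stem]]


for query in ["بسط", "البسط", "بسطه", "بسطكم"]:
    print(query, synonyms(query))
```

Each query from the example in the text resolves to the same entry, and the returned synonyms carry the definite article or pronoun suffix that the query carried.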

Translation
Microsoft Office 2003 has four dictionaries:
- English to Arabic
- Arabic to English
- French to Arabic
- Arabic to French
These dictionaries were compiled from many standard dictionaries and edited to be suitable for all Arab countries. The Arabic to English and Arabic to French dictionaries have an Arabic-specific feature: they analyze the input word, find all of its possible meanings, and generate the proper translation for each. The analysis resolves all levels of ambiguity in the word. These dictionaries are the first step in Microsoft's ambitious project to build a full machine translation system.
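The analysis step matters because unvocalized Arabic words are often ambiguous in both segmentation and sense. The Python sketch below enumerates the possible readings of an input word and translates each one. It is a minimal illustration with assumed names and toy data (`PREFIX_GLOSSES`, `AR_EN`, `analyze`, `translate`), not the Office dictionaries, and it only lists the readings; it does not attempt the disambiguation that the paper describes.

```python
# Assumed toy data for illustration only.
PREFIX_GLOSSES = {"": "", "و": "and"}
AR_EN = {
    "وجد": ["found"],
    "جد": ["grandfather"],
    "كتب": ["wrote", "books"],  # two readings of the same unvocalized spelling
}


def analyze(word: str) -> list:
    """Return every (prefix, stem) split whose stem is in the dictionary."""
    analyses = []
    for prefix in PREFIX_GLOSSES:
        if word.startswith(prefix):
            stem = word[len(prefix):]
            if stem in AR_EN:
                analyses.append((prefix, stem))
    return analyses


def translate(word: str) -> list:
    """Translate every sense of every analysis of the word."""
    results = []
    for prefix, stem in analyze(word):
        gloss = PREFIX_GLOSSES[prefix]
        for sense in AR_EN[stem]:
            results.append(f"{gloss} {sense}".strip())
    return results


print(translate("وجد"))   # a whole word, or a conjunction plus a shorter word
print(translate("وكتب"))
```

The first query shows segmentation ambiguity (one spelling read either as a single word or as a conjunction followed by another word), and the second shows sense ambiguity within a single segmentation.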

Conclusion
The proofing tools in Office 2003 have been enhanced in many areas to include better language-dependent features as well as intelligent techniques for spelling suggestions and corrections.
For more information: Office 2003 Web Site: /middleeast/office/

Disclaimer
The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.
This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.
The example companies, organizations, products, people and events depicted herein are fictitious. No association with any real company, organization, product, person or event is intended or should be inferred.
© 2002 Microsoft Corporation. All rights reserved.
Microsoft, Windows Media Encoder 9, Windows Media Stream Editor, Windows Media File Editor, Windows Media Player Series 9, and Windows XP are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.
Other product or company names mentioned herein may be the trademarks of their respective owners.
Microsoft Corporation • One Microsoft Way • Redmond, WA 98052-6399 • USA
