White Paper - (Microsoft Windows Vista Arabic Search)
|
| Executive Summary |
|
We live in an age where nearly everything is digital. Documents, music, video,
photos, and even daily correspondence (including e-mail, faxes, and voice mail)
are increasingly created, stored, and accessed in electronic form on personal
computers. This fact and the huge increase in hard disk storage capacity have
made it increasingly difficult to stay on top of the information stored on our
PCs. The enhanced desktop search and organization features in all Windows Vista
editions help you readily locate files, e-mail messages, and other items on
your PC. If you remember anything about a file, Windows Vista can find it for
you quickly. Windows Vista goes beyond desktop search: it can also help you
"see" your files in multiple ways. Want to see all of your documents arranged
by date? How about by author? No problem. The system can auto-organize your
content using basic properties that are often automatically saved with your
files. Even better, you can tag your files with relevant properties, enabling
the system to bring together your documents, photos, music, and videos in
whatever way you think about them.
|
|
| Overview & Scope |
The Arabic Vista search functionality is enhanced by an Arabic
language-specific word breaker, stemmer, and Named Entity (NE) detection tool
that increase the relevance of search results. Word breakers are an
essential part of any search engine, since they define the elements of a search
query that will be matched against the document index. Many search engines use
the simple language-neutral technique of breaking on white space, which is
insufficient for a language such as Arabic, where particles and the definite
article attach directly to the word. The new Arabic word breaker in Vista, which
benefits from linguistic and statistical information, will significantly enhance
the user's search experience in a variety of structurally different languages.
|
|
| Goals |
The main goal of the Arabic language-specific word breaker is to extend
language coverage and improve word-breaking behavior, and thereby the user
experience when searching in Arabic. In Vista, the search engine starts
showing results on each keystroke. The string typed so far is
effectively prefix matched (wild-carded), so the search returns any words that
begin with those characters. The intended effect is that, as you type,
the number of matching items is reduced (although, depending on how the typed
string is word-broken, the returned result count can in practice go up
or down).
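The keystroke-by-keystroke narrowing described above can be sketched as prefix matching over a sorted list of index terms. This is a minimal illustration in Python, not the Vista implementation; the function name `prefix_matches` and the sample index terms are hypothetical.

```python
from bisect import bisect_left


def prefix_matches(sorted_terms, prefix):
    """Return every indexed term that starts with `prefix`.

    `sorted_terms` must be sorted, so all terms sharing a prefix are contiguous.
    """
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # past the contiguous run of matching terms
        matches.append(term)
    return matches


# Hypothetical index terms (assumption, for illustration only).
index_terms = sorted(["مصر", "مصنع", "مدرسة", "منزل", "كتاب"])

# Each keystroke re-runs the query with a longer prefix.
for typed in ["م", "مص", "مصر"]:
    print(typed, len(prefix_matches(index_terms, typed)))
```

In this sketch the result count can only shrink as the prefix grows; the count can rise in practice only because the typed string may be word-broken differently after an additional character.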
|
|
| Breaking and Non-Breaking Characters |
The determination of word breaking characters is essential, as it establishes
which characters will be coded as word separators. Breaking characters include
white space characters, punctuation marks, quotation marks, parentheses,
symbols, and more. Any character that is not explicitly listed as a breaking
character is not a breaking character. One way of categorizing word breaking
characters from a linguistic point of view is to assign them to two main
groups:
-
Special Cases
There are many special cases in word breaking that override
standard word breaking behavior. These special cases typically arise because
characters that normally break words do not do so in certain contexts, or
because a particular language uses a punctuation token or a special symbol in a
way that combines with the form of a word and therefore requires special
treatment. Common examples include abbreviations and acronyms.
-
Named Entities
Named Entities are sequences of tokens that we want to
recognize as a single token and link to a standardized format. Some of these
sequences may contain normally breaking characters. Using Named Entities
enables us to identify different representations of the same information as
equivalent, thus extending search coverage. Named Entities include numbers,
currencies, times, dates, e-mail addresses, URLs, file paths, and file names.
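As a rough illustration of the ideas in this section, the Python sketch below breaks text on non-word characters, keeps an e-mail address together as one token even though it contains normally breaking characters, and maps numbers written in Arabic-Indic digits to a standardized ASCII form. The regular expressions and the names `break_words` and `normalize_digits` are simplifying assumptions for this example; they do not describe the actual Vista word breaker.

```python
import re
import unicodedata

# Assumed, simplified token patterns. Order matters: named entities are tried
# before the generic word pattern.
TOKEN = re.compile(
    r"(?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)"  # e-mail kept as a single token
    r"|(?P<number>\d+(?:\.\d+)?)"               # number, in any Unicode decimal digits
    r"|(?P<word>\w+)"                           # run of letters, digits, or underscore
)


def normalize_digits(text):
    """Map any Unicode decimal digit (e.g. Arabic-Indic) to its ASCII form."""
    return "".join(
        str(unicodedata.decimal(ch)) if ch.isdecimal() else ch
        for ch in text
    )


def break_words(text):
    """Split text into tokens; everything outside the patterns acts as a breaker."""
    tokens = []
    for match in TOKEN.finditer(text):
        token = match.group(0)
        if match.lastgroup == "number":
            token = normalize_digits(token)  # standardized representation
        tokens.append(token)
    return tokens


print(break_words("السعر ١٢٣ ريال"))
print(break_words("mail user@example.com now"))
```

Because both "١٢٣" and "123" are emitted as the same token, a query written with either digit set would match documents written with the other.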
|
|
| Additional Features for Search |
This section groups together a number of additional features related to the
Vista Arabic search engine. These features include:
Pass-through feature: When a query is enclosed in quotation marks, the word
or words in the search query are matched without change against the index.
Special word list: The word breaker has rules to ensure that
phrases containing characters that are normally breaking characters (e.g., "#")
are not broken in some frequent lexical contexts (e.g., "C#").
Diacritics: The Arabic word breaker and search engine preserve
diacritics, emitting the form with the diacritic. Diacritics are marks added to
a letter or phoneme to indicate a special phonetic value. Diacritics
distinguish words that are otherwise graphically identical; examples of
diacritized words are "اليُمنُ" and "لُبنانِ". The index is diacritically
sensitive by default for some languages and insensitive for others; for Arabic,
sensitivity must be explicitly configured.
When the index is configured to "treat similar words with diacritics as
different words" (معاملة الكلمات المتشابهة بعلامات تشكيل على أنها كلمات مختلفة),
i.e. to be "diacritically sensitive", a search for "لبنانِ" will not return
items that contain the word "لُبنانِ" (and vice versa). Conversely, if the
index is configured to be diacritically insensitive (the default in English and
in Arabic builds), a search for "لبنانِ" will return items that contain
the word "لُبنانِ". This setting is available under the advanced options of
Indexing Options in Control Panel, and changing it requires a rebuild of the
index (Figure 1).
Figure 1: Advanced Indexing Options
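The effect of the diacritic sensitivity setting described above can be sketched in Python as follows. The sketch assumes that "diacritics" means the Arabic marks in the Unicode range U+064B to U+0652 (fathatan through sukun); the function names and the `diacritic_sensitive` parameter are hypothetical and are not part of any Windows API.

```python
import re

# Assumption: the diacritics of interest are the Arabic harakat and tanween,
# U+064B (fathatan) through U+0652 (sukun).
ARABIC_DIACRITICS = re.compile("[\u064B-\u0652]")


def strip_diacritics(word):
    """Remove Arabic diacritic marks, leaving the base letters."""
    return ARABIC_DIACRITICS.sub("", word)


def index_key(word, diacritic_sensitive):
    """Form under which a word is compared: as typed, or with diacritics removed."""
    return word if diacritic_sensitive else strip_diacritics(word)


def matches(query, indexed_word, diacritic_sensitive=False):
    """True if the query matches the indexed word under the chosen setting."""
    return index_key(query, diacritic_sensitive) == index_key(
        indexed_word, diacritic_sensitive
    )


QUERY = "\u0644\u0628\u0646\u0627\u0646\u0650"          # "لبنانِ" (final kasra only)
INDEXED = "\u0644\u064F\u0628\u0646\u0627\u0646\u0650"  # "لُبنانِ" (damma and kasra)

print(matches(QUERY, INDEXED, diacritic_sensitive=False))
print(matches(QUERY, INDEXED, diacritic_sensitive=True))
```

Because the comparison form is fixed when a word is indexed, changing the setting requires the index to be rebuilt, which is consistent with the behavior described above.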
|
|
| Vista Search Engine Features |
Windows Vista goes beyond desktop search. It can also help you "see" your files
in multiple ways. Want to see all of your documents arranged by date? How about
by author? No problem. The system can auto-organize your content using basic
properties that are often automatically saved with your files. Even better,
with Windows Photo Gallery and Windows Media Player 11 or with third-party
applications, you can tag your files with relevant properties, enabling the
system to bring together your documents, photos, music, and videos in whatever
way you think about them.
-
Instant Search
Instantly find what you need with Windows Vista, which introduces the new
Instant Search, an enhanced desktop search and organization tool that helps you
locate files and e-mail messages on your PC. If you remember anything about a
file (the type of file, when it was created, or even what it contains), Windows
Vista can quickly find it for you. With Instant Search, you are never more than
a few keystrokes away from whatever you're looking for. This feature, which is
available almost anywhere you are in Windows Vista, enables you to search for a
file name, a property, or even text contained within a file, and it returns
pinpointed results. It's fast and easy. Instant Search is also contextual,
optimizing its results based on your current activity, whether it's searching
Control Panel applets, looking for music files in Windows Media Player, or
looking over all your files and applications on the Start menu.
-
Start-Menu Search
With its "fast as you can type" search performance, the newly redesigned Start
menu is your portal to virtually anything on your PC. To find a specific file,
application, or Internet Favorite, just open the Start menu (or press the
Windows key on the keyboard) and start typing in the embedded Instant Search
box. As you type, Windows Vista instantly searches file and application names,
metadata, and the full text of all files, and groups your results by category:
Programs; Favorites/Internet History; Files, including documents and media; and
Communications, including e-mail, events, tasks, and contacts.
The screenshots below show how typing affects the displayed search results:
Figure 2 shows the results after one character, "م", is typed; Figure 3 shows
the results narrowed after the second character, "مص"; and Figure 4 shows the
results narrowed further after the third character, "مصر".
Figure 2: Start Menu – Instant Search 1
Figure 3: Start Menu – Instant Search 2
Figure 4: Start Menu – Instant Search 3
The screenshots below show more features of the newly
redesigned Start menu. Moving the mouse pointer over any item in the search
results displays more information about that item (Figure 5), and right-clicking
the item shows more details and available actions (Figure 6).
Figure 5: Start Menu – Instant Search – Item Description
Figure 6: Start Menu – Instant Search – Item properties
-
Windows Vista Explorer Showcases Search
The new Windows Vista Explorer showcases Instant Search in the top-left corner.
It's always with you when you're using the Documents Explorer, Music Explorer,
Pictures Explorer, and the new Search Explorer. As in the Start menu, you only
have to type a few letters before you start seeing the most relevant results.
If the results aren't what you're looking for, you have easy access to tools
that can help you refine your search or search across the Internet using your
favorite search engine (Figure 7). For advanced search options, click "بحث
متقدم" (Advanced Search) (Figure 8).
Figure 7: Windows Vista Explorer Showcases Search
Figure 8: Windows Vista Explorer Showcases Advanced Search
Note: In advanced search mode, you can use the Hijri or Umm al-Qura calendar
in the date search field.
-
Search Folders
Windows Vista introduces Search Folders, a powerful new tool that makes it easy
to find and organize your files. A Search Folder is simply a search that you
save. Opening a Search Folder runs your saved search, displaying up-to-date
results quickly. For example, you could design a search for all documents that
are authored by John and that contain the word "مشروع". You could save this
search, titled "المشاريع", as a Search Folder. When you open this Search Folder,
the search runs and you see the results right away. As you add more files to your
computer that contain the word "مشروع", those files will appear in the Search
Folder alongside other matching files, no matter where you physically saved
them on your PC. It's simple and fast. Being able to view content on your
computer sorted into saved Search Folders adds a lot of flexibility to the ways
you can work with your files. In addition, Windows Vista still supports
traditional, location-based folders. Folders are useful because they foster
easy migration from one computer to another, and because your existing programs
would break without them. In Windows Vista, you'll still save content in
folders, but it's easier to use those folders because of tools such as Instant
Search and enhanced column header controls.
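The idea that a Search Folder is a saved search rather than a physical location can be sketched in Python as follows. The `Document` and `SearchFolder` classes, their fields, and the sample paths are hypothetical and exist only for this illustration; they do not reflect how Windows Vista stores saved searches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    path: str    # physical location; irrelevant to the saved search
    author: str
    text: str


@dataclass(frozen=True)
class SearchFolder:
    """A saved search: it stores the criteria, not the matching files."""
    title: str
    author: str
    keyword: str

    def open(self, documents):
        """Re-run the saved search against whatever documents exist right now."""
        return [
            doc for doc in documents
            if doc.author == self.author and self.keyword in doc.text
        ]


documents = [
    Document("C:/Users/John/Documents/plan.docx", "John", "خطة مشروع جديد"),
    Document("C:/Users/John/Desktop/notes.txt", "John", "ملاحظات عامة"),
]
folder = SearchFolder(title="المشاريع", author="John", keyword="مشروع")
print(len(folder.open(documents)))

# A matching file saved somewhere else appears the next time the folder is opened.
documents.append(Document("D:/Archive/budget.xlsx", "John", "ميزانية مشروع"))
print(len(folder.open(documents)))
```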
-
Organization
Although the new desktop search capabilities in Windows Vista fulfill many
search needs, they are not designed to address every information management
need. For instance, they do not readily help you find collections of similar
files, such as files from the same project or author, and then share those
files with other people, organize them, or move them around on your hard disk.
That's where the powerful Explorers come in: they take the benefits of the new
Windows Vista desktop search capabilities to the next level by combining Instant Search
with the ability to auto-organize content across your PC based on file
properties. Rather than having to remember specific locations or folder names
to find your documents, music, pictures, and e-mail, you can rely on the
ability of Windows Vista to search file properties known as "metadata."
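Auto-organizing content by a file property amounts to grouping items on a metadata value instead of on a folder path. The Python sketch below illustrates this under simple assumptions: each file is represented as a dictionary of properties, and the function name `arrange_by` and the sample data are hypothetical.

```python
from collections import defaultdict


def arrange_by(files, prop):
    """Group file names by the value of one metadata property."""
    groups = defaultdict(list)
    for properties in files:
        # Files that lack the property are collected under a placeholder group.
        groups[properties.get(prop, "Unspecified")].append(properties["name"])
    return dict(groups)


# Hypothetical files and metadata (assumption, for illustration only).
files = [
    {"name": "report.docx", "author": "Mona", "date": "2007-05-01"},
    {"name": "budget.xlsx", "author": "John", "date": "2007-05-01"},
    {"name": "photo.jpg", "author": "Mona", "date": "2007-04-12"},
    {"name": "song.wma", "date": "2007-04-12"},
]

print(arrange_by(files, "author"))
print(arrange_by(files, "date"))
```

The same list of files yields a different arrangement for each property, without any file being moved.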
-
Tagging your files
Powerful new search and organization features in Windows Vista make extensive
use of file properties, or metadata, to give you even more dynamic ways to
interact with your information. Many of your files already contain useful
metadata. For example, Microsoft Office automatically records certain document
properties, such as author and date created. And music ripped from CDs often
has properties such as song, album, and artist name. But Windows Vista also
gives you ways to apply custom properties to your files. You can quickly and
easily apply properties to any file or group of files in:
Details Pane: The easiest way to add a property to a file is to select
the file and change it in the Details Pane at the bottom of the Explorer. Many
of the entry fields support AutoComplete, making it even easier to add
properties, for one file or across many files. Selecting multiple files and
adding a property via the Details Pane adds that property to all selected
files.
Properties window: You can still go to the familiar Properties window by
right-clicking a file and selecting Properties. In the Details tab you have
quick access to a file's metadata. One handy feature is the ability to remove
all properties of a file with a single click, which can help you prepare a file
for sharing with others by removing details such as the author's name.
|
|
|
Additional Resources
To learn more about Microsoft Windows Vista Desktop Search Engine, please refer
to the following list of related links for additional resources and
information.
|
|
Disclaimer
|
This white paper discusses Arabic support in
Windows Vista, including the changes from Windows XP and the newly added features
related to the Arabic language.
The information contained in this document
represents the current view of Microsoft Corporation on the issues discussed as
of the date of publication. Because Microsoft must respond to changing market
conditions, it should not be interpreted to be a commitment on the part of
Microsoft, and Microsoft cannot guarantee the accuracy of any information
presented after the date of publication.
This White Paper is for informational purposes only.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE
INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the
responsibility of the user. Without limiting the rights under copyright, no
part of this document may be reproduced, stored in or introduced into a
retrieval system, or transmitted in any form or by any means (electronic,
mechanical, photocopying, recording, or otherwise), or for any purpose, without
the express written permission of Microsoft Corporation.
Microsoft may have patents, patent applications,
trademarks, copyrights, or other intellectual property rights covering subject
matter in this document. Except as expressly provided in any written license
agreement from Microsoft, the furnishing of this document does not give you any
license to these patents, trademarks, copyrights, or other intellectual
property.
Unless otherwise noted, the example companies,
organizations, products, domain names, e-mail addresses, logos, people, places,
and events depicted herein are fictitious, and no association with any real
company, organization, product, domain name, email address, logo, person,
place, or event is intended or should be inferred.
© 2007 Microsoft Corporation. All rights reserved.
Windows XP and Windows Vista are either registered
trademarks or trademarks of Microsoft Corporation in the United States and/or
other countries.
The names of actual companies and products
mentioned herein may be the trademarks of their respective owners.
|
|
Last updated: Tuesday, May 31, 2007
|