Microsoft Innovation Lab in Cairo (CMIC) 
Increasing Multimedia Arabic Content
By: Moataz El Saban

Information Age

The information age we are living in is characterized by the ease of accessing, finding, and preserving content. To be successful, an individual, organization, or establishment must be able to access relevant content in a timely fashion. To illustrate this, consider a student seeking to learn more about a topic that was introduced in a lecture. Through the Internet, the student can find, from the comfort of home and without needing to visit a library, a wealth of additional information on the subject that is crucial to expanding his or her horizons on the topic.

On the other end of the spectrum, a government shaping and executing long-term as well as short-term plans needs efficient and effective access to content. Access to content, whether written, spoken, or viewed, is greatly eased if the content is in digital form. The developed world realized the value of digital content early on and made great strides towards creating digital content and transforming existing content into digital form, which has led to the birth of the digital age.

Today, almost 83% of Internet users watch videos online, and 64% visit photo-sharing websites. The use of this type of content, and of multimedia (MM) content in general, is continuously rising, with an ever-growing market. MM applications span many domains, such as e-learning, medical services, and map services, to name a few.

This growth is supported by media-capturing devices becoming more and more ubiquitous, in the form of digital cameras, camcorders, and mobile phones. Another important factor fueling this growth is the increased availability of high-speed Internet connections.

The role of MM content is more significant in markets, such as the Arab region, where information consumption patterns are skewed towards spoken and visual content as compared to written and textual material. Hence, increasing the relevant MM content, especially Arabic content, is of great importance. Supporting such growth requires developing technologies and mechanisms to ease the creation, accessibility, and preservation of content. To achieve this goal, more digitally born content (content generated in digital form) should be supported, in addition to the preservation of already existing content in digital form.

Digitally born content

Creating, sharing, servicing, and preserving MM content in digital formats in the Arab region faces a number of hurdles that need to be addressed. First, PC penetration and the level of Internet literacy, although growing, remain too low. Furthermore, there is a shortage of organizations and institutions creating digital MM content. Collaborative efforts by the more digitally aware and tech-savvy individuals and organizations in the region to produce MM content would play a significant role. These efforts could be modeled on existing collaborative environments, such as Wikipedia and YouTube, to support the growth of digital Arabic MM content.

Another major obstacle to the growth of digital MM content is the lack of tools and services that would enable users to access and find information with relative ease. This factor presents itself as both an obstacle and an opportunity. The creation of useful services will accelerate the creation of relevant content, and vice versa.

CMIC is currently engaged in research efforts aimed at easing the indexing, browsing, and searching of MM content. Last, but not least, Internet access and bandwidth constraints limit the publishing and consumption of digital MM content. This is especially true when using portable publishing devices, such as cell phones, which are widespread in the Arab region. Such platforms suffer from limited connectivity resources. Research opportunities in this resource-limited environment include devising efficient and effective methods for decreasing bandwidth utilization through MM compression, summarization, and repurposing. At CMIC, we are currently pursuing a number of ideas related to MM summarization in domains such as education and personal videos.

The creation of these services and tools can significantly increase online digital MM content, particularly in the Arab region.

Preservation of existing cultural and heritage content

The Arab region possesses a rich cultural heritage that has been documented in different forms for many years. Some of this content is already in MM format, yet not in digital form, and it has not received the same attention as written, textual information. The preservation and ease of access of such content is of utmost value. This spans the whole spectrum from old manuscripts, images, and maps to contemporary speeches and analog recorded video content. Besides the challenges outlined above for generating digital content, a number of additional ones face the digitization of Arabic MM content. If the content is in paper document format, processes like OCR (optical character recognition) are needed for the digitization. Compared to the performance levels attained for English OCR, Arabic OCR systems lag far behind. This is due to many factors, such as the difficult Arabic orthography, calligraphy variations, degraded paper conditions, and above all the limited research that has been devoted to improving the quality of Arabic OCR systems.

The same applies squarely to the process of digitizing speeches and audio material. An automated speech recognition (ASR) system is typically needed to convert the spoken content into written content that can be easily browsed and searched. Many challenges, and consequently research opportunities, exist in the ASR and information retrieval (IR) processes, such as dialect variations and noisy conditions, especially for old recordings. At CMIC, we are currently pursuing efforts to ease access to Arabic paper documents using novel, fully automated IR methods. In addition, we believe that collaborative user efforts for tagging and correcting automated digitization results can be very effective in solving issues related to access to non-digital Arabic MM content.


Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.


©2009 Microsoft Corporation. All rights reserved.