Increasing Multimedia Arabic Content
By: Moataz El Saban
Information Age
The information age we are living in is characterized by the ease of accessing, finding,
and preserving content. To be successful, an individual, organization, or establishment
needs to be able to access relevant content in a timely fashion. To illustrate this,
consider a student seeking to learn more about a topic that was introduced in a lecture.
Through the Internet, the student is able to find, from the comfort of home and without
needing to visit a library, a wealth of additional information on the subject, which
is crucial to expanding his/her horizons on the topic.
At the other end of the spectrum, a government shaping and executing long-term as well
as short-term plans needs efficient and effective access to content. Access to content,
whether written, spoken, or viewed, is greatly eased if the content is in digital form.
The developed world realized the value of digital content early on and made great strides
towards creating digital content and transforming existing content into digital form,
which has led to the birth of the digital age.
Today, almost 83% of Internet users watch videos online, and 64% visit photo-sharing
websites. The use of this type of content, and of multimedia (MM) content in general,
is continuously rising, along with the market for it. MM applications span many domains,
such as e-learning, medical services, and map services, to name a few.
This growth is supported by media-capturing devices, in the form of digital cameras,
camcorders, and mobile phones, becoming more and more ubiquitous. Another important
factor fueling this growth is the increased availability of high-speed Internet
connections.
The role of MM content is more significant in markets, such as the Arab region,
where information consumption patterns are skewed towards spoken and visual content
rather than written and textual material. Hence, increasing the relevant MM content,
especially Arabic content, is of great importance. Supporting such growth requires
developing technologies and mechanisms to ease the creation, accessibility, and
preservation of content. To achieve this goal, more digitally born content (content
generated in digital form) should be supported, in addition to preserving already
existing content in digital form.
Digitally born content
Creating, sharing, servicing, and preserving MM content in digital formats in the
Arab region faces a number of hurdles that need to be addressed. First, PC penetration
and the level of Internet literacy, although growing, remain too low. Furthermore,
there is a shortage of organizations and institutions creating digital MM content.
Collaborative efforts by the more digitally aware and tech-savvy individuals and
organizations in the region to produce MM content would play a significant role. Such
efforts could be modeled on existing collaborative environments, such as Wikipedia
and YouTube, to support the growth of digital Arabic MM content.
Another major obstacle to the growth of digital MM content is the lack of tools
and services that would enable users to access and find information with relative
ease. This factor presents itself as both an obstacle and an opportunity: the creation
of useful services will accelerate the creation of relevant content, and vice versa.
CMIC is currently engaged in research efforts aimed at easing the indexing, browsing,
and searching of MM content. Last, but not least, Internet access and bandwidth
constraints limit the publishing and consumption of digital MM content. This is
especially true when using portable publishing devices, such as cell phones, which
are widespread in the Arab region. Such platforms suffer from limited connectivity
resources. Research opportunities in this resource-limited environment include devising
efficient and effective methods for decreasing bandwidth utilization through MM
compression, summarization, and repurposing. At CMIC, we are currently pursuing a
number of ideas related to MM summarization in domains such as education and personal
videos.
The creation of these services and tools can significantly increase online digital
MM content, particularly in the Arab region.
Preservation of existing cultural and heritage content
The Arab region possesses a rich cultural heritage and history that has been documented
in different forms for many years. Some of this content is already in MM format, yet
not in digital form, and has not received the same attention that written, textual
information has. The preservation and ease of access of such content is of utmost
value. This spans the whole spectrum from old manuscripts, images, and maps to
contemporary speeches and analog recorded video content. Besides the challenges
outlined above for generating digital content, there are a number of additional
ones facing the digitization of Arabic MM content. If the content is in a paper document
format, processes like OCR (optical character recognition) are needed for the digitization.
Arabic OCR systems lag far behind the performance levels attained for English OCR.
This is due to many factors, such as difficult Arabic orthography, calligraphy variations,
degraded paper conditions, and, above all, the limited research that has been devoted
to improving the quality of Arabic OCR systems.
The same applies squarely to the process of digitizing speeches and audio material.
An automated speech recognition (ASR) system is typically needed to convert the spoken
content into written content that can be easily browsed and searched. Many challenges,
and consequently research opportunities, exist in the ASR and information retrieval
(IR) processes, such as dialect variations and noisy conditions, especially for old
recordings. At CMIC, we are currently pursuing efforts to ease access to Arabic paper
documents using novel, fully automated IR methods. In addition, we believe that
collaborative user efforts for tagging and correcting automated digitization results
can be very effective in solving issues related to access to non-digital Arabic MM
content.
Permission to make digital or hard copies of all or part of this work for personal
or classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full
citation on the first page. To copy otherwise, or republish, to post on servers
or to redistribute to lists, requires prior specific permission and/or a fee.