The Georgia Archives sought a way to make audio and video recordings easily accessible and searchable on the web. The Archives chose a solution based on the Microsoft Research Audio Video Indexing System (MAVIS) and Windows Azure. The result is
improved productivity and faster access to audio and video content for citizens, legislators, and other interested parties, and less work for Archives staff, all with minimal costs and no IT issues.
The Georgia Archives collects the state’s permanent records and makes them accessible to all. It has well-established processes for providing access to paper records, which are relatively easy to digitize and search electronically. But it faced greater
challenges with audio and video recordings, which it lacked the means to make broadly and efficiently accessible.
David Carmicheal, Director of the Georgia Archives, cites recordings of legislative sessions as a good example. “These recordings are typically hours in length, and the only information available about them is the recording date, which means that people
can’t readily find content within a recording unless they listen to it—that is, after requesting a copy and waiting about a week for a copy to be made,” says Carmicheal. “Requests are also labor-intensive for Archives staff, requiring anywhere from two to
eight hours to make a physical copy of an older, analog recording.”
The Archives briefly tried using speech recognition software to convert recorded audio to searchable text. However, the results were deemed too inaccurate, especially when working with strong accents and poor-quality recordings.
A conversation Carmicheal had with Microsoft led the Georgia Archives to the Microsoft Research Audio Video Indexing System (MAVIS). Here’s how MAVIS works and why it can be more accurate (and ultimately more useful) than other solutions that also use
large-vocabulary continuous speech recognition (LVCSR) to convert audio to text, so that it can be searched:
Typical LVCSR systems have a preconfigured vocabulary, which makes them susceptible to inaccuracies due to factors such as accents and out-of-vocabulary terms, as may be the case with proper names. MAVIS helps overcome these challenges by using the Bing
search engine to get more information about the content, which it then uses to expand its base vocabulary. MAVIS also preserves the confidence with which a word is recognized and which other potential matches were considered—a technique pioneered by Microsoft
Research called Probabilistic Word-Lattice Indexing—and preserves time stamps to support direct navigation to keyword matches.
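To make the indexing idea above more concrete, here is a minimal sketch in Python. It is not the MAVIS implementation, whose internals are not described here; it only illustrates the general idea of keeping every candidate word with its confidence and time stamp instead of a single best transcript. The names `Hypothesis`, `Hit`, and `LatticeIndex`, the recording identifiers, and the confidence values are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    word: str          # candidate word the recognizer considered (hypothetical data)
    start: float       # start time in seconds within the recording
    confidence: float  # recognizer's confidence estimate, 0.0 to 1.0


@dataclass(frozen=True)
class Hit:
    recording_id: str
    start: float
    confidence: float


class LatticeIndex:
    """Toy inverted index that keeps all candidate words, not only the top guess."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, recording_id, hypotheses):
        # Store every hypothesis so lower-ranked alternatives stay searchable.
        for h in hypotheses:
            self._postings[h.word.lower()].append(
                Hit(recording_id, h.start, h.confidence)
            )

    def search(self, term, min_confidence=0.1):
        # Return matches above the threshold, most confident first.
        hits = [
            h for h in self._postings.get(term.lower(), [])
            if h.confidence >= min_confidence
        ]
        return sorted(hits, key=lambda h: h.confidence, reverse=True)


index = LatticeIndex()
index.add("session-a", [
    Hypothesis("savanna", 12.4, 0.50),   # top guess for this stretch of audio
    Hypothesis("savannah", 12.4, 0.35),  # lower-ranked alternative, still indexed
    Hypothesis("bill", 13.1, 0.90),
])
index.add("session-b", [Hypothesis("bill", 101.5, 0.60)])

for hit in index.search("bill"):
    print(f"{hit.recording_id} at {hit.start:.1f}s (confidence {hit.confidence:.2f})")
```

In this sketch, a search for "savannah" still finds the passage at 12.4 seconds even though the recognizer ranked another word higher, and the stored start time is what would let a player jump straight to the match.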
The Georgia Archives initially tested MAVIS with 100 hours of recordings. Because of the compute-intensive nature of MAVIS processing, these initial tests were performed on servers managed by Microsoft Research. By the time Carmicheal was ready to test another
500 hours of content, Microsoft Research had MAVIS running on Windows Azure, as a way to make it easy to adopt without having to invest in server infrastructure and easier to scale based on workload. “I was impressed by the accuracy of MAVIS, and equally impressed
by how quickly and inexpensively we could put it to work for us on Windows Azure,” says Carmicheal.
In May 2011, the Georgia Archives launched a site that enables users to search four years of recordings from the Georgia General Assembly. Legislators can use it to research why a bill did or did not pass, and citizens can use it to gain insight into the
arguments for or against a bill—including the ability to hear the emotional charge of discussions on a topic.
Microsoft has since enlisted solution provider GreenButton—named Windows Azure Partner of the Year in 2011—to help early adopters such as the Georgia Archives to continue to use MAVIS and to make it commercially available to other organizations. Carmicheal
is evaluating a proposal from GreenButton for a turn-key approach, in which recordings of all legislative sessions will be uploaded to a website hosted on Windows Azure. Indexed recordings will be live and searchable within 24 hours, so that anyone can hear
for themselves exactly what Georgia legislators are saying. “MAVIS works great and the price is very reasonable,” says Carmicheal. “Were we to have audio and video recordings transcribed, it would cost at least ten times as much.”
By using Microsoft technology, the Georgia Archives has made its wealth of audio and video recordings easily accessible to all. Specific benefits include:
Improved productivity. People no longer need to wait up to a week for the Georgia Archives to duplicate a recording, nor do they need to listen to the entire recording to determine if it contains what they need. Instead, a single search
shows hits across all recordings, including text snippets that show the search terms in context. Clicking on a snippet immediately takes the user directly to that portion of the recording.
Faster access to content. Under the proposal from GreenButton, uploaded content will be live and searchable within 24 hours. GreenButton can make this commitment because Windows Azure scales on demand, which makes
it possible to devote as many servers as needed to the compute-intensive MAVIS processing algorithms without delay.
Less work. Making audio and video recordings accessible and searchable online reduces the workload for the Georgia Archives, which will no longer need to spend as many as eight hours servicing a request for a copy of a recording.
Minimal costs and no IT issues. The Georgia Archives did not need to acquire servers, nor does it have to worry about system administration or backups. Similarly, as more content is added, the Archives will not need to worry about scalability
or additional disk space.
“We have some really good information in our audio and video archives, but until now, it was too difficult for people to find it,” concludes Carmicheal. “The Georgia Archives exists to help serve the state’s residents, legislators, and government officials,
and we now have a new tool that enables anyone to watch government at work and explore areas of interest.”
The solution discussed in this case study can be found at:
More information about adopting MAVIS can be found at:
More information about GreenButton can be found at:
More information about Windows Azure can be found at:
For more information about other Microsoft customer successes, please visit: