Few organizations do work as vital to humanity as USC Shoah Foundation – The Institute for Visual History and Education (USC Shoah Foundation). The Institute was established in 1994 by filmmaker Steven Spielberg after his experience directing his acclaimed film Schindler’s List. As Holocaust survivors visited Spielberg on set, he recognized the necessity and urgency of recording as many firsthand survivor accounts as possible. The organization quickly set up offices around the world and, over several years, conducted as many as 12,000 interviews a year, amassing 52,000 testimonies for its original collection, which has since grown to more than 55,000.
Honoring survivors
These testimonies are preserved and stored within USC Shoah Foundation’s Visual History Archive (VHA), an invaluable resource for educators, researchers, and scholars, with nearly every testimony encompassing a complete personal history of life before, during, and after the subject’s firsthand experience with genocide. Today, the VHA is available via subscription through 185 access sites in 23 countries.
To establish a permanent home for the VHA, the Shoah Foundation became part of the University of Southern California (USC) in 2006. Operating as its own institute at USC, it continues to expand the archive to include stories from survivors of other genocides, such as the Armenian Genocide, the Cambodian Genocide, and the 1994 Genocide Against the Tutsi in Rwanda. Today, the Institute preserves testimonies from a dozen episodes of mass violence and is working with partners to conduct interviews in Ukraine as well. Its central challenge is ensuring that this large digital archive is preserved in perpetuity, for generations to come. USC Shoah Foundation is meeting that challenge with help from Microsoft Azure Blob Storage.
Preserving the past in a changing digital world
Preserving moving-image content is particularly challenging for USC Shoah Foundation. Traditional print has an established category of “super texts,” works such as Shakespeare’s First Folio, the Bible, or the Koran, whose widely accepted significance guarantees their survival. No equivalent exists for digital media, a relatively recent medium. Add the inherent instability of digital video files, and USC Shoah Foundation has its work cut out for it.
Sam Gustman, USC Shoah Foundation’s Chief Technology Officer, states, “Conservatively, film lasts 50 years before age-based damage begins, 20 years for videotape, five years for hard drives, three years for data tape, and two years for optical media. The general rule is that the newer the medium, the faster it rots.”
A key tenet of archival best practice is diversification: the less an archive relies on any single storage method, the better. “You want geographical, technological, organizational, and solution diversity so that no single incident can make your data disappear,” says Gustman. “Using Azure Blob Storage helps us get there.” Spurred by those requirements, recent reductions in cloud storage costs, and its own well-established datacenter preservation best practices, USC Shoah Foundation replicated its archive in Azure. Gustman continues, “We’ve connected our own preservation infrastructure and software (which we’ve been using for synchronization and reporting) with Azure, so we can constantly monitor the health of our collection.” Indeed, the Institute uses Azure Monitor to review and analyze metrics and logs and to maintain infrastructure observability.
A durable digital archive
USC Shoah Foundation worked with Tape Ark, a member of the Microsoft Partner Network, on a comprehensive migration process. This involved ingesting roughly 1,000 tapes, more than 100,000 hours of video, and tens of millions of JPEG files, and verifying every piece of data ingested. The Institute needed to confirm that each file replicated to the cloud matched the file signature recorded in its database when the file was originally created. That way, it could establish a chain of custody showing provenance, from creation to storage on tape to replication in Azure. This was no small feat, considering that the archive and its associated assets totaled 6 petabytes, including 120,000 hours of video in 44 languages, all indexed into one-minute, searchable clips.
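The verification step described above is essentially a fixity check: recompute the signature of each replicated file and compare it with the value recorded when the file was created. The following is a minimal sketch under stated assumptions, not the Institute's or Tape Ark's actual tooling. It assumes the signature is a SHA-256 hex digest (the source does not name the algorithm), and it uses in-memory streams in place of tape restores or blob downloads. The function names `stream_sha256` and `audit_copies` and the file names are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO, Dict, List


def stream_sha256(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in 1 MiB chunks so large video files never sit fully in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def audit_copies(copies: Dict[str, BinaryIO], recorded: Dict[str, str]) -> List[str]:
    """Return the names of files whose replicated copy is missing or whose
    SHA-256 digest differs from the signature recorded at creation.

    Assumption: `recorded` maps file name -> hex SHA-256 digest and stands in
    for the Institute's signature database.
    """
    failures = []
    for name, expected in recorded.items():
        stream = copies.get(name)
        if stream is None or stream_sha256(stream) != expected.lower():
            failures.append(name)
    return failures


# Example: one intact replica and one that no longer matches its signature.
original = b"testimony-0001 video bytes"
recorded = {
    "t0001.mp4": hashlib.sha256(original).hexdigest(),
    "t0002.mp4": hashlib.sha256(b"testimony-0002 video bytes").hexdigest(),
}
copies = {
    "t0001.mp4": io.BytesIO(original),
    "t0002.mp4": io.BytesIO(b"testimony-0002 video bytez"),  # simulated corruption
}
print(audit_copies(copies, recorded))  # ['t0002.mp4']
```

Reading in fixed-size chunks keeps memory use flat even for multi-gigabyte video files. Because every comparison is made against the signature captured at creation, a passing audit supports the chain of custody from creation through tape to the cloud copy.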
The Institute uses the Archive tier of Azure Blob Storage for cloud-based backup storage of its archive. It chose Blob Storage because the service is designed for storing massive amounts of infrequently accessed, unstructured data at a low per-gigabyte cost.
In addition to cost-effective distributed cloud storage, the Institute takes advantage of automated lifecycle management and data checking in Azure. Historically, USC Shoah Foundation had verified the integrity of its distributed archive versions by tracking changes across copies and overwriting compromised data with accurate data from other archive versions. The data checking in Azure aligns with the Institute’s verification approach, making Azure an intuitive and valuable addition to its storage mix. Gustman says, “We found that the aggressive data checking in Azure parallels our own. We can check our environment, add that to our audit, and verify that the copy is whole.”
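The Institute’s earlier approach, comparing copies and overwriting compromised data with accurate data from another archive version, can be sketched as a repair pass over mirrors. This minimal version is an illustration, not the Institute’s software. It assumes each mirror’s copy of a file is available as bytes and that a SHA-256 signature was recorded at creation. The mirror names and the functions `sha256_hex` and `repair_from_mirrors` are hypothetical.

```python
import hashlib
from typing import Dict


def sha256_hex(data: bytes) -> str:
    """SHA-256 hex digest; assumed here to be the recorded 'file signature'."""
    return hashlib.sha256(data).hexdigest()


def repair_from_mirrors(mirrors: Dict[str, bytes], recorded_sha256: str) -> Dict[str, bytes]:
    """Return a new mapping of mirror name -> bytes in which every copy that no
    longer matches the recorded signature is overwritten with a verified copy.

    Raises ValueError if no mirror still holds a copy matching the signature.
    """
    verified = {name for name, data in mirrors.items() if sha256_hex(data) == recorded_sha256}
    if not verified:
        raise ValueError("no verified copy available; manual recovery needed")
    # Verified copies share the recorded digest (so, barring a hash collision,
    # they are byte-identical); pick one deterministically as the source.
    source = mirrors[sorted(verified)[0]]
    return {name: data if name in verified else source for name, data in mirrors.items()}


# Example with three hypothetical mirrors, one of which has degraded.
original = b"clip 42 of testimony 1001"
mirrors = {"on_prem": original, "tape": b"clip 42 of testimony 10?1", "azure": original}
repaired = repair_from_mirrors(mirrors, sha256_hex(original))
print(sorted(name for name in mirrors if mirrors[name] != repaired[name]))  # ['tape']
```

Checking each copy against the signature recorded at creation, rather than taking a majority vote among mirrors, means a copy is trusted only if it matches the original, which keeps repairs consistent with the chain of custody.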
Anita Pace, Managing Director of Technology at USC Shoah Foundation, adds, “We’ve also been able to build seamlessly with Azure. We can bring in new content using our normal pathways, but we can also upload directly to Azure, as we do to our other mirror sites. We use the APIs that Azure provides to easily synch up our new content while maintaining the storage of our other content.”
The Institute values the Azure global infrastructure as an important part of its archive diversification strategy. Says Gustman, “With Azure Blob Storage, we’ve added another diverse way of preserving our irreplaceable content. We especially appreciate that Azure is geographically distributed, which helps protect our data from regional incidents.”
According to Pace, the Institute is building multiple copies of its archive and implementing additional innovative solutions. She says, “We benefit from the simplicity and functionality of Azure. And working closely with Azure specialists on our blob and monitoring design, along with our dashboard, was an amazing process that helped us gain another full copy of the archive.”
The value of preservation
The Institute’s move to Blob Storage and the Azure platform helps keep the archive globally distributed, thereby better ensuring its permanence. Gustman states, “We want to avoid any kind of large, systematic failure, which could take our data out over time, by diversifying across technologies, companies, organizations, and regions. Replicating our archive in Azure and using Blob Storage for cloud-based backup storage helps us get that diversification.”
Since deploying its cloud storage solution, USC Shoah Foundation has been able to make strategic decisions with greater agility. For instance, it can move copies of the entire preservation version of its archive to whichever geographic location it needs. In fact, the Institute plans to copy its full preservation archive to another Microsoft datacenter location soon. Gustman says, “We can use the distributed Azure datacenters to move content around the world far more easily than we could by ourselves.”
Pace and her colleagues find it straightforward to conduct data checking in Azure as well. “We’ve gained intuitive systems and dashboards through our Azure adoption, along with a clear process flow,” she says. “Our ability to look ‘under the hood’ with Azure to keep track of our files is so valuable, as is its ease of use.”
USC Shoah Foundation is pleased to have the reliability and resilience of the cloud, along with the capabilities it needs to verify that its archives are not compromised. Gustman concludes, “Aside from diversification, the big piece for us is the ability to data check, and we’re excited to see new features and tools from Azure to help us continue to monitor the health of our critical content in Azure.” Pace concurs, “We look forward to future Azure implementations to gain even more options for monitoring and managing our content.”