Big data without the big expense

21 May 2013 | Ted Malone, Big Data Architecture lead for Microsoft Federal
Last year, Steve Lohr of the New York Times noted that no entity on the planet produces, gathers, and stores more data than the American government. That’s an amazing statistic, but with our nation’s agencies tackling society’s biggest challenges – from disease tracking to anti-terrorism activities – it shouldn’t come as a shock. In 2009 the U.S. government produced 848 petabytes (one petabyte is 1000⁵ bytes) of data. To put that into perspective, just five exabytes (1000⁶ bytes) of data would contain all the words ever spoken by every human being on earth.

That’s a lot of information, and it reinforces that our government is tackling some of the biggest “big data” challenges that exist. Last week President Obama signed an Executive Order outlining steps to make government-held data more accessible, which is a fantastic step forward. But how can agencies manage, analyze, and extract important insights from these enormous data sets in the face of unprecedented budget cuts? Big data solutions have historically meant buying more commodity hardware and storage – resulting in additional costs, an expanded IT infrastructure, and a larger energy footprint, all of which directly hinder agency momentum toward datacenter consolidation and the drive to become leaner and more cost effective.

The good news is that plummeting storage costs and emerging technology tools are making big data analysis easier, more affordable, and achievable within the modern datacenter environment. Hadoop is a critical big data tool for any organization, but its requirement to scale out on commodity hardware with direct-attached storage can pose problems for a large agency trying to manage datacenter server and storage sprawl. The private cloud offers a way to combat this challenge. Agencies can now run Hadoop on Microsoft Windows Server within a private, in-agency cloud and take full advantage of the benefits of virtualization while pursuing big data initiatives. This empowers agencies to conduct big data analysis in a securely managed environment, as part of their existing data analytics infrastructure, while keeping in line with datacenter consolidation, budget, and efficiency goals.
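For readers new to Hadoop, its processing model is MapReduce: a mapper emits key/value pairs, the framework shuffles and sorts them by key, and a reducer aggregates each key's values. The sketch below is the canonical word-count example, simulated locally in plain Python rather than on a real cluster (on a cluster, the mapper and reducer would be separate scripts run under Hadoop's streaming interface; the local driver here only illustrates the data flow).

```python
"""A minimal MapReduce-style word count, simulated locally.

This is an illustrative sketch of the map -> shuffle -> reduce pattern
Hadoop uses, not a cluster deployment.
"""
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Map phase: emit (word, 1) for every whitespace-separated token.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    # Reduce phase: sum the counts for each key, as Hadoop would do
    # after grouping pairs by key in the shuffle phase.
    totals: Dict[str, int] = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


if __name__ == "__main__":
    corpus = ["big data big insight", "data drives decisions"]
    print(reducer(mapper(corpus)))
```

Because each mapper and reducer works on an independent slice of the data, the same job scales from a laptop to hundreds of nodes – which is exactly why Hadoop pairs so naturally with an elastic, virtualized private cloud.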

The advantages of this approach to big data are further extended by the fact that agencies can conduct complex big data analyses using software tools that they already own. I recently chatted with Rutrell Yasin of Government Computer News on this topic, and he wrote a great piece about how the PowerPivot feature in Excel can consolidate millions of rows of data from multiple sources, which can then be used to develop rich visualizations, author interactive reports, and conduct sophisticated analyses. So government agencies can get started today, using familiar, interoperable, existing tools to stand up a big data solution, either on-premises or in the cloud. 
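PowerPivot itself is driven through the Excel interface and DAX formulas, but the underlying pattern – consolidate rows from multiple sources, then aggregate them into a pivot-style summary – is easy to illustrate. The plain-Python analogy below uses entirely hypothetical data (agency spending records from two sources) to show that pattern; it is a sketch of the concept, not PowerPivot's actual mechanics.

```python
"""Sketch of the consolidate-then-summarize pattern PowerPivot enables.

All source names and figures here are hypothetical, for illustration only.
"""
from collections import defaultdict

# Two hypothetical data sources, e.g. exports from different systems.
source_a = [
    {"agency": "HHS", "year": 2012, "spend": 120.0},
    {"agency": "DOD", "year": 2012, "spend": 450.0},
]
source_b = [
    {"agency": "HHS", "year": 2013, "spend": 135.0},
    {"agency": "DOD", "year": 2013, "spend": 430.0},
]


def pivot_spend(*sources):
    # Consolidate every row from every source, then aggregate
    # spend by (agency, year) -- a simple pivot-table summary.
    summary = defaultdict(float)
    for source in sources:
        for row in source:
            summary[(row["agency"], row["year"])] += row["spend"]
    return dict(summary)


if __name__ == "__main__":
    print(pivot_spend(source_a, source_b))
```

In Excel, the same consolidation happens by importing each source into the PowerPivot data model and summarizing with a PivotTable – no code required, which is precisely the point for analysts working with tools they already own.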

The “big” in big data doesn’t have to mean expensive and complex.  Agencies can embrace the private cloud and tools they already own to uncover the insights they need to make better decisions, improve organizational efficiency, and most importantly, deliver improved services to citizens.

More detail on the big data momentum happening within government can be found in this white paper.

Have a comment or opinion on this post? Let me know @Microsoft_Gov. Have a question for the author? Please e-mail us at

