Virginia Tech Exec Q&A

Virginia Tech is using the Microsoft Azure Cloud to create cloud-based tools to assist with medical breakthroughs via next-generation sequence (NGS) analysis. This NGS analysis requires both big computing and big data resources. A team of computer scientists at Virginia Tech is addressing this challenge by developing an on-demand, cloud-computing model using the Azure HDInsight Service. By moving to an on-demand cloud computing model, researchers will now have easier, more cost-effective access to DNA sequencing tools and resources, which could lead to even faster, more exciting advancements in medical research.

We caught up with Wu Feng, Professor in the Department of Computer Science, the Department of Electrical & Computer Engineering, and Health Sciences at Virginia Tech, to discuss the benefits he is seeing with cloud computing.

Q: What is the main goal of your work?

We are working on accelerating our ability to use computing to assist in the discovery of medical breakthroughs, including the holy grail of “computing a cure” for cancer. While we are just one piece of a giant pipeline in this research, we seek to use computing to more rapidly understand where cancer starts in the DNA. If we could identify where and when mutations are occurring, it could provide an indication of which pathways may be responsible for the cancer and could, in turn, help identify targets to help cure the cancer. It’s like finding a “needle in a haystack,” but in this case we are searching through massive amounts of genomic data to try to find these “needles” and how they connect and relate to each other “within the haystack.”

Q: What are some ways technology is helping you?

We want to equip scientists, engineers, physicists and geneticists with tools so they can focus on their craft and not on the computing. There are many interesting computing and big data questions that we can help them with along this journey of discovery.

Q: Why is cloud computing with Microsoft so important to you?

The cloud can accelerate discovery and innovation by computing answers faster, particularly when you don’t have bountiful computing resources at your disposal. It enables people to compute on data sets that they might not have otherwise tried because they didn’t have ready access to such resources.

For any institution, whether a company, government lab or university, the cost of creating or updating datacenter infrastructure, such as the building, the power and cooling, and the raised floors, just so a small group of people can use the resource, can outweigh the benefits. Having a cloud environment with Microsoft allows us to leverage the economies of scale to aggregate computational horsepower on demand and give users the ability to compute big data, while not having to incur the institutional overhead of personally housing, operating and maintaining such a facility.

Q: Do you see similar applications for businesses?

Just as the Internet leveled the playing field and served as a renaissance for small businesses, particularly those involved with e-commerce, so will the cloud. By commoditizing “big data” analytics in the cloud, small businesses will be able to intelligently mine data to extract insight for activities such as supply-chain economics and personalized marketing and advertising.

Furthermore, quantitative analytic tools, such as Excel DataScope in the cloud, can enable financial advisors to accelerate data-driven decision-making via commoditized financial analytics and prediction. Specifically, Excel DataScope delivers data analytics, machine learning and information visualization to the Microsoft Azure Cloud.

In any case, just like in the life sciences, these financial entities have their own sources of data deluge. One example is trades and quotes (TAQ), where the amount of financial information is also increasing exponentially. Unfortunately, to make the analytics process on the TAQ data more tractable, the data is often triaged into summary format, which could inadvertently filter out critical data that should have been kept.

Q: Are you saving money or time or experiencing other benefits?

Back when we first thought of this approach, we were wondering if it would even be a feasible solution for the cloud. For example, with so much data to upload to the cloud, would the cost of transferring data from the client to the cloud outweigh the benefits of computing in the cloud? With our cloud-enabling of a popular genome analysis pipeline, combined with our synergistic co-design of the algorithms, software, and hardware in the genome analysis pipeline, we realized about a three-fold speed-up over the traditional client-based solution.
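Editor's note: the trade-off described above, upload cost versus faster cloud computation, can be illustrated with a minimal back-of-the-envelope sketch. The function names and every number below are hypothetical, chosen only to show the arithmetic; they are not measurements from the Virginia Tech work. The model also assumes a steady upload rate and ignores latency, compression and the time to download results.

```python
def cloud_speedup(data_gb, upload_gbps, local_hours, cloud_compute_hours):
    """End-to-end speed-up of the cloud path over the local (client) path.

    Hypothetical model: cloud time = upload time + cloud compute time.
    """
    # Convert gigabytes to gigabits, divide by the link rate (Gbit/s)
    # to get seconds, then convert seconds to hours.
    upload_hours = (data_gb * 8) / upload_gbps / 3600
    cloud_total_hours = upload_hours + cloud_compute_hours
    return local_hours / cloud_total_hours


def cloud_is_worthwhile(data_gb, upload_gbps, local_hours, cloud_compute_hours):
    """True when the cloud path, including the upload, beats the local path."""
    return cloud_speedup(data_gb, upload_gbps, local_hours, cloud_compute_hours) > 1.0


if __name__ == "__main__":
    # Illustrative inputs only: 450 GB of sequence data, a 1 Gbit/s link,
    # 12 hours on the client versus 3 hours of compute in the cloud.
    print(cloud_speedup(450, 1.0, 12, 3))
    # Same job over a much slower 0.05 Gbit/s link.
    print(cloud_is_worthwhile(450, 0.05, 12, 3))
```

With these illustrative inputs the upload takes one hour, so the cloud path totals four hours against twelve on the client. Over the slower link the upload alone takes twenty hours and the cloud path loses, which is exactly the concern raised in the answer above.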

Q: What does the future look like?

There is big business in computing technology, whether it is explicit, as in the case of personal computers and laptops, or implicit, as in the case of smartphones, TVs or automobiles. Just look at how far we have come over the past seven years with mobile devices. However, the real business isn’t in the devices themselves; it’s in the ecosystem and content that supports these devices: the electronic commerce that happens behind the scenes. In another five years, I foresee the same thing happening with cloud computing. It will become a democratized resource for the masses. It will get to the point where it will be just as easy to use storage in the cloud as it will be to flip a light switch; we won’t think twice about it. The future of computing and data lies in the cloud, and I’m excited to be there as it happens.
