January 27, 2016

Azure Partner Community: Big data, advanced analytics, and lambda architecture

By Jonathan Gardner, US Partner Technology Strategist for Microsoft Azure

Welcome to part 2 of this month’s Azure Partner Community blog series, about data platforms and advanced analytics. Read part 1.

A discussion about data platforms and advanced analytics, this month’s Azure Partner Community blog series focus, must of course include the topic of big data. In my conversations with partners, I usually find that we need to level-set about what “big data” is, and then cover the basics of advanced analytics. In this post, I’ll outline the foundation for these conversations.

Big data

I love talking to people about their environments and their data. The environments vary wildly in size and data type. But whether they really have “big data” depends on whether their data have at least one of these three V’s: Volume, Variety, or Velocity.

Volume

For years, organizations have collected vast amounts of data, and the amount they collect continues to grow exponentially.

Here are examples of scientific data collection that demonstrate volume:

  • In 2000, the Sloan Digital Sky Survey collected more data in its first week than was collected in the entire history of astronomy
  • By 2016, the new Large Synoptic Survey Telescope in Chile will acquire 140 terabytes in 5 days, more than Sloan acquired in 10 years
  • The Large Hadron Collider at CERN generates 40 terabytes of data every second

The amount of data being collected can reach into the hundreds of gigabytes, terabytes, or even petabytes. I recently saw this statistic: In 2010, Twitter generated more than 1 TB of tweets daily.

These examples are meant to be extreme, but I have worked with smaller organizations that have hundreds of TB of data. That qualifies as having big data.

Variety

Variety refers to the types of data that an organization collects.

An organization may have structured data from their ERP system and unstructured data that they are collecting for brand analysis from social media. These two data sets vary not only by type but in schema as well.

Organizations that want to make sense of these seemingly unrelated data types have big data. With these data types, customers can analyze complex questions. For example, many customers are looking at whether their presence on Twitter, or their brand sentiment on Twitter and other social media platforms, is affecting sales.

Velocity

In the context of big data, velocity means that data, typically small in size, are entering the system at a rapid rate.

This is the type of data generated by sensors, Internet of Things devices, or SCADA systems.

These types of environments can generate 100,000 1 KB tuples per second.
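To put that rate in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch, and it rests on two assumptions that are mine rather than part of the example above: that each tuple is 1 kilobyte, and that decimal units apply (1 KB = 1,000 bytes). The function and constant names are made up for illustration.

```python
# Assumptions (not from the post): each tuple is 1 kilobyte,
# and decimal units are used (1 KB = 1,000 bytes).
TUPLES_PER_SECOND = 100_000
BYTES_PER_TUPLE = 1_000
SECONDS_PER_DAY = 24 * 60 * 60


def ingest_rate_bytes_per_second(tuples_per_second: int, bytes_per_tuple: int) -> int:
    """Return the sustained ingest rate in bytes per second."""
    return tuples_per_second * bytes_per_tuple


rate = ingest_rate_bytes_per_second(TUPLES_PER_SECOND, BYTES_PER_TUPLE)
per_day = rate * SECONDS_PER_DAY

print(f"{rate / 1e6:.0f} MB per second")   # 100 MB per second
print(f"{per_day / 1e12:.2f} TB per day")  # 8.64 TB per day
```

Under those assumptions, small tuples arriving quickly add up to several terabytes a day, which is why velocity alone can put an environment in big data territory.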

Data analytics pipeline and lambda architecture

There continues to be debate about additional ways to define big data, but what I’ve established so far in this discussion allows us to shift focus to how the data are actually processed.

The stages of the data analytics pipeline follow the logical flow of the data: ingestion, processing, storage, and delivery. When we discuss the three V’s, it is clear that there are many different types of data, and the amount of data that needs to be processed can be quite large. Enter lambda architecture.

Lambda architecture was designed to meet the challenge of handling the data analytics pipeline through two avenues: stream-processing and batch-processing methods. These two data pathways merge just before delivery to create a holistic picture of the data. The streaming layer handles data with high velocity, processing them in real time. The batch layer handles large volumes of data, and batch processing can take extended periods of time. By combining the layers, the streaming data can fill in the time gap missing in the batch layer. The image below illustrates this concept.
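For readers who think in code, the idea can be sketched in a few lines of Python. This is a toy illustration, not how any particular product implements it: the `Event` type, the function names, the sensor IDs, and the sample events are all hypothetical. It assumes the batch layer last ran at a known cutoff time, and that the streaming layer only needs to cover events that arrived after that cutoff.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    sensor_id: str
    timestamp: int  # seconds since an arbitrary start time


def batch_view(events, cutoff):
    """Batch layer: recompute counts from all events up to the last batch run."""
    return Counter(e.sensor_id for e in events if e.timestamp <= cutoff)


def speed_view(events, cutoff):
    """Streaming layer: count only the events that arrived after the last batch run."""
    view = Counter()
    for e in events:  # a real system would update this as each event arrives
        if e.timestamp > cutoff:
            view[e.sensor_id] += 1
    return view


def serve(batch, speed):
    """Merge the two views just before delivery."""
    return batch + speed


# Made-up sample events, for illustration only.
events = [
    Event("pump-1", 10),
    Event("pump-2", 20),
    Event("pump-1", 30),
    Event("pump-1", 70),
    Event("pump-2", 80),
]
LAST_BATCH_RUN = 60  # hypothetical time of the most recent batch run

merged = serve(batch_view(events, LAST_BATCH_RUN), speed_view(events, LAST_BATCH_RUN))
print(dict(merged))
```

The batch view alone would miss the two events that arrived after the last batch run. The merged view counts all five, which is the time gap the streaming layer is there to fill.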



Lambda architecture and Microsoft Azure

With an understanding of lambda architecture, you can see that Microsoft has aligned Azure services to provide tools all along the pipeline. The image below outlines how Azure big data services fit into the lambda architecture.


Getting started

Search the Azure Partner Readiness Catalog by keyword, level, source, and feature

Big data courses on Microsoft Virtual Academy

Advanced analytics courses on Microsoft Virtual Academy

Microsoft Azure courses on Microsoft Virtual Academy

Getting Started with Microsoft Big Data: Introduction to Big Data

Watch online

Go to the full course

