Collaborative, Large-Scale Data Analytics and Visualization with Python


September 18, 2013


Travis Oliphant


NumFOCUS Foundation and Continuum Analytics


NumPy and, more recently, Pandas have made Python ubiquitous for scientific computing and data analytics. The Python technical stack works very well for a wide variety of problems that fit in a single address space (the RAM of a single computer). For problems that require larger data sets, current approaches use memory-mapped files, MPI, IPython parallel, and/or a standard map-reduce system such as Disco or Hadoop. These techniques typically complicate the software solution significantly, moving it away from the simple array-oriented (or table-oriented) expression that makes NumPy (and Pandas) so powerful and popular. They can also cause significant data movement throughout the memory hierarchy, which is the common bottleneck in data-centric computing today. Blaze is an array/table library for Python that can be used to manage and manipulate very large, disjoint data sets in an array-oriented fashion. It is built on a C++ library (dynd) that provides dynamic, multi-dimensional arrays with flexible data types. It also leverages Numba, an array-oriented Python compiler that translates a subset of Python syntax to LLVM IR and optimized machine code. In this talk I will discuss the design and roadmap of Blaze and Numba. I will also provide an overview and example of Bokeh, which allows Python developers to easily produce interactive, web-based visualizations, leading into an overview of Wakari, which provides easy access to executable IPython notebooks in the cloud.


Travis Oliphant

Travis has a Ph.D. from the Mayo Clinic and B.S. and M.S. degrees in Mathematics and Electrical Engineering from Brigham Young University. Since 1997, he has worked extensively with Python for numerical and scientific programming, most notably as the primary developer of the NumPy package and as a founding contributor to the SciPy package. He is also the author of the definitive “Guide to NumPy”.

Travis was an assistant professor of Electrical and Computer Engineering at BYU from 2001 to 2007, where he taught courses in probability theory, electromagnetics, inverse problems, and signal processing. He also served as Director of the Biomedical Imaging Lab, where he researched satellite remote sensing, MRI, ultrasound, elastography, and scanning impedance imaging.

From 2007 to 2011, Travis was President of Enthought, Inc. During his tenure there, the company grew from 15 to 50 employees, and Travis worked with well-known Fortune 50 companies in finance, oil and gas, and consumer products. He was involved in all aspects of the contractual relationship, including consulting, training, code architecture, and development. As CEO of Continuum Analytics, Travis engages customers, develops business strategy, and guides the technical direction of the company. He actively contributes to software development and engages with the wider open source community in the Python ecosystem.