Uncertain< T >: A First-Order Type for Uncertain Data

MSR-TR-2013-46 |

Sampled data from sensors, the web, and people is inherently probabilistic. Because programming languages use discrete types (floats, integers, and booleans), applications, ranging from GPS navigation to web search to polling, express and reason about uncertainty in idiosyncratic ways. This mismatch causes three problems. (1) Using an estimate as a fact introduces errors (walking through walls). (2) Computation on estimates compounds errors (walking at 59 mph). (3) Inference asks questions incorrectly when the data can only answer probabilistic question (e.g., “are you speeding?” versus “are you speeding with high probability”).

This paper introduces the uncertain type (Uncertain< T >), an abstraction that expresses, propagates, and exposes uncertainty to solve these problems. We present its semantics and a recipe for (a) identifying distributions, (b) computing, (c) inferring, and (d) leveraging domain knowledge in uncertain data. Because Uncertain< T > computations express an algebra over probabilities, Bayesian statistics ease inference over disparate information (physics, calendars, and maps). Uncertain< T > leverages statistics, learning algorithms, and domain expertise for experts and abstracts them for nonexpert developers. We demonstrate Uncertain< T > on two applications. The result is improved correctness, productivity, and expressiveness for probabilistic data.