The scale and speed of today’s software development efforts impose unprecedented constraints on the pace and quality of decisions made during the planning, implementation, and post-release maintenance and support of software. Decisions during planning include the level of staffing and the development model, given a project’s scope and timelines. In the development phase, the key activities are tracking progress and correcting course, and identifying and mitigating risks. In the maintenance and support phase, they are monitoring aspects of customer satisfaction and improving it overall. The availability of relevant data can greatly increase both the speed of a decision and the likelihood that it leads to a successful software system.

This paper outlines the process Microsoft Research has gone through in developing a Software Analytics Data platform (CODEMINE) for collecting and analyzing engineering process data, together with the platform’s constraints and the pivotal organizational and technical choices involved. We start by describing exemplary uses of CODEMINE, which motivate its architecture and schema design. We conclude with key lessons learnt from deploying the platform across product teams at Microsoft.