Kaggle is a company and platform launched to link organizations that need specialized analytic and scientific skills with a global pool of researchers and scientists who can provide those skills. The Kaggle offering caught on so quickly that it outgrew its original online platform, which could not provide the scalability or flexibility that Kaggle needed to grow and support market demand. So the company turned to Windows Azure and Microsoft development tools to relaunch the site. With Windows Azure, Kaggle has a highly scalable platform capable of supporting quick spikes in new users and data traffic, along with a development environment that supports continuous
A frustrating and sometimes costly challenge faced by many large organizations is finding the right people with the skills to research, test, and validate products and ideas that may have a major impact on the company’s bottom line. Without access to the
right skillsets, projects can be delayed or wither altogether. On the other side of the equation, skilled researchers and analysts are often looking for projects where they can put their solutions, algorithms, and other intellectual efforts to work in real-world
“CEOs and CIOs may have predictive modeling high on their priority lists, but it’s hard to find the talent or solution that is going to fit their specific needs.”
Anthony Goldbloom, Founder and Chief Executive Officer
Finding a way to bring those two parties together was the brainchild of Anthony Goldbloom, an entrepreneur and analyst who had puzzled over the issue during stints at various banking and financial institutions, including the Australian Treasury, the Reserve Bank of Australia, and ANZ, the third-largest bank in Australia. Goldbloom had also served briefly as a writer at The Economist magazine, where interviews with different organizations brought the problem into sharp focus.
“Most large companies have huge amounts of corporate data, and their top executives are well aware of the need to make sense of it,” says Goldbloom. “CEOs and CIOs may have predictive modeling high on their priority lists, but it’s hard to find the talent
or solution that is going to fit their specific needs. It’s especially problematic when particular solutions might cost millions of dollars, yet the companies don’t know if that particular solution is the right fit for what they’re trying to accomplish.”
The scope of the problem and lack of a good solution led Goldbloom to found
Kaggle, a company and online platform that links organizations facing tough problems with some of the best minds in the world. Kaggle uses a crowdsourcing model to solve complex problems, with competitions and reward money acting as incentives to attract
analysts and scientists from around the world to tackle complex scientific and industrial challenges.
Kaggle was originally built using Amazon Web Services, the PHP scripting language, and the MySQL open source database. Before long, however, the platform became problematic for the Kaggle team.
“Amazon Web Services is relatively complex to use,” says Jeremy Howard, President and Chief Scientist for Kaggle. “You cannot just turn it on and it runs. There is quite a bit of setup and configuration required. Also, maintaining code with PHP was too difficult
in terms of using it as the language over a long period. It does not lend itself to concise, easily maintainable code.”
Equally important was a need to scale quickly and easily on demand—a need that Kaggle felt Amazon could not meet. The issue came to a head at the end of 2010, when Kaggle signed a large client—the Heritage Provider Network, which offered a $3 million award
for the best algorithm that could predict and prevent unnecessary patient hospitalizations in the United States.
“After signing this client, we felt that the existing platform supporting Kaggle could not scale and support the levels of activity we would experience in much larger competitions like this one,” says Howard.
Kaggle decided to switch to a Microsoft environment, using Windows Azure as its cloud platform along with Windows development tools. The site was rewritten using Microsoft Visual Studio development tools and Microsoft Visual C#. Kaggle received assistance in its efforts through the Microsoft BizSpark program.
“We decided to use Visual C# as our primary tool. It had the expressiveness that we were looking for in a programming language that could help us build on the excitement generated by the Kaggle competitions. At the same time, it provided the speed to do
more sophisticated programming,” says Howard. “And, because Windows Azure is specifically designed for ASP.NET apps, it was easy for us to get the solution up and running with the tight integration of the Visual Studio tools.”
The Kaggle site also uses Windows Azure Compute, which lets the company run its application code in the cloud. Each Windows Azure Compute instance runs as a virtual machine that is isolated from other customers and is supported by the network load balancing and failover capabilities of Windows Azure. The site also uses Windows Azure web roles, Windows Azure worker roles, and blob storage. A cloud-based database is provided by Microsoft SQL Azure.
Organizations with predictive modeling problems fill out a simple wizard on the Kaggle website, which automatically creates a competition for participating data scientists. Since the deployment of the Windows Azure-based site, Kaggle has grown significantly, with more than 32,000 users participating in the site by early 2012. Organizations such as Allstate, NASA, and Ford have posted data and their related problems on Kaggle.
By turning to Windows Azure and the Microsoft development tools, the Kaggle team was able to quickly rewrite the code. The site is easier to modify and upgrade as needed to keep pace with the growing popularity of the Kaggle service. It is highly scalable
to accommodate large clients and uploading of large data sets. It is also a cost-effective solution for Kaggle as the company moves out of its startup phase.
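The case study does not describe how Kaggle divides work between its web roles and worker roles, so the following is only a minimal sketch of the general pattern, written in Python with an in-process queue standing in for a cloud queue. Every name here (`web_role_submit`, `worker_role_loop`, `score_submission`, the accuracy-style score, and the team names) is hypothetical and chosen for illustration; none of it is Kaggle's actual code or the Windows Azure API. The idea shown is that a front-end step accepts an upload and returns at once, while background workers score submissions and can be increased in number when traffic spikes.

```python
import queue
import threading


def score_submission(predictions, answers):
    """Hypothetical scoring rule: fraction of predictions matching held-out answers."""
    matches = sum(1 for p, a in zip(predictions, answers) if p == a)
    return matches / len(answers)


def web_role_submit(work_queue, team, predictions):
    """Front-end step (assumed): accept an upload and enqueue it without scoring it."""
    work_queue.put((team, predictions))


def worker_role_loop(work_queue, answers, leaderboard, lock):
    """Background step (assumed): pull submissions off the queue and score them."""
    while True:
        item = work_queue.get()
        if item is None:  # sentinel value tells this worker to stop
            break
        team, predictions = item
        score = score_submission(predictions, answers)
        with lock:
            # Keep only each team's best score so far.
            leaderboard[team] = max(score, leaderboard.get(team, 0.0))


def run_demo(worker_count=2):
    """Run a tiny competition with made-up data and return the leaderboard."""
    answers = [1, 0, 1, 1]  # illustrative held-out labels, not real data
    work_queue = queue.Queue()
    leaderboard = {}
    lock = threading.Lock()

    # Scaling out corresponds to raising worker_count.
    workers = [
        threading.Thread(
            target=worker_role_loop,
            args=(work_queue, answers, leaderboard, lock),
        )
        for _ in range(worker_count)
    ]
    for worker in workers:
        worker.start()

    web_role_submit(work_queue, "team_a", [1, 0, 1, 1])
    web_role_submit(work_queue, "team_b", [1, 1, 1, 0])
    web_role_submit(work_queue, "team_a", [0, 0, 0, 0])

    # One sentinel per worker, queued after all submissions.
    for _ in workers:
        work_queue.put(None)
    for worker in workers:
        worker.join()
    return leaderboard


if __name__ == "__main__":
    print(run_demo())
```

Because submissions only pass through the queue, the front end never waits on scoring, and handling a burst of participants becomes a matter of starting more workers rather than changing the code.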
Platform Supports Constant Innovation
Howard says the power of the Windows development tools and their integration with Windows Azure simplified the task of rewriting the code for the site while also supporting innovation.
“It took us about one month to completely rewrite the code and move the site to Windows Azure,” he says. “That included moving the database to Microsoft SQL Azure, which was relatively simple using the
Microsoft SQL Server Integration Services. It was very cool to have much of this process automated because it streamlined the entire process.”
Howard adds that the Microsoft tools support the fast implementation of new ideas. “Being a startup, we’re always changing things,” he says. “Windows Azure and the Microsoft development environment support the kind of continuous innovation that’s important
for our growth. And with Visual C#, we can make all of our changes in one place—it’s very expressive, concise, and fast.”
Scales Easily to Support User, Data Traffic
Windows Azure is highly scalable, in terms of being able to handle large spikes in the number of users as well as the data traffic that occurs with Kaggle competition activity. “It’s good to know that if we sign a big client or competition and get thousands
of new users signing on, it’s a simple matter of adding more computing capability on Windows Azure. If we get a burst of new participants on the site and were not able to scale as easily as we can on Windows Azure, people would have to wait a long time to
get on the site,” says Howard.
He notes that the platform easily handles large data sets that are downloaded by competition participants.
“It can be anywhere from a few kilobytes to hundreds of gigabytes of information at once,” Howard says. “Windows Azure has provided a solid performance experience for our users.”
Helps Control Costs for Startup
With Windows Azure, Kaggle was able to adopt a pay-as-you-go platform model for its business.
“This is incredibly important for any young company in a rapid growth stage,” Howard says. “We don’t have to worry about and plan for potentially expensive and complex IT overhead. Windows Azure provides the platform backend so we can focus on innovation.”
He adds that Windows Azure provides a very “frictionless way” to host the kind of company that Kaggle is trying to build.
“With Windows Azure, we don’t have to give much thought to the infrastructure that’s supporting our company,” he says. “It’s a very powerful, pragmatic platform for a startup firm.”
Microsoft Cloud Services
Microsoft offers a complete set of cloud-based solutions to meet business needs, including solutions for advertising, communications (email, meetings), collaboration (document storage, sharing, workflow), business applications (customer resource management,
business productivity), data storage and management, and infrastructure services. In addition, customers can take advantage of an entire ecosystem of solution providers and Microsoft partners. For more information, please visit
For More Information
For more information about Microsoft products and services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Information Centre at (877) 568-2495. Customers in the United States and Canada who are deaf or hard-of-hearing can reach Microsoft text telephone (TTY/TDD) services at (800) 892-5234. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information using the World Wide Web, go to:
For more information about Microsoft BizSpark, please visit:
For more information about Kaggle products and services, call or visit the website at: www.kaggle.com
This case study is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.