Bringing together history and data science with Microsoft Azure
By Winnie Cui, Senior Research Program Manager, Microsoft Research Asia
Andrea Nanetti is a historian and associate professor at the School of Art, Design and Media at Nanyang Technological University (NTU), Singapore. Some might think that, as a historian, he’d be an unlikely attendee at a data science presentation. However, when Nanetti learned about a talk given by Dr. Hsiao-Wuen Hon, from Microsoft Research Asia, he knew it could apply to his unique needs.
“As a historian, I deal with ever-growing pieces of information and data every day. It’s been a challenging task for historians to compose coherent stories in an efficient way using all available data. The talk really excited me, as I realized not only computer scientists could help me, but I should be able to help solve their problems as well by using historical sciences.” —Andrea Nanetti.
Forming a collaboration
In computer science, accounting for user experience is a crucial part of designing any program. Take search engines, for example. Today, entering any phrase into a search engine returns innumerable results. But while the volume of results is impressive, the information can be fractured, repetitive, or overlapping.
Today’s search engines just aren’t intelligent enough to meet the needs of their human users. Ideally, a search engine would organize the information into a coherent, or nearly coherent, story, and even provide narrative elements like who, what, when, where, why, and how.
To create a more intelligent, human-optimized search, Nanetti and his colleague, Professor Siew An Cheong, partnered with Dr. Chin-Yew Lin, a computer scientist and data scientist working on knowledge and data mining at Microsoft Research Asia (Beijing), and Kristin Tolle, Director of Data Science at Microsoft Research Redmond.
Getting to work
Dr. Chin-Yew Lin and his team of data mining researchers developed an entity linking service. This service identifies the different ways the same entity is mentioned and links those mentions together. “The different ways to mention the same entity can very well be used for our service. It will help address the gap in the current search tools and leapfrog the service. It’s like Bing gives you a ‘panoramic’ view of an entity and tells you a story,” said Lin.
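To make the idea of entity linking concrete, here is a minimal sketch in Python. It is not the service Lin's team built, whose design the article does not describe. It only illustrates the general technique of mapping different mentions of one entity to a single identifier. The alias table, the entity identifiers, and the function names are all hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Hypothetical alias table: each known surface form maps to one
# made-up canonical entity identifier (an assumption, not real data).
ALIASES = {
    "venice": "ENTITY_VENICE",
    "venezia": "ENTITY_VENICE",
    "republic of venice": "ENTITY_VENICE",
    "la serenissima": "ENTITY_VENICE",
    "lisbon": "ENTITY_LISBON",
    "lisboa": "ENTITY_LISBON",
}


def normalize(mention):
    """Lowercase a mention and collapse runs of whitespace."""
    return " ".join(mention.lower().split())


def link_entity(mention):
    """Return the canonical identifier for a mention, or None if unknown."""
    return ALIASES.get(normalize(mention))


def group_by_entity(records):
    """Group (source, mention) pairs by the entity each mention refers to.

    Mentions that cannot be linked are skipped.
    """
    groups = defaultdict(list)
    for source, mention in records:
        entity = link_entity(mention)
        if entity is not None:
            groups[entity].append((source, mention))
    return dict(groups)


if __name__ == "__main__":
    records = [
        ("archive_a", "Venice"),
        ("archive_b", "La Serenissima"),
        ("archive_b", "Lisboa"),
    ]
    for entity, mentions in group_by_entity(records).items():
        print(entity, mentions)
```

A real entity linking system would also have to resolve ambiguous names from their context, which a fixed lookup table like this one cannot do.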
At the same time, Nanetti worked with a global team of experts, including Professor Angelo Cattaneo from the New University of Lisbon, Stefano Bertocci from Florence University, Professor John Melville Jones from the University of Western Australia, and Professor Gherardo Ortalli from the University of Venice Ca’ Foscari. The team migrated the information embedded in ancient historical documents into an English-language knowledge aggregator on Microsoft Azure. Hosting this data on Azure made it easy to link entities across various historical (or even modern) databases, helping computer scientists analyze complex systems and uncover more insights by joining multiple datasets.
The collaboration between the data science and history experts yielded tremendous results for all involved. The first presentation of the project was awarded ‘Best Paper’ at the 2013 international conference, Culture and Computing, for “Frontier research in culture and computing.”
Their academic results were peer-reviewed and presented at top-level world conferences in 2015, including the 3rd Triennial International Conference of the Asian Association of World Historians, the 22nd Quinquennial International Conference of Historical Sciences, and the 27th Biennial International Conference of Cartography.
Moving forward, Nanetti and colleagues hope to further their findings, expanding their research to apply to both space and time. This expanded research could unveil new and exciting possibilities. For example, search results could generate “movies”. These movies could include narratives that explain a complete story behind the search data. Or, maps could be both spatial and temporal. Anything is possible when you’re able to truly “see” the stories in the data.