{"id":171312,"date":"2014-03-11T09:29:10","date_gmt":"2014-03-11T09:29:10","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/project\/clickture\/"},"modified":"2020-02-11T08:56:36","modified_gmt":"2020-02-11T16:56:36","slug":"clickture","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/clickture\/","title":{"rendered":"Clickture"},"content":{"rendered":"<h2>Update: The dataset, created in March 2014, is no longer available for download to conform with relevant policies on user data retention.<\/h2>\n<h2 class=\"asset-content\">A Large-Scale Real-World Image Dataset<\/h2>\n<p>We argue that the massive amount of click data from commercial search engines provides a dataset that is unique in bridging the semantic and intent gaps. Search engines generate millions of click records (a.k.a. image-query pairs), which provide almost &#8220;unlimited&#8221; yet strong connections between semantics and images, as well as connections between users&#8217; intents and queries. This site introduces such a dataset, Clickture.<\/p>\n<div id=\"en-usprojectsclickturedefault\" class=\"page-content\">\n<p align=\"justify\">The dataset, named Clickture, was sampled from a one-year click log of a commercial image search engine. It consists of a big table with 212.3 million triads: Clickture = {&lt;K, Q, C&gt;}. A triad &lt;K, Q, C&gt; means that the image K was clicked C times in the search results of query Q in one year (possibly by different users at different times). Image K is represented by a unique &#8220;key&#8221;, which is a hash code generated from the image URL, together with the original URL. Query Q is a textual word or phrase, and click count C is an integer no less than one. One image may correspond to one or more entries in the table. One query may also appear in multiple triads that are associated with different images. 
There are 40 million unique (in terms of URLs) image keys, that is, images in the dataset, and 73.6 million unique queries (based on textual string comparison in lower case) in Clickture.<\/p>\n<p align=\"justify\">Through users\u2019 click actions during image search, the query Q in the triad is linked to the image K. In general, the bigger the click count C is, the higher the probability that the corresponding query is relevant to the image. For convenience, we call Q a \u201cclicked query\u201d of image K, and K a \u201cclicked image\u201d of query Q, and call \u2329K,Q\u232a a \u201cclicked image-query pair\u201d, and the triad \u2329K,Q,C\u232a \u201cclick data\u201d. We also call the \u201cclicked queries\u201d of an image the \u201clabels\u201d of the image.<\/p>\n<p align=\"justify\">To enable the use of Clickture by a wide range of research organizations and individuals with different computing, networking, storage and programming capacities, a subset of Clickture (1 million images and 11.7 million queries) is provided. We call this set Clickture-Lite and the full 40M dataset Clickture-Full (or, in brief, Clickture). 
The 1M images in Clickture-Lite are randomly sampled from the 40M image dataset (based on click frequency).<\/p>\n<h2 align=\"justify\">Related Events<\/h2>\n<ul>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/www.acmmm.org\/2014\/call_mm_grnd_chlng_sol.html\" target=\"_blank\" rel=\"noopener noreferrer\">ACM Multimedia Grand Challenge 2014<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u00a0(Based on Clickture-Lite and optionally Clickture-Full)<\/li>\n<li>ICME Grand Challenge 2014\u00a0(Based on Clickture-Lite)<\/li>\n<li>MSR-Bing Image Retrieval Grand Challenge 2013\u00a0(Based on Clickture-Lite)<\/li>\n<\/ul>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Update: The dataset, created in March 2014, is no longer available for download to conform with relevant policies on user data retention. A Large-Scale Real-World Image Dataset We argue that the massive amount of click data from commercial search engines provides a data set that is unique in the bridging of the semantic and intent 
[&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13562,13551,13555],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-171312","msr-project","type-msr-project","status-publish","hentry","msr-research-area-computer-vision","msr-research-area-graphics-and-multimedia","msr-research-area-search-information-retrieval","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"2014-03-11","related-publications":[],"related-downloads":[],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[],"msr_research_lab":[],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/171312","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":3,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/171312\/revisions"}],"predecessor-version":[{"id":636384,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/171312\/revisions\/636384"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=171312"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=171312"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=171312"},{"taxonomy":"msr-impact-theme","embeddable":
true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=171312"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=171312"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}