{"id":692358,"date":"2020-09-24T10:32:48","date_gmt":"2020-09-24T17:32:48","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=692358"},"modified":"2021-04-28T11:16:56","modified_gmt":"2021-04-28T18:16:56","slug":"measuring-dataset-similarity-using-optimal-transport","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/measuring-dataset-similarity-using-optimal-transport\/","title":{"rendered":"Measuring dataset similarity using optimal transport"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/1400x788_AutoML_Transfer-5f6278d3d03eb.gif\" alt=\"\"\/><\/figure>\n\n\n\n<p>Is FashionMNIST, a dataset of images of clothing items labeled by category, more similar to MNIST or to USPS, both of which are classification datasets of handwritten digits? This is a pretty hard question to answer, but the solution could have an impact on various aspects of machine learning. For example, it could change how practitioners augment a particular dataset to improve the transferring of models across domains or how they select a dataset to pretrain on, especially in scenarios where labeled data from the target domain of interest is scarce.<\/p>\n\n\n\n<p>In our recent paper, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/geometric-dataset-distances-via-optimal-transport\/\">\u201cGeometric Dataset Distances via Optimal Transport,\u201d<\/a> we propose the Optimal Transport Dataset Distance, or the OTDD for short, an approach to defining and computing similarities, or <em>distances<\/em>, between classification datasets. The OTDD relies on optimal transport (OT), a flexible geometric method for comparing probability distributions, and can be used to compare <em>any two datasets<\/em>, regardless of whether their label sets are directly comparable. 
As a bonus, the OTDD returns a <em>coupling<\/em> of the two datasets being compared, which can be understood as a set of soft correspondences between individual items in the datasets. Correspondences can be used to answer questions such as the following: Given a data point in one dataset, what is its corresponding point in the other dataset? In this post, we show the distances and correspondences obtained with our method for five popular benchmark datasets and give an overview of how the OTDD is computed, what it has to do with shoveling dirt, and why it\u2019s a promising tool for transfer learning.<\/p>\n\n\n\n<h2 id=\"why-is-measuring-distance-between-labeled-datasets-hard\">Why is measuring distance between <em>labeled <\/em>datasets hard?<\/h2>\n\n\n\n<p>Comparing any two distinct classification datasets, like the datasets of clothing and handwritten digits mentioned above, poses at least three obvious challenges:<\/p>\n\n\n\n<ol class=\"wp-block-list\" type=\"1\"><li>They might have different <em>cardinality<\/em>, or number of points.<\/li><li>They might have different native dimensionality (for example, MNIST digits are 28 \u00d7 28 pixels, while USPS digits are 16 \u00d7 16).<\/li><li>Their labels might correspond to different concepts, as is the case with FashionMNIST&nbsp;and MNIST and USPS\u2014fashion items versus digits.<\/li><\/ol>\n\n\n\n<p>Note the first two challenges are also applicable to <em>unlabeled<\/em> datasets, but the third challenge\u2014which, as we\u2019ll see below, is the most difficult\u2014is specific to labeled datasets.<\/p>\n\n\n\n<p>Intuitively, the number of examples should have little bearing on the distance between datasets&nbsp;(after all, whether MNIST has 70,000 points or 30,000, it\u2019s still&nbsp;<em>essentially<\/em> MNIST). We can enforce this invariance to dataset size by thinking about datasets as <em>probability distributions, <\/em>from which finitely many samples are drawn, and comparing those instead. 
Similarly, the dimension of the input should not play a major role\u2014if any\u2014in the distance we seek. For example, the essence of MNIST is the same regardless of image size. Here, we\u2019ll assume that images are up- or down-sampled as needed to make images in the two datasets being compared the same size (we discuss how to relax this in the paper).&nbsp;<\/p>\n\n\n\n<p>The last of these challenges\u2014dealing with datasets having disjoint label sets\u2014is much harder to overcome. Indeed, how can we compare the category \u201cshoe\u201d from FashionMNIST to the \u201c6\u201d category in MNIST? And what if the number of categories is different in the two datasets? For example, what if we\u2019re comparing MNIST, which has 10 categories, to ImageNet, which has 1,000? Our solution to this conundrum, in a nutshell, relies on representing each label by the collection of points with that label and, as is the case with enforcing invariance to dataset size, formally treating the collections as probability distributions. Thus, we can compare any two categories across different datasets by comparing their associated collections\u2014understood, again, as probability distributions\u2014in feature space, which also then allows us to extend the comparison across the entire datasets themselves.<\/p>\n\n\n\n<p>The approach we\u2019ve sketched so far banks on being able to compute distances between two different kinds of probability distributions, those corresponding to the labels and those corresponding to the entire datasets. In addition, ideally, we need to do this calculation in a computationally feasible way.
Enter optimal transport, which provides the backbone of our approach.<\/p>\n\n\n\n<h2 id=\"optimal-transport-comparing-by-transporting\">Optimal transport: Comparing by \u2018transporting\u2019<\/h2>\n\n\n\n<p>Optimal transport traces its roots back to 18th-century France, where the mathematician Gaspard Monge was concerned with finding optimal ways to transport <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/gallica.bnf.fr\/ark:\/12148\/bpt6k35800\/f796.item\">dirt and rubble from one location to another. <span class=\"sr-only\"> (opens in new tab)<\/span><\/a>Let\u2019s consider an individual using a shovel to move dirt, a simplified version of the scenario Monge had in mind. By his formulation (below),&nbsp;each movement of the shovel between two piles of dirt carries a cost proportional to the distance traveled by the shovel multiplied by the mass of dirt carried. Then, the total cost of transporting dirt between the piles is the sum of the cost of these individual movements.&nbsp;<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/OTDD_Shovel-Figure.png\" alt=\"\" class=\"wp-image-692526\" width=\"640\" height=\"479\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/OTDD_Shovel-Figure.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/OTDD_Shovel-Figure-300x225.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/OTDD_Shovel-Figure-80x60.png 80w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/OTDD_Shovel-Figure-240x180.png 240w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><figcaption>Optimal transport was born as a method to find least-cost schemes 
to transport dirt and rubble from one place to another. Thinking about probability distributions as piles of dirt, optimal transport intuitively quantifies their dissimilarity in terms of how much and how far the \u201cdirt,\u201d or probability mass, must be shoveled to transform one pile into the other.<\/figcaption><\/figure><\/div>\n\n\n\n<p>But what do dirt and shoveling have to do with statistics or machine learning? As it turns out, the intuitive framework devised by Monge provides an ideal formulation for comparing probability distributions. Let us think of probability density functions as the piles of dirt, where the \u201cheight\u201d of the pile corresponds to the probability density at that point, and <em>shoveling<\/em> dirt between the piles as moving probability from one point to another, at a cost proportional to the distance between these two points. Optimal transport gives us a way to quantify the dissimilarity between two probability density functions in terms of the lowest total cost incurred by completely shoveling one pile into the shape and location of the other.<\/p>\n\n\n\n<p>Formally, the general optimal transport problem between two probability distributions \\(\\alpha \\) and \\(\\beta\\) over a space \\(\\mathcal{X}\\) is defined as:<\/p>\n\n\n\n<p class=\"has-text-align-center\">\\(\\min_{\\pi \\in \\Pi(\\alpha, \\beta) } \\int_{\\mathcal{X} \\times \\mathcal{X}} d(x,x&#8217;)\\text{d}\\pi(x,x&#8217;)\\)<\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>Here \\(\\pi\\) is a joint distribution (formally, a coupling) with marginals \\(\\alpha \\) and \\(\\beta\\). 
When the cost is taken to be \\(d(x,x\u2019) = \\| x - x\u2019 \\|^p\\), the value of this problem, raised to the power \\(1\/p\\), is known as the p-Wasserstein distance, and it\u2019s denoted by \\(\\text{W}_p\\).<\/p>\n\n\n\n<h2 id=\"distances-between-feature-label-pairs\">Distances between feature-label pairs<\/h2>\n\n\n\n<p>Using optimal transport to compare two probability distributions requires defining a distance between <em>points<\/em> sampled from those distributions. In our case, in which we\u2019re comparing two datasets, each point \\(z\\) is a pair comprising a feature vector\u2014an image for the datasets discussed here\u2014and a label. So we need to be able to compute a distance between, let\u2019s say, the pair (\\(x\\),&#8220;six&#8221;), where \\(x\\) is an image of a \u201c6\u201d from MNIST, and the pair (\\(x&#8217;\\),&#8220;shoe&#8221;), where \\(x&#8217;\\) is an image of a shoe from FashionMNIST. The first part is easy: We can compute distances between the images using various standard approaches. Defining a distance between their labels is, as we discussed earlier, much more complicated. But is it worth it? What happens if we ignore the label and just use the features to compute the distance? The visualization below shows what could go wrong. 
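As a quick sanity check of the p-Wasserstein distance defined above: in one dimension, optimal transport has a closed-form solution obtained by matching sorted samples. A minimal sketch using SciPy, which implements the p = 1 case for empirical samples (the sample values below are our own toy example, not from the paper):

```python
# Toy illustration of the 1-Wasserstein distance between two empirical
# distributions. In one dimension, optimal transport has a closed form:
# match the sorted samples and average the distances moved.
from scipy.stats import wasserstein_distance

a = [0.0, 1.0, 3.0]  # first "pile of dirt"
b = [5.0, 6.0, 8.0]  # the same pile shifted right by 5

# Every unit of mass must travel exactly 5, so W_1(a, b) = 5.
print(wasserstein_distance(a, b))
```

Because the second sample is just the first shifted by a constant, every unit of mass travels the same distance, and the distance recovers the shift itself.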
Ignoring the labels might lead us to believe two datasets are very similar when in fact, from a classification perspective, they\u2019re quite different.<\/p>\n\n\n\n<div class=\"wp-block-group alignfull\"><div class=\"wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow\">\n<iframe loading=\"lazy\" style=\"border:none;\" src=\"\/en-us\/research\/wp-content\/internal-scripts\/geometric-dataset-distance-optimal-transport-embed\/public\/index.html#\/labels\" seamless=\"\" width=\"100%\" height=\"600px\">\n<\/iframe>\n<\/div><\/div>\n\n\n\n<p class=\"has-text-align-center has-small-font-size\"><strong>Interactive Visualization:<\/strong> How important is it to use labels when comparing classification datasets? This interactive visualization shows that labels are indeed crucial when determining dataset similarity. Two datasets with similar shapes in feature space can be very different from a classification perspective if their labels (depicted in blue and green) are randomly flipped. Slide the scroll bars associated with each dataset left and right to rotate the point cloud datasets and to shuffle their labels; the coupling and the OT and OTDD distances will respond to the change. Rotating the datasets has a similar effect on the distance obtained via normal OT and the OTDD, but shuffling the labels causes the value of the OTDD to increase much more than that of normal OT.<\/p>\n\n\n\n<p>Since taking into account the labels of the <em>points <\/em>seems crucial, how should we go about defining a distance between them? Earlier, we hinted at our proposed solution: We\u2019ll represent labels as conditional probability distributions \\(P_y = P(X|Y=y)\\) and compute a distance between those. 
And here, too, optimal transport comes to our rescue\u2014we can use it to compute these distances!<\/p>\n\n\n\n<p class=\"has-text-align-center\">\\(d(z,z&#8217;) = \\left( d(x,x&#8217;)^2 + \\text{W}_2(P_y,P_{y&#8217;})^2 \\right)^{\\frac{1}{2}} \\)<\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>And in turn, thanks to optimal transport, we also have a distance between distributions <em>over<\/em> feature-label pairs\u2014that is, datasets\u2014which is our Optimal Transport Dataset Distance:<\/p>\n\n\n\n<p class=\"has-text-align-center\">\\(\\text{OTDD}(\\mathcal{D}_A, \\mathcal{D}_B) = \\min_{\\pi \\in \\Pi(P_A, P_B ) } \\int_{\\mathcal{Z} \\times \\mathcal{Z}} d(z,z&#8217;)\\text{d}\\pi(z,z&#8217;) \\)<\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>A high-level visual summary of the OTDD is shown in the animation at the top of the post; its application in the case of five specific datasets is demonstrated in the visualization below, which provides some insight into our opening question: Is FashionMNIST more similar to MNIST or to USPS? The first pane of the visualization shows the distances between every choice of two datasets; lower numbers and lighter shades of blue represent more similarity. FashionMNIST is actually closer to USPS than to MNIST in terms of the OTDD. 
In the paper, we discuss in detail how to make the computation of the OTDD feasible and efficient, even for very large datasets.<\/p>\n\n\n\n<div class=\"wp-block-group alignfull\"><div class=\"wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow\">\n<iframe loading=\"lazy\" style=\"border:none;\" src=\"\/en-us\/research\/wp-content\/internal-scripts\/geometric-dataset-distance-optimal-transport-embed\/public\/index.html#\/ot\" seamless=\"\" width=\"100%\" height=\"600\">\n<\/iframe>\n<\/div><\/div>\n\n\n\n<p class=\"has-text-align-center has-small-font-size\"><strong>Interactive Visualization:<\/strong> The first pane shows the OTDD between five popular benchmark datasets; lower numbers and lighter shades of blue represent more similarity, while higher numbers and darker shades represent less. Select a dataset pair in the first pane to visualize the embeddings of those two datasets and the optimal transport coupling between them. Hovering over the embeddings shows the image represented by a given embedding point and its \u201cbest match\u201d in the other dataset according to the OTDD.<\/p>\n\n\n\n<h2 id=\"otdd-predicts-pretraining-transferability\">OTDD predicts pretraining transferability<\/h2>\n\n\n\n<p>One of the key observations in the paper is that the notion of distance we propose is highly predictive of <em>transferability<\/em>&nbsp;across datasets\u2014that is, how successful training a model on one dataset and then fine-tuning it on a different one will be. We demonstrate this across various datasets and data types, such as image and text classification (figures below). 
This is remarkable because it suggests that our approach could be used to select which dataset to pretrain on by choosing the \u201cclosest\u201d one, in terms of OTDD, to the target dataset of interest.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"442\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/OTDD-Updated-Figure-1024x442.jpg\" alt=\"\" class=\"wp-image-692790\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/OTDD-Updated-Figure-1024x442.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/OTDD-Updated-Figure-300x129.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/OTDD-Updated-Figure-768x331.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/OTDD-Updated-Figure-1536x662.jpg 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/OTDD-Updated-Figure.jpg 1605w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>These scatter plots show the OTDD is highly predictive of transferability across both image datasets (left) and text datasets (right). Here transferability is measured as the relative drop in test error brought by pretraining on the source domain. The arrows indicate the direction of transfer. For example, M->E corresponds to pretraining on MNIST and fine-tuning and evaluating on EMNIST. 
See the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/geometric-dataset-distances-via-optimal-transport\/\">paper<\/a> for the full abbreviation key.<\/figcaption><\/figure><\/div>\n\n\n\n<h2 id=\"otdd-can-tell-you-how-to-augment-your-dataset\">OTDD can tell you how to augment your dataset<\/h2>\n\n\n\n<p>Most state-of-the-art methods for image classification involve pretraining on a large-scale source dataset enhanced with some form of data augmentation, such as adding rotated or cropped versions of the images. Choosing the most beneficial transformation is hard and often involves expensive repeated training of large models. Another takeaway from the paper is that our tool can inform this decision, too, by estimating which transformations bring the source data closer, in the OTDD sense, to the target dataset of interest.<\/p>\n\n\n\n<p>As an example, the visualization below shows how two components of the OTDD\u2019s inner workings\u2014label-to-label distances and optimal coupling\u2014change as we modify MNIST through cropping and rotating while leaving USPS fixed.<\/p>\n\n\n\n<div class=\"wp-block-group alignfull\"><div class=\"wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow\">\n<iframe loading=\"lazy\" style=\"border:none;\" src=\"\/en-us\/research\/wp-content\/internal-scripts\/geometric-dataset-distance-optimal-transport-embed\/public\/index.html#\/otdd\" seamless=\"\" width=\"100%\" height=\"600px\">\n<\/iframe>\n<\/div><\/div>\n\n\n\n<p class=\"has-text-align-center has-small-font-size\"><strong>Interactive Visualization:<\/strong> How does transforming MNIST affect its similarity to USPS? Select a type of transformation to see how it modifies MNIST and the effect it has on the label-to-label distances and correspondences (coupling) computed by the OTDD to estimate its distance to USPS. 
For example, cropping the digits in MNIST leads to correspondences that are more coherent across corresponding digit classes, while rotating the digits has the opposite effect.<\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>And how does the resulting distance relate to the quality of the augmentation in terms of its benefit to transfer learning? To test this, we generated multiple versions of MNIST using various types of transformations, computed the distance between each of them and USPS, and separately computed the increase in classification accuracy obtained by pretraining on each of the transformed MNIST datasets and fine-tuning and testing on USPS. The visualization below shows samples from the transformed datasets and demonstrates that, again, the OTDD is highly correlated with transferability.<\/p>\n\n\n\n<div class=\"wp-block-group alignfull\"><div class=\"wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow\">\n<iframe loading=\"lazy\" style=\"border:none;\" src=\"\/en-us\/research\/wp-content\/internal-scripts\/geometric-dataset-distance-optimal-transport-embed\/public\/index.html#\/augmentations\" seamless=\"\" width=\"100%\" height=\"600px\">\n<\/iframe>\n<\/div><\/div>\n\n\n\n<p class=\"has-text-align-center has-small-font-size\"><strong>Interactive Visualization: <\/strong>How does transforming MNIST affect the transferability of a classifier trained on it and fine-tuned on USPS? Select a type of transformation to see how it affects the transferability (measured as accuracy improvement) and the OTDD between the transformed MNIST and USPS. The scatter plot shows that the transformations that lead to the best transferability are precisely those that reduce the OTDD the most. 
The strong correlation between these two quantities suggests the OTDD could be used to predict transferability success.<\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>In the paper, we present additional experiments with augmentations on ImageNet, for which we observe similarly encouraging results.<\/p>\n\n\n\n<h2 id=\"moving-forward-with-the-otdd\">Moving forward with the OTDD<\/h2>\n\n\n\n<p>Throughout this post, we\u2019ve seen how ideas originating from a need to efficiently transport dirt and rubble across different locations can be used to compare seemingly incomparable classification datasets, yielding a promising tool for guiding transfer learning and data augmentation. Besides these two tasks, we foresee significant potential benefits in its use within meta-learning to assess\u2014and therefore leverage\u2014task similarity in the learning process. We\u2019re also interested in going beyond static comparison of datasets, as we\u2019ve discussed here, and using the OTDD <em>dynamically <\/em>to sequentially modify one dataset, for example, to achieve a desired similarity to another relevant dataset of interest.<\/p>\n\n\n\n<hr class=\"wp-block-separator is-style-dots\"\/>\n\n\n\n<p>Would you like to know more about AutoML research and its community at Microsoft? Join us for our free virtual speaker series Directions in ML: AutoML and Automating Algorithms. Our next speaker will be Dr. David Alvarez-Melis from Microsoft Research. 
On November 18 at 10 AM PT, Alvarez-Melis will present \u201cAutomating Dataset Comparison and Manipulation with Optimal Transport.\u201d <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/event\/directions-in-ml\/#!upcoming-speaker\">Learn more and register<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Is FashionMNIST, a dataset of images of clothing items labeled by category, more similar to MNIST or to USPS, both of which are classification datasets of handwritten digits? This is a pretty hard question to answer, but the solution could have an impact on various aspects of machine learning. For example, it could change how [&hellip;]<\/p>\n","protected":false},"author":38838,"featured_media":692451,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"David Alvarez-Melis","user_id":"38814"},{"type":"user_nicename","value":"Nicolo Fusi","user_id":"31829"}],"msr_hide_image_in_river":0,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-692358","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199563],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[545241],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"David 
Alvarez-Melis","user_id":38814,"display_name":"David Alvarez-Melis","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/daalvare\/\" aria-label=\"Visit the profile page for David Alvarez-Melis\">David Alvarez-Melis<\/a>","is_active":false,"last_first":"Alvarez-Melis, David","people_section":0,"alias":"daalvare"},{"type":"user_nicename","value":"Nicolo Fusi","user_id":31829,"display_name":"Nicolo Fusi","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/fusi\/\" aria-label=\"Visit the profile page for Nicolo Fusi\">Nicolo Fusi<\/a>","is_active":false,"last_first":"Fusi, Nicolo","people_section":0,"alias":"fusi"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/OTDD-Hero-Featured-Image-Site-960x540.png\" class=\"img-object-cover\" alt=\"\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/OTDD-Hero-Featured-Image-Site-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/OTDD-Hero-Featured-Image-Site-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/OTDD-Hero-Featured-Image-Site-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/OTDD-Hero-Featured-Image-Site-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/OTDD-Hero-Featured-Image-Site-1536x864.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/OTDD-Hero-Featured-Image-Site-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/OTDD-Hero-Featured-Image-Site-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/OTDD-Hero-Featured-Image-Site-343x193.png 343w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/OTDD-Hero-Featured-Image-Site-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/OTDD-Hero-Featured-Image-Site-1280x720.png 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/09\/OTDD-Hero-Featured-Image-Site.png 1587w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/daalvare\/\" title=\"Go to researcher profile for David Alvarez-Melis\" aria-label=\"Go to researcher profile for David Alvarez-Melis\" data-bi-type=\"byline author\" data-bi-cN=\"David Alvarez-Melis\">David Alvarez-Melis<\/a> and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/fusi\/\" title=\"Go to researcher profile for Nicolo Fusi\" aria-label=\"Go to researcher profile for Nicolo Fusi\" data-bi-type=\"byline author\" data-bi-cN=\"Nicolo Fusi\">Nicolo Fusi<\/a>","formattedDate":"September 24, 2020","formattedExcerpt":"Is FashionMNIST, a dataset of images of clothing items labeled by category, more similar to MNIST or to USPS, both of which are classification datasets of handwritten digits? 
This is a pretty hard question to answer, but the solution could have an impact on various&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/692358","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/38838"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=692358"}],"version-history":[{"count":67,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/692358\/revisions"}],"predecessor-version":[{"id":698167,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/692358\/revisions\/698167"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/692451"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=692358"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=692358"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=692358"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=692358"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=692358"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=692358"},{"taxonomy":"msr-locale","em
beddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=692358"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=692358"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=692358"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=692358"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=692358"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}