{"id":1723,"date":"2012-10-02T09:00:00","date_gmt":"2012-10-02T09:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/msr_er\/2012\/10\/02\/dataupdata-curation-for-the-long-tail-of-science\/"},"modified":"2016-07-20T07:32:32","modified_gmt":"2016-07-20T14:32:32","slug":"dataupdata-curation-for-the-long-tail-of-science","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/dataupdata-curation-for-the-long-tail-of-science\/","title":{"rendered":"DataUp\u2014Data Curation for the Long Tail of Science"},"content":{"rendered":"<p><span style=\"font-family: verdana,geneva; font-size: medium;\">The <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/en.wikipedia.org\/wiki\/Long_tail\" target=\"_blank\">long tail<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>: sure, it&rsquo;s a well-known concept in business and marketing, but there&rsquo;s a very important &ldquo;hidden&rdquo; long tail in the sciences, too. So, what is this hidden long tail of science? It consists of the millions of datasets that are not stored in a databank and therefore are not available for use by other scientists. Every day, researchers throughout the world are observing, calculating, and compiling data, recording it all on their local machines within their labs&mdash;often not even as a shared resource to their institutions. 
Regrettably, much of this data never gets deposited in larger web-accessible data repositories where it could be reused by other investigators around the globe.<\/span><\/p>\n<p style=\"text-align: center;\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/dataup.cdlib.org\/\" target=\"_blank\"><img decoding=\"async\" style=\"border: 0px currentColor;\" title=\"Learn more about DataUp\" alt=\"Learn more about DataUp\" src=\"https:\/\/msdnshared.blob.core.windows.net\/media\/MSDNBlogsFS\/prod.evol.blogs.msdn.com\/CommunityServer.Blogs.Components.WeblogFiles\/00\/00\/01\/32\/81\/7888.DataUp.jpg\" original-url=\"http:\/\/blogs.msdn.com\/resized-image.ashx\/__size\/496x0\/__key\/communityserver-blogs-components-weblogfiles\/00-00-01-32-81\/7888.DataUp.jpg\" \/><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: medium;\">As a researcher who works with other researchers from around the globe, I am acutely aware of scientific data pain points; after all, those of us in the research community understand better than anyone that data preservation, curation, and sharing are critical for the advancement of scientific discovery. 
We want to share our data beyond our immediate groups, but many times we find ourselves hindered by a lack of tools and services designed to promote data curation and sharing.<\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: medium;\">Enter DataUp, an open-source tool that helps us document, manage, and archive our tabular data. The DataUp project was born out of this need for seamless integration of data management into the researchers&rsquo; current workflows. The <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/www.cdlib.org\/services\/uc3\" target=\"_blank\">University of California Curation Center<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (UC3) at the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/www.cdlib.org\/\" target=\"_blank\">California Digital Library<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>&nbsp;(CDL), with sponsorship from Microsoft Research and the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/www.moore.org\/\" target=\"_blank\">Gordon and Betty Moore Foundation<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (GBMF), focused on creating a tool that could be used by researchers in the environmental sciences. 
They recognized that this field epitomizes the problems of data management and curation; in particular, the storage of data locally without data description (metadata)&mdash;such as where it was collected, by whom, and when&mdash;that would make it more usable by others.<\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: medium;\">By conducting surveys at ecological and environmental science events, CDL found that the majority of these scientists use spreadsheets to collect and organize their data, so rather than make them learn a new program, UC3 recognized a need for a tool that works with a program most scientists already know: Microsoft Excel.<\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: medium;\">From the results of further surveys, it was determined that about half of the scientists preferred a tool that would be installed on their laptop, while the other half wanted a web-based tool that they could use on any device. Well, we sponsors and the UC3 team were not about to let this divided preference thwart the creation of a much-needed tool, so, together, we decided that there needed to be two versions of the tool: an <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"https:\/\/bitbucket.org\/dataup\/main\/downloads\/DataUpAddIn.zip\" target=\"_blank\">open-source add-in<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (extension) for Microsoft Excel, and an <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/www.dataup.org\/\" target=\"_blank\">open-source web application<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: medium;\">To achieve the project goals of facilitating data management, sharing, and archiving, both the add-in and the web application accomplish four 
main tasks:<\/span><\/p>\n<ol>\n<li><span style=\"font-family: verdana,geneva; font-size: small;\">Perform a best-practices check to ensure good data organization<\/span><\/li>\n<li><span style=\"font-family: verdana,geneva; font-size: small;\">Guide users through creating metadata for their Excel file<\/span><\/li>\n<li><span style=\"font-family: verdana,geneva; font-size: small;\">Help users obtain a unique identifier for their dataset<\/span><\/li>\n<li><span style=\"font-family: verdana,geneva; font-size: small;\">Connect users to a major repository, where their data can be deposited and shared with others<\/span><\/li>\n<\/ol>\n<p><span style=\"font-family: verdana,geneva; font-size: medium;\">The California Digital Library established the initial repository, ONEShare. <\/span><span style=\"font-family: verdana,geneva; font-size: medium;\">Researchers will be able to find tools from the DataUp project as part of the Investigator Toolkit for <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/www.dataone.org\/\" target=\"_blank\">DataONE<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. 
<\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: medium;\">I want to thank <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/academic.research.microsoft.com\/Author\/24826801\/carly-a-strasser\" target=\"_blank\">Carly Strasser<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/academic.research.microsoft.com\/Author\/2862536\/patricia-cruse\" target=\"_blank\">Trisha Cruse<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/academic.research.microsoft.com\/Author\/10512051\/john-kunze\" target=\"_blank\">John Kunze<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/academic.research.microsoft.com\/Author\/9155692\/stephen-l-abrams\" target=\"_blank\">Stephen Abrams<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> from UC3 for their passion and commitment to bring DataUp to life. 
I also want to thank Chris Mentzel from GBMF for co-funding the project with <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/research.microsoft.com\/en-us\/collaboration\/\" target=\"_blank\">Microsoft Research Connections<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: medium;\">Now, get out there and DataUp!<\/span><\/p>\n<p><em><span style=\"font-family: verdana,geneva; font-size: medium;\">&mdash;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/research.microsoft.com\/en-us\/people\/ktolle\/\" target=\"_blank\">Kristin Tolle<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, Director, Microsoft Research Connections<\/span><\/em><\/p>\n<p><strong><span style=\"font-family: verdana,geneva; font-size: medium;\">Learn More<\/span><\/strong><\/p>\n<ul>\n<li><span style=\"font-family: verdana,geneva; font-size: small;\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"https:\/\/bitbucket.org\/dataup\/main\/downloads\/DataUpAddIn.zip\" target=\"_blank\">Download the Excel add-in<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/span><\/li>\n<li><span style=\"font-family: verdana,geneva; font-size: small;\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/www.dataup.org\/\" target=\"_blank\">Access the DataUp Web Application<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/span><\/li>\n<li><span style=\"font-family: verdana,geneva; font-size: small;\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/dataup.cdlib.org\/\" target=\"_blank\">DataUp 
California Digital Library website<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/span><\/li>\n<li><span style=\"font-family: verdana,geneva; font-size: small;\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/research.microsoft.com\/en-us\/projects\/dataup\/default.aspx\" target=\"_blank\">DataUp project page<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/span><\/li>\n<li><span style=\"font-family: verdana,geneva; font-size: small;\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/blogs.technet.com\/b\/openness\/archive\/2012\/10\/02\/spotlight-on-microsoft-research-improving-scientific-data-sharing-and-management.aspx\" target=\"_blank\">Spotlight on Microsoft Research: Improving Scientific Data Sharing and Management<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/span><\/li>\n<li><span style=\"font-family: verdana,geneva; font-size: small;\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/research.microsoft.com\/en-us\/collaboration\/focus\/education\/default.aspx\" target=\"_blank\">Education and Scholarly Communication at Microsoft Research Connections<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/span><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>The long tail: sure, it&rsquo;s a well-known concept in business and marketing, but there&rsquo;s a very important &ldquo;hidden&rdquo; long tail in the sciences, too. So, what is this hidden long tail of science? 
It consists of the millions of datasets that are not stored in a databank and therefore are not available for use by [&hellip;]<\/p>\n","protected":false},"author":32627,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[],"msr_hide_image_in_river":0,"footnotes":""},"categories":[1],"tags":[194584,194886,194913,195004,193598,195255,195262,195269,195286,195287,193654,195718,196053,196171,196409,196439,196714,187102,186788,196982,197317,197441,197509,197539,197739],"research-area":[],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-1723","post","type-post","status-publish","format-standard","hentry","category-research-blog","tag-add-in","tag-california-digital-library","tag-carly-strasser","tag-chris-mentzel","tag-data","tag-data-curation","tag-data-repositories","tag-data-sharing","tag-dataone","tag-dataset","tag-dataup","tag-gordon-and-betty-moore-foundation","tag-john-kunze","tag-kristin-tolle","tag-microsoft-excel","tag-microsoft-research-connections","tag-oneshare","tag-open-source","tag-preservation","tag-repository","tag-stephen-abrams","tag-the-university-of-california-curation-center","tag-trisha-cruse","tag-uc3","tag-web-app","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[],"related-events":[],"related-researchers":[],"msr_type":"Post","byline":"","formattedDate":"October 2, 2012","formattedExcerpt":"The long tail: sure, it&rsquo;s a well-known concept in 
business and marketing, but there&rsquo;s a very important &ldquo;hidden&rdquo; long tail in the sciences, too. So, what is this hidden long tail of science? It consists of the millions of datasets that are not stored in&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1723","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/32627"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=1723"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1723\/revisions"}],"predecessor-version":[{"id":261855,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1723\/revisions\/261855"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1723"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=1723"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=1723"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1723"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=1723"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=1723"},{"taxonomy":"msr-locale","embeddable":true,"hr
ef":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1723"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1723"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1723"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=1723"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=1723"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}