{"id":1005396,"date":"2024-09-26T12:36:22","date_gmt":"2024-09-26T19:36:22","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&#038;p=1005396"},"modified":"2025-04-28T14:47:43","modified_gmt":"2025-04-28T21:47:43","slug":"asl-stem-wiki","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/asl-stem-wiki\/","title":{"rendered":"ASL STEM Wiki"},"content":{"rendered":"<section class=\"mb-3 moray-highlight\">\n\t<div class=\"card-img-overlay mx-lg-0\">\n\t\t<div class=\"card-background  has-background- card-background--full-bleed\">\n\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1790\" height=\"843\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/algae_screenshot.png\" class=\"attachment-full size-full\" alt=\"Screenshot of a web interface showing an ASL interpreter on the left, and an article on the right segmented by sentence. One particular sentence is highlighted, and the signer is frozen.\" style=\"\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/algae_screenshot.png 1790w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/algae_screenshot-300x141.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/algae_screenshot-1024x482.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/algae_screenshot-768x362.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/algae_screenshot-1536x723.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/algae_screenshot-240x113.png 240w\" sizes=\"auto, (max-width: 1790px) 100vw, 1790px\" \/>\t\t<\/div>\n\t\t<!-- Foreground -->\n\t\t<div class=\"card-foreground d-flex mt-md-n5 my-lg-5 px-g px-lg-0\">\n\t\t\t<!-- Container -->\n\t\t\t<div class=\"container d-flex mt-md-n5 my-lg-5 \">\n\t\t\t\t<!-- Card 
wrapper -->\n\t\t\t\t<div class=\"w-100 w-lg-col-5\">\n\t\t\t\t\t<!-- Card -->\n\t\t\t\t\t<div class=\"card material-md-card py-5 px-md-5\">\n\t\t\t\t\t\t<div class=\"card-body \">\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n<h1 class=\"wp-block-heading\" id=\"asl-stem-wiki\">ASL STEM Wiki<\/h1>\n\n\n\n<p>Dataset and Benchmark for Interpreting STEM Articles<\/p>\n\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t<\/div>\n<\/section>\n\n\n\n\n\n<p>To help advance the state of sign language modeling, we created ASL STEM Wiki, the first continuous signing dataset focused on Science, Technology, Engineering, and Math (STEM). The corpus contains 254 Wikipedia articles on STEM topics in English, interpreted into over 300 hours of American Sign Language (ASL). Beyond its size and topic, and unlike many prior datasets, it contains videos of professional signers, including many Certified Deaf Interpreters (CDIs), and was collected with consent from each contributor under IRB approval. Deaf research team members were involved throughout.<\/p>\n\n\n\n<p>This dataset is released alongside our paper, which identifies several use cases for ASL STEM Wiki and provides baselines for one of these tasks: fingerspelling detection and identification. Because the dataset focuses on STEM, and STEM terminology often lacks standardized signs, fingerspelling of technical terms appears frequently in our dataset. To help identify fingerspellings, we provide models for fingerspelling detection and alignment, and release benchmark performance on the ASL STEM Wiki dataset for the research community to build on. Our models highlight the difficulty of the detection and alignment task, and provide the first evidence that self-supervised contrastive pretraining can improve fingerspelling detection.<\/p>\n\n\n\n<p>Our dataset also powers a small bilingual resource for students, providing full English texts for STEM articles alongside professional ASL interpretations. 
This resource enables students and other readers to access spot-translations for select sentences, and to play through entire articles as desired. We release this resource as well.<\/p>\n\n\n\n<p>This project was conducted at Microsoft Research with collaborators.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Microsoft: Danielle Bragg (PI), Hal Daum\u00e9 III, Alex Lu, Vanessa Milan, Fyodor Minakov, Chinmay Singh, Cyril Zhang<\/li>\n\n\n\n<li>University of California, Berkeley: Kayo Yin<\/li>\n<\/ul>\n\n\n\n<p><strong>Dataset License:<\/strong>&nbsp;Please see the supporting tab. If you are interested in commercial use, please contact&nbsp;<a href=\"mailto:ASL_Citizen@microsoft.com\" target=\"_blank\" rel=\"noreferrer noopener\">ASL_Citizen@microsoft.com<\/a>.&nbsp;<\/p>\n\n\n\n<p><strong>Dataset Download:<\/strong><\/p>\n\n\n\n<p>To download via web interface, please visit: <a href=\"https:\/\/www.microsoft.com\/en-us\/download\/details.aspx?id=106253\" target=\"_blank\" rel=\"noreferrer noopener\">Download ASL STEM Wiki from Official Microsoft Download Center<\/a><\/p>\n\n\n\n<p>To download via command line, please execute: <code>wget https:\/\/download.microsoft.com\/download\/4\/c\/f\/4cfec788-7478-4e47-9a15-ace9b6a96198\/ASL_STEM_Wiki.zip<\/code><\/p>\n\n\n\n<p><strong>Bilingual STEM article resource:<\/strong> <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/aslgames.azurewebsites.net\/wiki\" target=\"_blank\" rel=\"noopener noreferrer\">Wiki &#8211; The ASL Data Community<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n<p><strong>Open-source Repo:<\/strong> <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/github.com\/microsoft\/ASL-STEM-Wiki\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/github.com\/microsoft\/ASL-STEM-Wiki<span class=\"sr-only\"> (opens in new 
tab)<\/span><\/a><\/p>\n\n\n\n<p><strong>Citation:<\/strong>&nbsp;If you use this dataset in your work, please cite&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/arxiv.org\/abs\/2411.05783\" target=\"_blank\" rel=\"noopener noreferrer\">our paper<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">@inproceedings{yin-etal-2024-asl,<br>    title = \"{ASL} {STEM} {W}iki: Dataset and Benchmark for Interpreting {STEM} Articles\",<br>    author = \"Yin, Kayo  and<br>      Singh, Chinmay  and<br>      Minakov, Fyodor O  and<br>      Milan, Vanessa  and<br>      Daum{\\'e} III, Hal  and<br>      Zhang, Cyril  and<br>      Lu, Alex Xijie  and<br>      Bragg, Danielle\",<br>    editor = \"Al-Onaizan, Yaser  and<br>      Bansal, Mohit  and<br>      Chen, Yun-Nung\",<br>    booktitle = \"Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing\",<br>    month = Nov,<br>    year = \"2024\",<br>    address = \"Miami, Florida, USA\",<br>    publisher = \"Association for Computational Linguistics\",<br>    url = \"https:\/\/aclanthology.org\/2024.emnlp-main.801\",<br>    pages = \"14474--14490\",<br>    abstract = \"Deaf and hard-of-hearing (DHH) students face significant barriers in accessing science, technology, engineering, and mathematics (STEM) education, notably due to the scarcity of STEM resources in signed languages. To help address this, we introduce ASL STEM Wiki: a parallel corpus of 254 Wikipedia articles on STEM topics in English, interpreted into over 300 hours of American Sign Language (ASL). ASL STEM Wiki is the first continuous signing dataset focused on STEM, facilitating the development of AI resources for STEM education in ASL.We identify several use cases of ASL STEM Wiki with human-centered applications. 
For example, because this dataset highlights the frequent use of fingerspelling for technical concepts, which inhibits DHH students{'} ability to learn, we develop models to identify fingerspelled words{---}which can later be used to query for appropriate ASL signs to suggest to interpreters.\",<br>}<\/pre>\n\n\n\n<p><strong>Acknowledgements:<\/strong>&nbsp;We are deeply grateful to all community members who participated in this dataset project.<\/p>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n\n\n<p>Deaf and hard-of-hearing (DHH) individuals communicate in many ways, including&nbsp;through sign language. Along with being an accessible modality for DHH individuals, signed languages are culturally significant, forming a cornerstone of shared experience, cultural identity, and institutions for Deaf communities worldwide.<\/p>\n\n\n\n<p>American Sign Language (ASL) is the primary sign language used in North America (and several other parts of the world). As in other signed languages, signs in ASL are composed of phonological elements including handshapes, hand and body movements, hand location, and facial expressions. The vocabulary is large, and complex rules govern how these signs are put together to make sentences. These rules for combining words, phonology, and morphology into sentences are rich, and substantially different from those of English. Sign execution also varies among signers, across contexts, and across regions and dialects. ASL is a complete, natural language in its own right.<\/p>\n\n\n\n<p>Signed languages play a central role in Deaf cultures and identity. Groups of people who primarily communicate in a signed language form distinct cultures as sociolinguistic minorities within the broader hearing majority. Within these communities, Deafness is a proud cultural identity. 
Despite the richness of signed languages and Deaf cultures, Deaf communities have a history of marginalization and oppression by the hearing majority. Many harmful misconceptions and biases exist within hearing communities about signed languages and Deaf people. For example, education systems have suppressed the use of signed languages, to the detriment of many deaf students.<\/p>\n\n\n\n<p>Given this context, it is particularly important that sign language technologies are developed in partnership with Deaf communities and with an understanding of Deaf culture and signed languages. To this end, we involved Deaf collaborators in key roles at every step of this project, including conception, recruitment, participation, analysis, and dissemination. We encourage those using this dataset to educate themselves on Deaf culture and American Sign Language in order to conduct research and build systems that are useful to Deaf community members while minimizing harms.<\/p>\n\n\n\n<p>As an entry point to more information on Deaf cultures and sign languages,&nbsp;please check out the following resources.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/nationaldeafcenter.org\/resource-items\/deaf-community-introduction\/\" target=\"_blank\" rel=\"noopener noreferrer\">The Deaf Community: An Introduction \u2013 National Deaf Center<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/li>\n\n\n\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.nad.org\/resources\/american-sign-language\/community-and-culture-frequently-asked-questions\/\" target=\"_blank\" rel=\"noopener noreferrer\">Community and Culture Frequently Asked Questions \u2013 National Association of the Deaf<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/li>\n\n\n\n<li><a class=\"msr-external-link glyph-append 
glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/wfdeaf.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">Home Page \u2013 World Federation of the Deaf<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/li>\n\n\n\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.museumofdeaf.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">Home Page \u2013 Museum of Deaf History, Arts &amp; Culture<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/li>\n<\/ul>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n\n\n<p>ASL STEM Wiki is the first continuous ASL dataset of Science, Technology, Engineering, and Mathematics (STEM) material. The dataset consists of 64,266 videos spanning 316 hours of content. The videos were recorded by 37 professional ASL interpreters, and are interpretations of 254 STEM-focused Wikipedia articles. Each recording corresponds to a sentence (or section title) of one of the articles. 
Unlike prior continuous sign language datasets, our dataset was collected with consent, recorded by trusted professional interpreters including Certified Deaf Interpreters, and focuses on STEM content.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>Dataset<\/td><td>Source & Topic<\/td><td>Signers<\/td><td>Consent?<\/td><td># Hours<\/td><\/tr><tr><td>How2Sign (Duarte et al., 2021)<\/td><td>&#8220;How-to&#8221; YouTube<\/td><td>Interpreter<\/td><td>Yes<\/td><td>80<\/td><\/tr><tr><td>OpenASL (Shi et al., 2022)<\/td><td>Deaf YouTube<\/td><td>Deaf & Interpreter<\/td><td>No<\/td><td>268<\/td><\/tr><tr><td>YouTube-ASL (Uthus et al., 2023)<\/td><td>YouTube<\/td><td>Unknown<\/td><td>No<\/td><td>984<\/td><\/tr><tr><td><strong>ASL STEM Wiki<\/strong><\/td><td><strong>STEM Wikipedia<\/strong><\/td><td><strong>Interpreter<\/strong><\/td><td><strong>Yes<\/strong><\/td><td><strong>316<\/strong><\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\">Table 1: Properties of existing continuous ASL datasets compared to our new dataset (<strong>ASL STEM Wiki<\/strong>, last row).<\/figcaption><\/figure>\n\n\n\n<p>The interpreted texts consist of 254 STEM-focused Wikipedia articles. These articles fall under the following topic categories: Science (113 articles), Geography (50), Technology (47), Mathematics (26), and Medicine (18). The articles have been segmented into sentences. Each sentence is aligned with the interpretation of that sentence or section title. 
Article categories, sentence indexing, video alignment, and other article metadata are included in the dataset.<\/p>\n\n\n\n<p>The texts were interpreted into ASL using a custom web interface, first proposed in <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/dl.acm.org\/doi\/abs\/10.1145\/3517428.3544827\" target=\"_blank\" rel=\"noopener noreferrer\">prior work<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, which displays English articles and interpretations side-by-side. People recording interpretations simultaneously contribute to two purposes: 1) the collection of a continuous dataset to help advance research, and 2) the creation of a new bilingual resource that students and others can use to access articles in both English and ASL. We release the new resource powered by the ASL STEM Wiki recordings (see the &#8220;Bilingual Resource&#8221; tab). While contributing, the interpreters could play back recordings, and re-record as desired. 
The research team validated the collected videos by removing invalid recordings, manually reviewing a random sample of videos from each contributor, and manually examining length outliers.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"480\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/lengthComparison_final_large.png\" alt=\"Scatterplot of English sentence length in characters against ASL video length in seconds.\" class=\"wp-image-1088076\" style=\"width:421px;height:auto\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/lengthComparison_final_large.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/lengthComparison_final_large-300x225.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/lengthComparison_final_large-80x60.png 80w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/09\/lengthComparison_final_large-240x180.png 240w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><figcaption class=\"wp-element-caption\">Figure 1: Scatterplot of video length against sentence length in ASL STEM Wiki. Longer English sentences tend to yield longer ASL interpretation videos. x-axis: sentence length (characters), y-axis: video length (seconds).<\/figcaption><\/figure>\n\n\n\n<p>The resulting video dataset consists of 64,266 ASL videos, providing 316 hours of continuous STEM content. Because the videos were recorded by professional interpreters, they include plain backgrounds, and the interpreters typically wear plain-colored, contrasting clothing to help with visual clarity. 
Because the interpreted contents are technical and focused on STEM, a large number of words are fingerspelled, estimated to be around 18.6% of words in our corpus.<\/p>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n\n\n<p>This dataset empowers researchers and developers to pursue a range of new directions, including but not limited to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Fingerspelling detection and recognition &#8211; Our dataset reflects an increased usage of fingerspelling in interpretations of STEM documents. Identifying instances of fingerspelling and mapping these instances onto the corresponding English words can help researchers better understand signing patterns. We provide benchmarks for fingerspelling detection and recognition in our paper accompanying the dataset release. The ability to detect and recognize fingerspelling can also enable richer downstream applications, such as automatic sign suggestion (described subsequently).<\/li>\n\n\n\n<li>Automatic sign suggestion &#8211; Also motivated by the prevalence of fingerspelling in STEM interpretations, we suggest developing systems to detect when fingerspelling is used and suggest appropriate ASL signs to use instead. Suggestions would be dependent on the domain and context (e.g. &#8220;protein&#8221; in the context of nutrition, structural biology, or protein engineering may have distinct ASL signs), as well as on the audience (e.g. the sign to use for an elementary school class may be different from the sign to use with a college audience). 
Fingerspelling may be appropriate in some cases as well, for example when introducing a new sign that is not well-known.<\/li>\n\n\n\n<li>Translationese\/Interpretese &#8211; Because each recording in our dataset is prompted by an English source sentence, the dataset is prone to effects of <em>translationese<\/em> [1], such as English-influenced word order, segmentation of ASL along English sentence boundaries, signs for English homonyms being used instead of the appropriate sign, and increased fingerspelling. We propose training models to detect and repair translationese, as well as potential translation and interpretation studies around interpretese [2] of ASL.<\/li>\n\n\n\n<li>Sign variation &#8211; Five of our articles are interpreted by all 37 ASL interpreters in our study. These articles provide a unique opportunity to study variations in how individuals sign and interpret the same English sentence, especially for STEM concepts whose ASL signs have not stabilized.<\/li>\n\n\n\n<li>Sign linking\/retrieval &#8211; Related to <em>sign variation<\/em>, our dataset contains examples of English words that may be interpreted differently across interpreters and contexts. This data can be used to train models that link different versions of ASL signs for the same concept (e.g. one interpreter may sign &#8220;electromagnetism&#8221; using the signs for ELECTRICITY and MAGNET, while another interpreter may interpret the same word using a sign that visually describes an electromagnetic field).<\/li>\n\n\n\n<li>Automatic STEM translation &#8211; Our dataset can be used to train, fine-tune, and\/or evaluate model capabilities in translating technical content from English to ASL. Technically, our dataset could also be used to develop models that translate from ASL to English; however, this direction is not preferred, since our dataset contains interpreted ASL, which may differ from unprompted ASL [2].<\/li>\n<\/ul>\n\n\n\n<p>[1] Moshe Koppel and Noam Ordan. 2011. Translationese and its dialects. 
In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 1318\u20131326, Portland, Oregon, USA. Association for Computational Linguistics.<\/p>\n\n\n\n<p>[2] Miriam Shlesinger. 2009. Towards a definition of interpretese. Benjamins Translation Library (BTL).<\/p>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n\n\n<p>Dataset contributors added videos for two simultaneous purposes: 1) to create a dataset to help advance research as described in the other tabs, and 2) to contribute to a community-sourced bilingual educational resource for STEM knowledge. By creating this bilingual STEM resource, we provide immediate and direct benefits to the signing community, while longer-term benefits derived from research are in progress.<\/p>\n\n\n\n<p>Please check out the bilingual STEM article resource here: <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/aslgames.azurewebsites.net\/wiki\">Wiki &#8211; The ASL Data Community<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n\n\n<h2 class=\"wp-block-heading\" id=\"microsoft-research-license-terms\">MICROSOFT RESEARCH LICENSE TERMS<\/h2>\n\n\n\n<p><strong>IF YOU LIVE IN THE UNITED STATES, PLEASE READ THE \u201cBINDING ARBITRATION AND CLASS ACTION WAIVER\u201d SECTION BELOW. IT AFFECTS HOW DISPUTES ARE RESOLVED.<\/strong><\/p>\n\n\n\n<p>These license terms are an agreement between you and Microsoft Corporation (or one of its affiliates). They apply to the source code, object code, or data (collectively \u201cMaterials\u201d) that accompany this license. IF YOU COMPLY WITH THESE LICENSE TERMS, YOU HAVE THE RIGHTS BELOW. 
BY USING THE MATERIALS, YOU ACCEPT THESE TERMS.<\/p>\n\n\n\n<p><strong>1) INSTALLATION AND USE RIGHTS to The Materials.<\/strong><\/p>\n\n\n\n<p>Subject to the terms of this agreement, you have the below rights, if applicable, to use the Materials solely for non-commercial, non-revenue generating, research purposes:<\/p>\n\n\n\n<p><strong>a)&nbsp;&nbsp;&nbsp; Source Code.<\/strong>&nbsp;If source code is included, you may use and modify the source code, but you may not distribute the source code.<\/p>\n\n\n\n<p><strong>b)&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><strong>Object Code.&nbsp;<\/strong>If object code is included, you may use the object code, but you may not distribute the object code.<\/p>\n\n\n\n<p><strong>c)&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><strong>Data.&nbsp;<\/strong>If data is included, you may use and modify the data, but your use and modification must be consistent with the consent under which the data was provided and\/or gathered and you may not distribute the data or your modifications to the data.<\/p>\n\n\n\n<p><strong>2)&nbsp;&nbsp;&nbsp; SCOPE OF LICENSE.<\/strong>&nbsp;The Materials are licensed, not sold. Microsoft reserves all other rights. 
Unless applicable law gives you more rights despite this limitation, you will not (and have no right to):<\/p>\n\n\n\n<p><strong>a)&nbsp;&nbsp;&nbsp;&nbsp;<\/strong>work around any technical limitations in the Materials that only allow you to use it in certain ways;<\/p>\n\n\n\n<p><strong>b)&nbsp;&nbsp;&nbsp;&nbsp;<\/strong>reverse engineer, decompile or disassemble the Materials;<\/p>\n\n\n\n<p><strong>c)&nbsp;&nbsp;&nbsp;&nbsp;<\/strong>remove, minimize, block, or modify any notices of Microsoft or its suppliers in the Materials;<\/p>\n\n\n\n<p><strong>d)&nbsp;&nbsp;&nbsp;&nbsp;<\/strong>use the Materials in any way that is against the law or to create or propagate malware; or<\/p>\n\n\n\n<p><strong>e)&nbsp;&nbsp;&nbsp;&nbsp;<\/strong>share, publish, distribute or lend the Materials, provide the Materials as a stand-alone hosted solution for others to use, or transfer the Materials or this agreement to any third party.<\/p>\n\n\n\n<p><strong>3)&nbsp;&nbsp;&nbsp; PERSONAL DATA.<\/strong>&nbsp; If the data (set forth in Section 1(c) above) includes or is found to include any data that enables any ability to identify an individual (\u201cPersonal Data\u201d), you will not use such Personal Data for any purpose other than was authorized and consented to by the data subject\/research participant.&nbsp; You will not use Personal Data to contact any person.&nbsp; You will keep Personal Data in strict confidence.&nbsp; You will not share any Personal Data that is collected or in your possession with any third party for any reason and as required under the original consent agreement.&nbsp; Further, you will destroy the Personal Data and any backup or copies, immediately upon the completion of your research.&nbsp;<\/p>\n\n\n\n<p><strong>4)&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><strong>LICENSE TO MICROSOFT.&nbsp;&nbsp;<\/strong>Notwithstanding the limitations in Section 1, you may distribute your modifications back to Microsoft, and if you do provide Microsoft with modifications 
of the Materials, you hereby grant Microsoft, without any restrictions or limitations, a non-exclusive, perpetual, irrevocable, royalty-free, assignable and sub-licensable license, to reproduce, publicly perform or display, install, use, modify, post, distribute, make and have made, sell and transfer such modifications and derivatives for any purpose.<\/p>\n\n\n\n<p><strong>5)&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><strong>Publication.&nbsp;&nbsp;<\/strong>You may publish (or present papers or articles) on your results from using the Materials provided that no material or substantial portion of the Materials is included in any such publication or presentation.<\/p>\n\n\n\n<p><strong>6)&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><strong>FEEDBACK.<\/strong>&nbsp;Any feedback about the Materials provided by you to us is voluntarily given, and Microsoft shall be free to use the feedback as it sees fit without obligation or restriction of any kind, even if the feedback is designated by you as confidential.&nbsp; Such feedback shall be considered a contribution and licensed to Microsoft under the terms of Section 4 above.<\/p>\n\n\n\n<p><strong>7)&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><strong>EXPORT RESTRICTIONS.<\/strong>&nbsp;You must comply with all domestic and international export laws and regulations that apply to the Materials, which include restrictions on destinations, end users, and end use. For further information on export restrictions, visit (aka.ms\/exporting).<\/p>\n\n\n\n<p><strong>8)&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><strong>SUPPORT SERVICES.<\/strong>&nbsp;Microsoft is not obligated under this agreement to provide any support services for the Materials. Any support provided is \u201cas is\u201d, \u201cwith all faults\u201d, and without warranty of any kind.<a><\/a><a><\/a><\/p>\n\n\n\n<p><strong>9)&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><strong>BINDING ARBITRATION AND CLASS ACTION WAIVER. 
This Section applies if you live in (or, if a business, your principal place of business is in) the United States.&nbsp;&nbsp;<\/strong>If you and Microsoft have a dispute, you and Microsoft agree to try for 60 days to resolve it informally. If you and Microsoft can\u2019t, you and Microsoft agree to&nbsp;<strong>binding individual arbitration before the American Arbitration Association<\/strong>&nbsp;under the Federal Arbitration Act (\u201cFAA\u201d), and&nbsp;<strong>not to sue in court in front of a judge or jury<\/strong>. Instead, a neutral arbitrator will decide.&nbsp;<strong>Class action lawsuits, class-wide arbitrations, private attorney-general actions,<\/strong>&nbsp;and any other proceeding where someone acts in a representative capacity&nbsp;<strong>are not allowed<\/strong>; nor is combining individual proceedings without the consent of all parties. The complete Arbitration Agreement contains more terms and is at aka.ms\/arb-agreement-1. You and Microsoft agree to these terms.<\/p>\n\n\n\n<p><strong>10)&nbsp;<\/strong><strong>ENTIRE AGREEMENT.<\/strong>&nbsp;This agreement, and any other terms Microsoft may provide for supplements, updates, or third-party applications, is the entire agreement for the Materials.<\/p>\n\n\n\n<p><strong>11)&nbsp;<\/strong><strong>APPLICABLE LAW AND PLACE TO RESOLVE DISPUTES.<\/strong>&nbsp;If you acquired the Materials in the United States or Canada, the laws of the state or province where you live (or, if a business, where your principal place of business is located) govern the interpretation of this agreement, claims for its breach, and all other claims (including consumer protection, unfair competition, and tort claims), regardless of conflict of laws principles, except that the FAA governs everything related to arbitration. If you acquired the Materials in any other country, its laws apply, except that the FAA governs everything related to arbitration. If U.S. 
federal jurisdiction exists, you and Microsoft consent to exclusive jurisdiction and venue in the federal court in King County, Washington for all disputes heard in court (excluding arbitration). If not, you and Microsoft consent to exclusive jurisdiction and venue in the Superior Court of King County, Washington for all disputes heard in court (excluding arbitration).<\/p>\n\n\n\n<p><strong>12)&nbsp;<\/strong><strong>CONSUMER RIGHTS; REGIONAL VARIATIONS.<\/strong>&nbsp;This agreement describes certain legal rights. You may have other rights, including consumer rights, under the laws of your state, province, or country. Separate and apart from your relationship with Microsoft, you may also have rights with respect to the party from which you acquired the Materials. This agreement does not change those other rights if the laws of your state, province, or country do not permit it to do so. For example, if you acquired the Materials in one of the below regions, or mandatory country law applies, then the following provisions apply to you:<\/p>\n\n\n\n<p><strong>a)&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><strong>Australia.<\/strong>&nbsp;You have statutory guarantees under the Australian Consumer Law and nothing in this agreement is intended to affect those rights.<\/p>\n\n\n\n<p><strong>b)&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><strong>Canada.<\/strong>&nbsp;If you acquired this software in Canada, you may stop receiving updates by turning off the automatic update feature, disconnecting your device from the Internet (if and when you re-connect to the Internet, however, the Materials will resume checking for and installing updates), or uninstalling the Materials. 
The product documentation, if any, may also specify how to turn off updates for your specific device or software.<\/p>\n\n\n\n<p><strong>c)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<\/strong><strong>Germany and Austria.<\/strong><\/p>\n\n\n\n<p><strong>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; i.&nbsp;Warranty.<\/strong>&nbsp;The properly licensed software will perform substantially as described in any Microsoft materials that accompany the Materials. However, Microsoft gives no contractual guarantee in relation to the licensed software.<\/p>\n\n\n\n<p><strong>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ii.&nbsp;&nbsp;Limitation of Liability.<\/strong>&nbsp;In case of intentional conduct, gross negligence, claims based on the Product Liability Act, as well as, in case of death or personal or physical injury, Microsoft is liable according to the statutory law.<\/p>\n\n\n\n<p>Subject to the foregoing clause (ii), Microsoft will only be liable for slight negligence if Microsoft is in breach of such material contractual obligations, the fulfillment of which facilitate the due performance of this agreement, the breach of which would endanger the purpose of this agreement and the compliance with which a party may constantly trust in (so-called \u201ccardinal obligations\u201d). In other cases of slight negligence, Microsoft will not be liable for slight negligence.<\/p>\n\n\n\n<ol start=\"13\" class=\"wp-block-list\">\n<li>DISCLAIMER OF WARRANTY. THE MATERIALS ARE LICENSED \u201cAS IS.\u201d YOU BEAR THE RISK OF USING THEM. MICROSOFT GIVES NO EXPRESS WARRANTIES, GUARANTEES, OR CONDITIONS. TO THE EXTENT PERMITTED UNDER APPLICABLE LAWS, MICROSOFT EXCLUDES ALL IMPLIED WARRANTIES, INCLUDING MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT.<\/li>\n\n\n\n<li>LIMITATION ON AND EXCLUSION OF DAMAGES. 
IF YOU HAVE ANY BASIS FOR RECOVERING DAMAGES DESPITE THE PRECEDING DISCLAIMER OF WARRANTY, YOU CAN RECOVER FROM MICROSOFT AND ITS SUPPLIERS ONLY DIRECT DAMAGES UP TO U.S. $5.00. YOU CANNOT RECOVER ANY OTHER DAMAGES, INCLUDING CONSEQUENTIAL, LOST PROFITS, SPECIAL, INDIRECT OR INCIDENTAL DAMAGES.<\/li>\n<\/ol>\n\n\n\n<p>This limitation applies to (a) anything related to the Materials, services, content (including code) on third party Internet sites, or third party applications; and (b) claims for breach of contract, warranty, guarantee, or condition; strict liability, negligence, or other tort; or any other claim; in each case to the extent permitted by applicable law.<\/p>\n\n\n\n<p>It also applies even if Microsoft knew or should have known about the possibility of the damages. The above limitation or exclusion may not apply to you because your state, province, or country may not allow the exclusion or limitation of incidental, consequential, or other damages.<\/p>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n\n\n<h2 class=\"wp-block-heading\" id=\"motivation\">Motivation<\/h2>\n\n\n\n<p><strong>For what purpose was the dataset created? <\/strong>Was there a specific task in mind? Was there a specific gap that needed to be filled? Please provide a description.<\/p>\n\n\n\n<p>This dataset was created for two purposes: 1) to enable a small bilingual informational resource in both English and ASL, and 2) to provide continuous labelled sign language data for research. There is a severe shortage of such publicly available data, which is a primary barrier to research and technology advancement. 
More information about the bilingual resource can be found in a prior publication [1], and on the prototype website <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/aslgames.azurewebsites.net\/wiki\/\">https:\/\/aslgames.azurewebsites.net\/wiki\/<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n<p>Additional background and all supplementary materials, including the bilingual website link, are available on the project page website <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/asl-stem-wiki\/\">https:\/\/www.microsoft.com\/en-us\/research\/project\/asl-stem-wiki\/<\/a>.<\/p>\n\n\n\n<p><strong>Who created this dataset (e.g., which team, research group) and on behalf of which entity (e.g., company, institution, organization)?<\/strong><\/p>\n\n\n\n<p>This dataset was created by Microsoft Research.<\/p>\n\n\n\n<p><strong>Who funded the creation of the dataset?<\/strong>&nbsp;If there is an associated grant, please provide the name of the grantor and the grant name and number.<\/p>\n\n\n\n<p>Microsoft funded the creation of the dataset.<\/p>\n\n\n\n<p><strong>Any other comments?<\/strong><\/p>\n\n\n\n<p>N\/A<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"composition\">Composition<\/h2>\n\n\n\n<p><strong>What do the instances that comprise the dataset represent (e.g., documents, photos, people, countries)?<\/strong>&nbsp;Are there multiple types of instances (e.g., movies, users, and ratings; people and interactions between them; nodes and edges)? Please provide a description.<\/p>\n\n\n\n<p>The dataset consists of continuous American Sign Language (ASL) videos, with associated metadata. The contents are interpretations of English Wikipedia articles related to STEM topics, recorded by professional ASL interpreters. Each recording corresponds to a single English sentence or section title in the original text. 
The videos are provided with the corresponding segmented English texts.<\/p>\n\n\n\n<p><strong>How many instances are there in total (of each type, if appropriate)?<\/strong><\/p>\n\n\n\n<p>The dataset consists of 64,266 videos, each corresponding to an English sentence or section title from a STEM-related Wikipedia article.<\/p>\n\n\n\n<p><strong>Does the dataset contain all possible instances or is it a sample (not necessarily random) of instances from a larger set?<\/strong>&nbsp;If the dataset is a sample, then what is the larger set? Is the sample representative of the larger set (e.g., geographic coverage)? If so, please describe how this representativeness was validated\/verified. If it is not representative of the larger set, please describe why not (e.g., to cover a more diverse range of instances, because instances were withheld or unavailable).<\/p>\n\n\n\n<p>The dataset is a sample. It contains a sample of English-to-ASL sentence-by-sentence translations of STEM-related articles from Wikipedia, and the videos contain a sample of 37 professional ASL interpreters.<\/p>\n\n\n\n<p><strong>What data does each instance consist of? \u201cRaw\u201d data (e.g., unprocessed text or images) or features? <\/strong>In either case, please provide a description.<\/p>\n\n\n\n<p>Each data point consists of a video. Each video contains a single professional ASL interpreter executing an ASL translation of an English sentence or section heading from a STEM-related Wikipedia article.<\/p>\n\n\n\n<p><strong>Is there a label or target associated with each instance?<\/strong>&nbsp;If so, please provide a description.<\/p>\n\n\n\n<p>Yes, each video corresponds to a portion of a Wikipedia article. We provide the corresponding text.<\/p>\n\n\n\n<p><strong>Is any information missing from individual instances?<\/strong>&nbsp;If so, please provide a description, explaining why this information is missing (e.g., because it was unavailable). 
This does not include intentionally removed information, but might include, e.g., redacted text.<\/p>\n\n\n\n<p>No.<\/p>\n\n\n\n<p><strong>Are relationships between individual instances made explicit (e.g., users\u2019 movie ratings, social network links)?<\/strong>&nbsp;If so, please describe how these relationships are made explicit.<\/p>\n\n\n\n<p>Yes, videos are related to one another, in that they correspond to sequential sentences or titles in Wikipedia articles. We provide this sequential information in the text metadata.<\/p>\n\n\n\n<p><strong>Are there recommended data splits (e.g., training, development\/validation, testing)?<\/strong>&nbsp;If so, please provide a description of these splits, explaining the rationale behind them.<\/p>\n\n\n\n<p>In our paper accompanying the dataset release, we used the following splits:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>training: all recordings of the five control articles (Acid catalysis, EDGE species, Hal Anger (person), Relativistic electromagnetism, Standard score), which provided multiple recordings per sentence<\/li>\n\n\n\n<li>test: all remaining articles, which provided a single recording per sentence<\/li>\n<\/ul>\n\n\n\n<p>Dataset users can consider splitting the dataset by article or interpreter, to help preserve the independence of the test set.<\/p>\n\n\n\n<p><strong>Are there any errors, sources of noise, or redundancies in the dataset?<\/strong> If so, please provide a description.<\/p>\n\n\n\n<p>Though the translations were made by professional ASL interpreters, the translations are still human-generated, and may contain errors. Because the translations were made from English to ASL, the source language likely influences the target language (e.g. in grammatical structures). 
The English text was also segmented into sentence units, forcing the translation to occur sentence-by-sentence, which further constrained the flexibility and naturalness of the ASL. Additionally, because the content is STEM-related and sometimes technical, fingerspelling is often used to represent concepts where signs do not exist or may have been unfamiliar to the interpreter.<\/p>\n\n\n\n<p>The dataset is also missing occasional sentences from the original Wikipedia articles. While the ASL interpreters were asked to record translations of entire Wikipedia articles, occasionally portions were skipped, and some additional videos were removed during cleaning. To validate the data, the research team manually reviewed a random sample of videos from each contributor, ran scripts to check for invalid recordings, and manually examined outliers. Removed videos were: 110 with webcam failure, 500 corrupted, 19 with large discrepancies in text and recording length (video < 3s, text \u22655 words), 1 large outlier (>1000s), and 7 shorter than 1s.<\/p>\n\n\n\n<p><strong>Is the dataset self-contained, or does it link to or otherwise rely on external resources (e.g., websites, tweets, other datasets)?<\/strong>&nbsp;If it links to or relies on external resources, a) are there guarantees that they will exist, and remain constant, over time; b) are there official archival versions of the complete dataset (i.e., including the external resources as they existed at the time the dataset was created); c) are there any restrictions (e.g., licenses, fees) associated with any of the external resources that might apply to a future user? Please provide descriptions of all external resources and any restrictions associated with them, as well as links or other access points, as appropriate.<\/p>\n\n\n\n<p>The videos correspond to English Wikipedia articles. We provide the mapping between ASL videos and English text. 
The Wikipedia text has been published under a Creative Commons license, and is available for public download (<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/en.wikipedia.org\/wiki\/Wikipedia:Database_download\">https:\/\/en.wikipedia.org\/wiki\/Wikipedia:Database_download<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, July 2020).<\/p>\n\n\n\n<p><strong>Does the dataset contain data that might be considered confidential (e.g., data that is protected by legal privilege or by doctor-patient confidentiality, data that includes the content of individuals\u2019 non-public communications)?<\/strong>&nbsp;If so, please provide a description.<\/p>\n\n\n\n<p>No, the data is not confidential. All participants consented to providing recordings and public release of the dataset.<\/p>\n\n\n\n<p><strong>Does the dataset contain data that, if viewed directly, might be offensive, insulting, threatening, or might otherwise cause anxiety?<\/strong>&nbsp;If so, please describe why.<\/p>\n\n\n\n<p>No, we do not expect contents to be offensive. 
The topic is STEM.<\/p>\n\n\n\n<p><strong>Does the dataset relate to people?<\/strong>&nbsp;If not, you may skip the remaining questions in this section.<\/p>\n\n\n\n<p>Yes, the videos are of sign language interpreters.<\/p>\n\n\n\n<p><strong>Does the dataset identify any subpopulations (e.g., by age, gender)?<\/strong>&nbsp;If so, please describe how these subpopulations are identified and provide a description of their respective distributions within the dataset.<\/p>\n\n\n\n<p>The people in the videos are professional ASL interpreters.<\/p>\n\n\n\n<p><strong>Is it possible to identify individuals (i.e., one or more natural persons), either directly or indirectly (i.e., in combination with other data) from the dataset?<\/strong>&nbsp;If so, please describe how.<\/p>\n\n\n\n<p>Yes, it is possible to identify individuals in the dataset, since the interpreters&#8217; faces and upper bodies are captured in the videos.<\/p>\n\n\n\n<p><strong>Does the dataset contain data that might be considered sensitive in any way (e.g., data that reveals racial or ethnic origins, sexual orientations, religious beliefs, political opinions or union memberships, or locations; financial or health data; biometric or genetic data; forms of government identification, such as social security numbers; criminal history)?<\/strong>&nbsp;If so, please provide a description.<\/p>\n\n\n\n<p>The data may be considered sensitive, in that the interpreters&#8217; faces and upper bodies are captured in the videos.<\/p>\n\n\n\n<p><strong>Any other comments?<\/strong><\/p>\n\n\n\n<p>N\/A<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"collection-process\">Collection Process<\/h2>\n\n\n\n<p><strong>How was the data associated with each instance acquired?<\/strong>&nbsp;Was the data directly observable (e.g., raw text, movie ratings), reported by subjects (e.g., survey responses), or indirectly inferred\/derived from other data (e.g., part-of-speech tags, model-based guesses for age or language)? 
If data was reported by subjects or indirectly inferred\/derived from other data, was the data validated\/verified? If so, please describe how.<\/p>\n\n\n\n<p>Professional interpreters recorded each video, which was automatically linked to the corresponding English text. This process was facilitated by a web platform that the research team created for this collection (described subsequently).<\/p>\n\n\n\n<p><strong>What mechanisms or procedures were used to collect the data (e.g., hardware apparatus or sensor, manual human curation, software program, software API)?<\/strong>&nbsp;How were these mechanisms or procedures validated?<\/p>\n\n\n\n<p>The data was collected through a website created explicitly for this dataset collection. The platform provides a full English text on the right side of the screen. The interpreters have full access to the text, and can progress through the text and record themselves providing a translation on the website. A record button is available, which triggers recording through their computer&#8217;s webcam. The interface also supports playing back recordings and re-recording. The resulting translations are available for content consumers to view, for example to enable spot-translations that make articles more accessible to people whose primary language is ASL. The full platform design is described in [1].<\/p>\n\n\n\n<p><strong>If the dataset is a sample from a larger set, what was the sampling strategy (e.g., deterministic, probabilistic with specific sampling probabilities)?<\/strong><\/p>\n\n\n\n<p>N\/A<\/p>\n\n\n\n<p><strong>Who was involved in the data collection process (e.g., students, crowdworkers, contractors) and how were they compensated (e.g., how much were crowdworkers paid)?<\/strong><\/p>\n\n\n\n<p>Microsoft Research designed and built the collection platform. Professional interpreters provided recorded translations through the web platform. 
The interpreters were paid at standard hourly ASL interpreter rates for virtual interpretation jobs.<\/p>\n\n\n\n<p><strong>Over what timeframe was the data collected? Does this timeframe match the creation timeframe of the data associated with the instances (e.g., recent crawl of old news articles)?<\/strong>&nbsp;If not, please describe the timeframe in which the data associated with the instances was created.<\/p>\n\n\n\n<p>The videos were collected between March and June 2022.<\/p>\n\n\n\n<p><strong>Were any ethical review processes conducted (e.g., by an institutional review board)?<\/strong>&nbsp;If so, please provide a description of these review processes, including the outcomes, as well as a link or other access point to any supporting documentation.<\/p>\n\n\n\n<p>Yes, the project was reviewed by Microsoft&#8217;s Institutional Review Board (IRB). The collection platform and the dataset release also underwent additional Ethics and Compliance reviews by Microsoft.<\/p>\n\n\n\n<p><strong>Does the dataset relate to people?<\/strong>&nbsp;If not, you may skip the remaining questions in this section.<\/p>\n\n\n\n<p>Yes, the dataset consists of videos of human ASL interpreters.<\/p>\n\n\n\n<p><strong>Did you collect the data from the individuals in question directly, or obtain it via third parties or other sources (e.g., websites)?<\/strong><\/p>\n\n\n\n<p>The videos were recorded by the ASL interpreters.<\/p>\n\n\n\n<p><strong>Were the individuals in question notified about the data collection?<\/strong>&nbsp;If so, please describe (or show with screenshots or other information) how notice was provided, and provide a link or other access point to, or otherwise reproduce, the exact language of the notification itself.<\/p>\n\n\n\n<p>Yes, the interpreters controlled the recording process (e.g. start, stop, re-recording). A screenshot of the recording interface is provided in Figure 3 of [1]. 
Contributors also engaged in a consent process prior to contributing any data.<\/p>\n\n\n\n<p><strong>Did the individuals in question consent to the collection and use of their data?<\/strong>&nbsp;If so, please describe (or show with screenshots or other information) how consent was requested and provided, and provide a link or other access point to, or otherwise reproduce, the exact language to which the individuals consented.<\/p>\n\n\n\n<p>Yes, participants engaged in a consent process through the web platform prior to contributing. For the exact consent text, please visit <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/asl-stem-wiki\/\">https:\/\/www.microsoft.com\/en-us\/research\/project\/asl-stem-wiki\/<\/a>.<\/p>\n\n\n\n<p><strong>If consent was obtained, were the consenting individuals provided with a mechanism to revoke their consent in the future or for certain uses?<\/strong>&nbsp;If so, please provide a description, as well as a link or other access point to the mechanism (if appropriate).<\/p>\n\n\n\n<p>Yes, participants could contact the research team directly, and could delete any of their recordings through the web platform.<\/p>\n\n\n\n<p><strong>Has an analysis of the potential impact of the dataset and its use on data subjects (e.g., a data protection impact analysis) been conducted?<\/strong>&nbsp;If so, please provide a description of this analysis, including the outcomes, as well as a link or other access point to any supporting documentation.<\/p>\n\n\n\n<p>Yes, a Data Protection Impact Analysis (DPIA) has been conducted, including taking a detailed inventory of the data types collected and stored and retention policy, and was successfully reviewed by Microsoft.<\/p>\n\n\n\n<p><strong>Any other comments?<\/strong><\/p>\n\n\n\n<p>N\/A<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"preprocessing-cleaning-labelling\">Preprocessing\/cleaning\/labelling<\/h2>\n\n\n\n<p><strong>Was any preprocessing\/cleaning\/labeling of the 
data done (e.g., discretization or bucketing, tokenization, part-of-speech tagging, SIFT feature extraction, removal of instances, processing of missing values)?<\/strong>&nbsp;If so, please provide a description. If not, you may skip the remainder of the questions in this section.<\/p>\n\n\n\n<p>To validate the data, the research team manually reviewed a random sample of videos from each contributor, ran scripts to check for invalid recordings, and manually examined outliers. Removed videos were: 110 with webcam failure, 500 corrupted, 19 with large discrepancies in text and recording length (video < 3s, text \u22655 words), 1 large outlier (>1000s), and 7 shorter than 1s.<\/p>\n\n\n\n<p><strong>Was the \u201craw\u201d data saved in addition to the preprocessed\/cleaned\/labeled data (e.g., to support unanticipated future uses)?<\/strong>&nbsp;If so, please provide a link or other access point to the \u201craw\u201d data.<\/p>\n\n\n\n<p>No, not publicly.<\/p>\n\n\n\n<p><strong>Is the software used to preprocess\/clean\/label the instances available?<\/strong>&nbsp;If so, please provide a link or other access point.<\/p>\n\n\n\n<p>No, but these procedures are easily reproducible using public software.<\/p>\n\n\n\n<p><strong>Any other comments?<\/strong><\/p>\n\n\n\n<p>N\/A<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"uses\">Uses<\/h2>\n\n\n\n<p><strong>Has the dataset been used for any tasks already?<\/strong>&nbsp;If so, please provide a description.<\/p>\n\n\n\n<p>Yes, we provide fingerspelling detection and alignment baselines in the accompanying paper publication.<\/p>\n\n\n\n<p><strong>Is there a repository that links to any or all papers or systems that use the dataset?<\/strong>&nbsp;If so, please provide a link or other access point.<\/p>\n\n\n\n<p>Yes, the link is available on our project page at <a 
href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/asl-stem-wiki\/\">https:\/\/www.microsoft.com\/en-us\/research\/project\/asl-stem-wiki\/<\/a>.<\/p>\n\n\n\n<p><strong>What (other) tasks could the dataset be used for?<\/strong><\/p>\n\n\n\n<p>The dataset can be used for a range of tasks, including:<\/p>\n\n\n\n<p>Fingerspelling detection and recognition &#8211; Our dataset reflects an increased usage of fingerspelling in interpretations of STEM documents. Identifying instances of fingerspelling and mapping these instances onto the corresponding English words can help researchers better understand signing patterns. We provide benchmarks for fingerspelling detection and recognition in our paper accompanying the dataset release. The ability to detect and recognize fingerspelling can also enable richer downstream applications, such as automatic sign suggestion (described subsequently).<\/p>\n\n\n\n<p>Automatic sign suggestion &#8211; Also motivated by the prevalence of fingerspelling in STEM interpretations, we suggest developing systems to detect when fingerspelling is used and suggest appropriate ASL signs to use instead. Suggestions would be dependent on the domain and context (e.g. &#8220;protein&#8221; in the context of nutrition, structural biology, or protein engineering may have distinct ASL signs), as well as on the audience (e.g. the sign to use for an elementary school class may be different from the sign to use with a college audience). Fingerspelling may be appropriate in some cases as well, for example when introducing a new sign that is not well-known.<\/p>\n\n\n\n<p>Translationese\/Interpretese &#8211; Because our dataset is prompted from an English source sentence, it is prone to having effects of translationese [2], such as English-influenced word order, segmentation of ASL into English sentence boundaries, signs for English homonyms being used instead of the appropriate sign, and increased fingerspelling. 
We propose training models to detect and repair translationese, as well as potential translation and interpretation studies around interpretese [3] of ASL.<\/p>\n\n\n\n<p>Sign variation &#8211; Five of our articles are interpreted by all 37 ASL interpreters in our study. These articles provide a unique opportunity to study variations in how individuals sign and interpret the same English sentence, especially for STEM concepts where ASL signs are not stabilized.<\/p>\n\n\n\n<p>Sign linking\/retrieval &#8211; Related to sign variation, our dataset contains examples of English words that may be interpreted differently across interpreters and contexts. This data can be used to train models that link different versions of ASL signs for the same concept (e.g. one interpreter may sign &#8220;electromagnetism&#8221; using the signs for ELECTRICITY and MAGNET, while another interpreter may interpret the same word using a sign that visually describes an electromagnetic field).<\/p>\n\n\n\n<p>Automatic STEM translation &#8211; Our dataset can be used to train, fine-tune, and\/or evaluate model capabilities in translating technical content from English to ASL. Technically, our dataset could be used to develop models to translate from ASL to English; however, this direction is not preferred, since our dataset contains interpreted ASL, which may differ from unprompted ASL [3].<\/p>\n\n\n\n<p><strong>Is there anything about the composition of the dataset or the way it was collected and preprocessed\/cleaned\/labeled that might impact future uses?<\/strong>&nbsp;For example, is there anything that a future user might need to know to avoid uses that could result in unfair treatment of individuals or groups (e.g., stereotyping, quality of service issues) or other undesirable harms (e.g., financial harms, legal risks)? If so, please provide a description. 
Is there anything a future user could do to mitigate these undesirable harms?<\/p>\n\n\n\n<p>Dataset users should be aware that the dataset does not consist of natural ASL content. Because the translations were made from English to ASL, the source language likely influences the target language (e.g. in grammatical structures). The English text was also segmented into sentence units, forcing the translation to occur sentence-by-sentence, which further constrained the flexibility and naturalness of the ASL. Additionally, because the content is STEM-related and sometimes technical, fingerspelling is often used to represent concepts where signs do not exist or may have been unfamiliar to the interpreter.<\/p>\n\n\n\n<p>It is possible that some of these limitations may be reduced in the future by post-processing or correcting the videos, by combining this dataset with other datasets that do consist of natural ASL-first content, or by incorporating linguistic knowledge in modeling. Involving fluent ASL team members in key roles in projects can help mitigate risks, for example by providing feedback on signing quality, community needs and perspectives, and other relevant guidance and information.<\/p>\n\n\n\n<p><strong>Are there tasks for which the dataset should not be used?<\/strong>&nbsp;If so, please provide a description.<\/p>\n\n\n\n<p>As described above, this dataset is not an example of natural ASL-first signing. As a result, we do not recommend using this dataset alone to understand or model natural ASL-first signing. At a minimum, this dataset would need to be used in conjunction with other datasets and\/or domain knowledge about sign language in order to more accurately model naturalistic ASL.<\/p>\n\n\n\n<p>More generally, we recommend using this data with meaningful involvement from Deaf community members in leadership roles with decision-making authority at every step from conception to execution. 
As described in other works, research and development of sign language technologies that involves Deaf community members increases the quality of the work, and can help to ensure technologies are relevant and wanted. Historically, projects developed without meaningful Deaf involvement have not been well received [4] and have damaged relationships between technologists and deaf communities.<\/p>\n\n\n\n<p>We ask that this dataset be used with the aim of making the world more equitable and just for deaf people, and with a commitment to &#8220;do no harm&#8221;. In that spirit, this dataset should not be used to develop technology that purports to replace sign language interpreters, fluent signing educators, and\/or other hard-won accommodations for deaf people.<\/p>\n\n\n\n<p><strong>Any other comments?<\/strong><\/p>\n\n\n\n<p>N\/A<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"distribution\">Distribution<\/h2>\n\n\n\n<p><strong>Will the dataset be distributed to third parties outside of the entity (e.g., company, institution, organization) on behalf of which the dataset was created?<\/strong>&nbsp;If so, please provide a description.<\/p>\n\n\n\n<p>Yes, the dataset is released publicly, to help advance research and development related to sign language.<\/p>\n\n\n\n<p><strong>How will the dataset be distributed (e.g., tarball on website, API, GitHub)?<\/strong>&nbsp;Does the dataset have a digital object identifier (DOI)?<\/p>\n\n\n\n<p>The dataset is publicly available for download through the Microsoft Download Center. 
Links are available through the project page at <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/asl-stem-wiki\/\">https:\/\/www.microsoft.com\/en-us\/research\/project\/asl-stem-wiki\/<\/a>.<\/p>\n\n\n\n<p><strong>When will the dataset be distributed?<\/strong><\/p>\n\n\n\n<p>The dataset was released on 10\/15\/2024.<\/p>\n\n\n\n<p><strong>Will the dataset be distributed under a copyright or other intellectual property (IP) license, and\/or under applicable terms of use (ToU)?<\/strong>&nbsp;If so, please describe this license and\/or ToU, and provide a link or other access point to, or otherwise reproduce, any relevant licensing terms or ToU, as well as any fees associated with these restrictions.<\/p>\n\n\n\n<p>Yes, the dataset will be published under a license that permits use for research purposes. The license is provided at <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/asl-stem-wiki\/dataset-license\/\">https:\/\/www.microsoft.com\/en-us\/research\/project\/asl-stem-wiki\/dataset-license\/<\/a>.<\/p>\n\n\n\n<p><strong>Have any third parties imposed IP-based or other restrictions on the data associated with the instances?<\/strong>&nbsp;If so, please describe these restrictions, and provide a link or other access point to, or otherwise reproduce, any relevant licensing terms, as well as any fees associated with these restrictions.<\/p>\n\n\n\n<p>The Wikipedia article texts are multi-licensed under the Creative Commons Attribution-ShareAlike 3.0 License (CC-BY-SA) and the GNU Free Documentation License (GFDL).<\/p>\n\n\n\n<p><strong>Do any export controls or other regulatory restrictions apply to the dataset or to individual instances?<\/strong>&nbsp;If so, please describe these restrictions, and provide a link or other access point to, or otherwise reproduce, any supporting documentation.<\/p>\n\n\n\n<p>No.<\/p>\n\n\n\n<p><strong>Any other comments?<\/strong><\/p>\n\n\n\n<p>N\/A<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" 
id=\"maintenance\">Maintenance<\/h2>\n\n\n\n<p><strong>Who will be supporting\/hosting\/maintaining the dataset?<\/strong><\/p>\n\n\n\n<p>The dataset will be hosted on Microsoft Download Center.<\/p>\n\n\n\n<p><strong>How can the owner\/curator\/manager of the dataset be contacted (e.g., email address)?<\/strong><\/p>\n\n\n\n<p>Please contact <a href=\"mailto:ASL_Citizen@microsoft.com\">ASL_Citizen@microsoft.com<\/a> with any questions.<\/p>\n\n\n\n<p><strong>Is there an erratum?<\/strong>&nbsp;If so, please provide a link or other access point.<\/p>\n\n\n\n<p>A public-facing website is associated with the dataset (see <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/asl-stem-wiki\/\">https:\/\/www.microsoft.com\/en-us\/research\/project\/asl-stem-wiki\/<\/a>). We will link to any errata on this website if necessary.<\/p>\n\n\n\n<p><strong>Will the dataset be updated (e.g., to correct labeling errors, add new instances, delete instances)?<\/strong>&nbsp;If so, please describe how often, by whom, and how updates will be communicated to users (e.g., mailing list, GitHub)?<\/p>\n\n\n\n<p>If updates are necessary, we will update the dataset. We will release our dataset with a version number, to distinguish it from any future updated versions.<\/p>\n\n\n\n<p><strong>If the dataset relates to people, are there applicable limits on the retention of the data associated with the instances (e.g., were individuals in question told that their data would be retained for a fixed period of time and then deleted)?<\/strong>&nbsp;If so, please describe these limits and explain how they will be enforced.<\/p>\n\n\n\n<p>The dataset will be left up indefinitely, to maximize utility to research. Participants were informed that their contributions might be released in a public dataset.<\/p>\n\n\n\n<p><strong>Will older versions of the dataset continue to be supported\/hosted\/maintained?<\/strong>&nbsp;If so, please describe how. 
If not, please describe how its obsolescence will be communicated to users.<\/p>\n\n\n\n<p>All versions of the dataset will be released with a version number on Microsoft Download Center to enable differentiation.<\/p>\n\n\n\n<p><strong>If others want to extend\/augment\/build on\/contribute to the dataset, is there a mechanism for them to do so?<\/strong>&nbsp;If so, please provide a description. Will these contributions be validated\/verified? If so, please describe how. If not, why not? Is there a process for communicating\/distributing these contributions to other users? If so, please provide a description.<\/p>\n\n\n\n<p>We do not have a mechanism for others to contribute to our dataset directly. However, others could create comparable datasets by recording ASL translations of English texts.<\/p>\n\n\n\n<p><strong>Any other comments?<\/strong><\/p>\n\n\n\n<p>N\/A<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"references\">References<\/h2>\n\n\n\n<p>[1] Abraham Glasser, Fyodor Minakov, and Danielle Bragg. ASL Wiki: An Exploratory Interface for Crowdsourcing ASL Translations. In Proceedings of the 24th International ACM SIGACCESS Conference on Computers and Accessibility, pages 1\u201313, 2022.<\/p>\n\n\n\n<p>[2] Moshe Koppel and Noam Ordan. Translationese and Its Dialects. In Proceedings of the 49th annual meeting of the Association for Computational Linguistics: Human Language Technologies, pages 1318\u20131326, 2011.<\/p>\n\n\n\n<p>[3] Miriam Shlesinger. Towards a Definition of <em>Interpretese:<\/em> An intermodal, corpus-based study. Efforts and Models in Interpreting and Translation Research. A Tribute to Daniel Gile. Amsterdam\/Philadelphia: John Benjamins, pages 237\u2013253, 2009.<\/p>\n\n\n\n<p>[4] Michael Erard. Why Sign-Language Gloves Don\u2019t Help Deaf People. 
2017.<\/p>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n\n\n<h2 class=\"wp-block-heading\" id=\"microsoft-research-project-participation-consent-form\">Microsoft Research Project Participation Consent Form<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"introduction\">INTRODUCTION<\/h3>\n\n\n\n<p>Thank you for deciding to volunteer in a Microsoft Corporation research project.&nbsp; You have no obligation to participate and you may decide to terminate your participation at any time.&nbsp; You also understand that the researcher has the right to withdraw you from participation in the project at any time. Below is a description of the research project, and your consent to participate.&nbsp; Read this information carefully. If you agree to participate, sign in the space provided.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"title-of-research-project\">TITLE OF RESEARCH PROJECT<\/h3>\n\n\n\n<p>ASL Dataset Community<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"principal-investigator\">Principal Investigator<\/h4>\n\n\n\n<p>Danielle Bragg<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"purpose\">PURPOSE<\/h3>\n\n\n\n<p>The purpose of this project is to collect sign language videos from volunteer contributors to advance sign language recognition, while fostering American Sign Language (ASL) community online. The website falls under the category of &#8220;citizen science&#8221;, where people contribute for the purpose of advancing science or research. 
Contributors will be able to do three things: 1) record videos of themselves executing specific signs, 2) validate that other contributors executed signs correctly, and 3) explore the communal dataset.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"procedures\">PROCEDURES<\/h3>\n\n\n\n<p>During this project, the following will happen:&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You will create a user profile, including a username, email address, and optional picture and demographics.&nbsp;<\/li>\n\n\n\n<li>You will then be able to do three main things on the website: 1) record videos of yourself signing, 2) validate that other contributors executed signs correctly, and 3) explore the communal dataset. Microsoft may document and collect information about your participation by storing your profile information, the videos you submit, your ratings of other contributors\u2019 videos, and any other interactions with the site.&nbsp;<\/li>\n<\/ul>\n\n\n\n<p>Approximately 200 participants will be involved in this study.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"personal-information-and-confidentiality\">PERSONAL INFORMATION AND CONFIDENTIALITY&nbsp;<\/h3>\n\n\n\n<p>Microsoft Research is ultimately responsible for determining the purposes and uses of your personal information.&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Personal information we collect.&nbsp;&nbsp; <\/strong>During the project we may collect personal information about you such as image, likeness, email, age, gender, and ASL experience.&nbsp;<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>How we use personal information.<\/strong>&nbsp; The personal information and other data collected during this project will be used primarily to perform research for purposes described in the introduction above.&nbsp;&nbsp; Such information and data, or the results of the research may eventually be used to develop and improve our commercial products, services or 
technologies.&nbsp;&nbsp;<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Right of Publicity<\/strong>.&nbsp; By submitting video(s) of yourself, you confirm that you are the depicted person in the video(s) you submit and you grant Microsoft an unrestricted, perpetual, worldwide, royalty-free, irrevocable license, with rights to assign and sublicense, to use your image and likeness for the research project above and in any related services, on a worldwide basis.<\/li>\n\n\n\n<li><strong>How we store and share your personal information.&nbsp; <\/strong>Your personal data will be stored for a period of up to 5 years from your last login.&nbsp; This project is a collaboration with Boston University, who will have access to collected data. In addition, we may release a dataset that includes videos and other demographics publicly to help advance research.&nbsp;&nbsp;<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>How you can access and control your personal information.<\/strong>&nbsp; If you wish to review or copy any personal information you provided during the study, log in to your account to view, edit or delete data from the live site. If you have any additional questions, please email the research team at: aslgames@microsoft.com.&nbsp; Please note that we will not be able to delete data that has already been shared publicly in a research dataset release. 
We will respond to questions or concerns within 30 days.&nbsp;<\/li>\n<\/ul>\n\n\n\n<p>For additional information on how Microsoft handles your personal information, please see the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/go.microsoft.com\/fwlink\/?LinkId=521839\" target=\"_blank\" rel=\"noopener noreferrer\">Microsoft Privacy Statement<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.&nbsp;&nbsp;&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"research-results-feedback\">RESEARCH RESULTS & FEEDBACK&nbsp;<\/h3>\n\n\n\n<p>Microsoft will own all of the research data and analysis and other results (collectively \u201cResearch Results\u201d) generated from the information you provide and your participation in the research project. You may also provide suggestions, comments or other feedback (\u201cFeedback\u201d) to Microsoft with respect to the research project. Feedback is entirely voluntary, and Microsoft shall be free to use, disclose, reproduce, license, or otherwise distribute, and leverage the Feedback and Research Results.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"microsoft-and-confidentiality\">MICROSOFT AND CONFIDENTIALITY&nbsp;<\/h3>\n\n\n\n<p>The research project and information you learn by participating in the project is confidential to Microsoft.&nbsp; Sharing this confidential information with people other than those we\u2019ve identified above could negatively affect the scientific integrity of the research study and could even make it more difficult for Microsoft to develop new products based on the information obtained in this study. 
It is therefore important that you do not talk about the project outside of the study team (unless you are legally required to do so by a court or other government order).&nbsp; This does not apply if the information is general public knowledge or if you have a legal right to share the information.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"benefits-and-risks\">BENEFITS AND RISKS&nbsp;<\/h3>\n\n\n\n<p><strong>Benefits:\u202f \u202f\u202f<\/strong>The research team expects to collect videos of diverse signers from this project which we hope will improve the accuracy of sign language recognition systems for diverse signers, for example enabling the creation of drive-through. You will receive any public benefit that may come of these Research Results being shared with the greater scientific community.&nbsp;&nbsp;<\/p>\n\n\n\n<p><strong>Risks:<\/strong>&nbsp;&nbsp;&nbsp;&nbsp; During participation, you may experience discomfort at contributing videos of yourself. Because you will be able to view videos recorded by other contributors, it is also possible that you will view inappropriate or offensive content. 
To alleviate this risk, the website allows participants to flag inappropriate content, which will be reviewed by a moderator.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"future-use-of-your-identifiable-information\">FUTURE USE OF YOUR IDENTIFIABLE INFORMATION&nbsp;&nbsp;<\/h3>\n\n\n\n<p>Identifiers might be removed from your identifiable private information, and after such removal, the information could be used for future research studies or distributed to another investigator for future research studies without your (or your legally authorized representative\u2019s) additional informed consent.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"payment-for-participation\">PAYMENT FOR PARTICIPATION&nbsp;<\/h3>\n\n\n\n<p>You will not be paid to take part in this study.&nbsp;&nbsp;&nbsp;<\/p>\n\n\n\n<p>Your data may be used to make new products, tests or findings.&nbsp; These may have value and may be developed and owned by Microsoft and\/or others.&nbsp; If this happens, there are no plans to pay you.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"participation\">PARTICIPATION&nbsp;&nbsp;<\/h3>\n\n\n\n<p>Taking part in research is always a choice. If you decide to be in the study, you can change your mind at any time without affecting any rights including payment to which you would otherwise be entitled. 
If you decide to withdraw, you should contact the person in charge of this study, and also inform that person if you would like your personal information removed as well.&nbsp;&nbsp;<\/p>\n\n\n\n<p>Microsoft or the person in charge of this study may discontinue the study or your individual participation in the study at any time without your consent for reasons including:&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>your failure to follow directions&nbsp;<\/li>\n\n\n\n<li>it is discovered that you do not meet study requirements&nbsp;&nbsp;<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li>it is in your best interest medically&nbsp;&nbsp;<\/li>\n\n\n\n<li>the study is canceled&nbsp;<\/li>\n\n\n\n<li>administrative reasons&nbsp;&nbsp;<\/li>\n<\/ul>\n\n\n\n<p>If you leave the study, the study staff will still be able to use your information that they have already collected, however, you have the right to ask for it to be removed when you leave.&nbsp;&nbsp;<\/p>\n\n\n\n<p>Significant new findings that develop during the course of this study that might impact your willingness to be in this study will be given to you.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"contact-information\">CONTACT INFORMATION&nbsp;&nbsp;<\/h3>\n\n\n\n<p>Should you have any questions concerning this project, please contact the research team at aslgames@microsoft.com.&nbsp;&nbsp;&nbsp;<\/p>\n\n\n\n<p>Should you have any questions about your rights as a research subject, please contact Microsoft Research Ethics Program Feedback at MSRStudyfeedback@microsoft.com.&nbsp;&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"consent\">CONSENT&nbsp;<\/h3>\n\n\n\n<p>By clicking CONTINUE, you confirm that the study was explained to you, you had a chance to ask questions before beginning the study, and all your questions were answered satisfactorily. At any time, you may ask other questions. 
By clicking CONTINUE, you voluntarily consent to participate, and you do not give up any legal rights you have as a study participant.&nbsp;&nbsp;<\/p>\n\n\n\n<p>Please confirm your consent by clicking CONTINUE. If you wish, you may now save a copy of this consent form for future reference. On behalf of Microsoft, we thank you for your contribution and look forward to your research session.<\/p>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n","protected":false},"excerpt":{"rendered":"<p>Dataset and Benchmark for Interpreting STEM Articles<\/p>\n","protected":false},"featured_media":1085325,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13556,13562,13563,13551,13554],"msr-locale":[268875],"msr-impact-theme":[261667],"msr-pillar":[],"class_list":["post-1005396","msr-project","type-msr-project","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-research-area-computer-vision","msr-research-area-data-platform-analytics","msr-research-area-graphics-and-multimedia","msr-research-area-human-computer-interaction","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"","related-publications":[1105446],"related-downloads":[],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[{"type":"user_nicename","display_name":"Alex Lu","user_id":41036,"people_section":"Related people","alias":"lualex"},{"type":"user_nicename","display_name":"Chinmay Singh","user_id":36750,"people_section":"Related 
people","alias":"chsingh"}],"msr_research_lab":[199563,199571],"msr_impact_theme":["Empowerment"],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/1005396","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":60,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/1005396\/revisions"}],"predecessor-version":[{"id":1137917,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/1005396\/revisions\/1137917"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1085325"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1005396"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1005396"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1005396"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1005396"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=1005396"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}