{"id":661779,"date":"2020-05-21T18:07:35","date_gmt":"2020-05-22T01:07:35","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-blog-post&#038;p=661779"},"modified":"2020-05-21T18:07:35","modified_gmt":"2020-05-22T01:07:35","slug":"rationalizing-semantic-and-keyword-search-on-microsoft-academic-2","status":"publish","type":"msr-blog-post","link":"https:\/\/www.microsoft.com\/en-us\/research\/articles\/rationalizing-semantic-and-keyword-search-on-microsoft-academic-2\/","title":{"rendered":"Rationalizing Semantic and Keyword Search on Microsoft Academic"},"content":{"rendered":"<p>Over the past 6 months we&#8217;ve been experimenting with a host of changes to Microsoft Academic&#8217;s search experience, and now that the last of those experiments has shipped we&#8217;re excited to finally discuss them.<\/p>\n<p>Before we jump in, if you&#8217;re interested in a deeper technical analysis of the new capabilities please review the following resources:<\/p>\n<ul>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdata.2019.00045\/full\">A Review of Microsoft Academic Services for Science of Science Studies<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><br \/>\nA comprehensive scientific review of the different technologies and algorithms used to create Microsoft Academic Services, which power Microsoft Academic<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/docs.microsoft.com\/en-us\/academic-services\/knowledge-exploration-service\/\">Microsoft Academic Knowledge Exploration Service (MAKES) technical documentation<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><br \/>\nMAKES is a version of both the search API and search index behind Microsoft Academic, 
freely available to use in Azure<\/li>\n<\/ul>\n<h2>No room for interpretation?<\/h2>\n<p>From the initial release of Microsoft Academic in 2016, up until 6 months ago, our semantic search algorithm focused on generating results that best matched semantically coherent interpretations of user queries, informed by the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/docs.microsoft.com\/en-us\/academic-services\/graph\/\">Microsoft Academic Graph (MAG)<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n<p>To illustrate, let&#8217;s examine the query \u201ccovid-19 science\u201d. Traditional search engines based on keyword search (e.g. Google Scholar, Semantic Scholar, Lens.org) do an excellent job of retrieving relevant results that have keyword matches for &#8220;covid-19&#8221; and variations of &#8220;science&#8221; (science, sciences, scientific, etc.). Our system, however, prefers to interpret \u201ccovid-19\u201d as a shorthand reference (synonym) for the topic <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/academic.microsoft.com\/topic\/3008058167\">&#8220;Coronavirus disease 2019 (COVID-19)&#8221;<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and \u201cscience\u201d as the journal <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/academic.microsoft.com\/journal\/3880285\">&#8220;Science&#8221;<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> because MAG suggests this interpretation will turn up more highly cited and relevant papers than treating the query as simple paper full-text (title\/abstract\/body) keywords. 
This distinction is important, as it allows our semantic search algorithm to leverage semantic inference to retrieve seminal publications that do not strictly contain &#8220;covid-19&#8221; as keywords, yet are nevertheless relevant and important.<\/p>\n<p>Regardless, we previously allowed for rudimentary keyword matching, namely prefix and literal unigram matching of publication titles (with no support for stemming or spelling corrections). Unfortunately, the outcome of this limited keyword matching was frequent encounters with the dreaded &#8220;no results&#8221; page.<\/p>\n<p>For example, assume you were looking for a paper that you <em>thought<\/em> was named &#8220;heterogeneous network embeddings via deep architectures&#8221;. Entering this phrase as a query would result in no suggestions and an error page if executed on the site:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-651771 size-full\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/NoResultsFound.png\" alt=\"No search results\" width=\"1064\" height=\"45\" data-nosnippet=\"\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/NoResultsFound.png 1064w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/NoResultsFound-300x13.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/NoResultsFound-1024x43.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/NoResultsFound-768x32.png 768w\" sizes=\"auto, (max-width: 1064px) 100vw, 1064px\" \/><\/p>\n<p>This is a classic case of users knowing what they want but having difficulty getting an algorithm to understand. 
A common problem with keyword search is that it puts the burden of choosing the \u201cright\u201d keywords for a query squarely on the shoulders of the user.<\/p>\n<p>Now, with our newest search implementation, this same query will work exactly as intended:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-652188 size-full\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/MisspelledWord.png\" alt=\"Paper search result with dropped term\" width=\"896\" height=\"261\" data-nosnippet=\"\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/MisspelledWord.png 896w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/MisspelledWord-300x87.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/MisspelledWord-768x224.png 768w\" sizes=\"auto, (max-width: 896px) 100vw, 896px\" \/><\/p>\n<p>To understand why this now works, we first need to explain how our semantic search implementation works.<\/p>\n<h2>Ok, maybe a <em>little<\/em> room for interpretation<\/h2>\n<p>To put it simply, we&#8217;ve changed our semantic search implementation from a strict form where <span style=\"text-decoration: underline\">all terms must be understood<\/span> to a looser form where <span style=\"text-decoration: underline\">as many terms as possible are understood<\/span>.<\/p>\n<p>The formulation of semantic interpretations (as explained above) remains unchanged, in that the knowledge in MAG still plays the central role in guiding how a query should be interpreted. What <em>has<\/em> changed is that when a portion of a query is thought to refer to full-text properties (i.e. 
title, abstract), the algorithm can now dynamically switch to a new scoring function that is more appropriate than literal unigram matching and hence less brittle, as the example above shows.<\/p>\n<p>Going a bit deeper, let&#8217;s define what &#8220;as many terms as possible are understood&#8221; means. By its nature, loose semantic query interpretation will produce interpretations with the highest coverage first and fastest, and as interpretations with less coverage (i.e. terms are dropped from consideration) are generated, the relevance and speed decrease. The reasons for this are technical and have to do with the search space growing exponentially as the query considered becomes less specific. So in practice, &#8220;as many as possible&#8221; is better defined as &#8220;as many as possible in a fixed amount of time&#8221;.<\/p>\n<p>This means that, factoring in variables such as query complexity and service load, the results generated within a fixed timeout where terms are more loosely matched (aka the result \u201ctail\u201d) could vary between sessions. 
However, because the interpretations with the highest coverage are generated first, the results they cover (aka the &#8220;head&#8221;) are very stable.<\/p>\n<p>While this change is a great remedy for queries with full-text matching intent, the loosened interpretation also affects semantic search results: they are no longer as concise as before, because the result &#8220;tail&#8221; is longer and now includes full-text matches.<\/p>\n<p>As always, an example speaks a thousand words:<\/p>\n<table class=\" msr-table-default\" style=\"width: 100%;border-collapse: separate;border-spacing: inherit\" border=\"0\" cellspacing=\"inherit\" cellpadding=\"inherit\">\n<tbody>\n<tr style=\"height: 138px\">\n<td style=\"width: 50%;padding: inherit;border: 0px solid;text-align: center;height: 138px\" colspan=\"2\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-651867 size-full\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QuerySuggestion-04.png\" alt=\"Query formulation\" width=\"944\" height=\"106\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QuerySuggestion-04.png 944w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QuerySuggestion-04-300x34.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QuerySuggestion-04-768x86.png 768w\" sizes=\"auto, (max-width: 944px) 100vw, 944px\" \/><\/td>\n<\/tr>\n<tr style=\"height: 654px\">\n<td style=\"width: 50%;padding: inherit;border: 0px solid;text-align: center;height: 654px\">\n<h3><\/h3>\n<h3>BEFORE<\/h3>\n<p>Show results matching top interpretations where <span style=\"text-decoration: underline\">all<\/span> query terms are understood, ranked <span style=\"text-decoration: underline\">only<\/span> by paper salience (static rank, aka importance)<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-651801\" 
src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/AllResults-490.png\" alt=\"\" width=\"490\" height=\"166\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/AllResults-490.png 490w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/AllResults-490-300x102.png 300w\" sizes=\"auto, (max-width: 490px) 100vw, 490px\" \/><\/td>\n<td style=\"width: 50%;padding: inherit;border: 0px solid;text-align: center;height: 654px\">\n<h3><\/h3>\n<h3>AFTER<\/h3>\n<p>Show results matching top interpretations where <u>as many query terms as possible<\/u> are understood, ranked <span style=\"text-decoration: underline\">first<\/span> by number of terms matched <span style=\"text-decoration: underline\">then<\/span> by paper salience<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-651804\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/AnyResults-490.png\" alt=\"\" width=\"490\" height=\"462\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/AnyResults-490.png 490w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/AnyResults-490-300x283.png 300w\" sizes=\"auto, (max-width: 490px) 100vw, 490px\" \/><\/p>\n<p>&#8230;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-652080\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/AnyResults-tail-490.png\" alt=\"\" width=\"490\" height=\"145\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/AnyResults-tail-490.png 490w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/AnyResults-tail-490-300x89.png 300w\" sizes=\"auto, (max-width: 490px) 100vw, 490px\" \/><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Let&#8217;s take a closer look at the new &#8220;loose&#8221; semantic search 
algorithm, as it comes with a new user interface that illustrates how each search result is understood in the context of the user query:<\/p>\n<table class=\"aligncenter\" style=\"width: 905px;border-collapse: separate;border-spacing: inherit;border-style: solid\" border=\"1\" cellspacing=\"inherit\" cellpadding=\"inherit\">\n<tbody>\n<tr>\n<td style=\"width: 905px;padding: inherit;border: 1px solid\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-651897 alignnone aligncenter\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QueryResult-01.png\" alt=\"\" width=\"894\" height=\"270\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QueryResult-01.png 894w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QueryResult-01-300x91.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QueryResult-01-768x232.png 768w\" sizes=\"auto, (max-width: 894px) 100vw, 894px\" \/><\/p>\n<blockquote>\n<p style=\"text-align: center\"><em>As mentioned earlier, results are first ranked based on the number of query terms matched. In this case, the first result matched all query terms and takes the top spot <span style=\"text-decoration: underline\">even though it has a lower static rank (and citation count) than the following two results<\/span>. Another important item to call out is that when query terms are matched using synonyms, the synonymous terms are shown in parentheses next to the canonical form, e.g. 
the user typed &#8220;z shen&#8221; but it was matched to &#8220;zhihong shen&#8221;.<\/em><\/p>\n<p>&nbsp;<\/p><\/blockquote>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-651900 alignnone aligncenter\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QueryResult-02.png\" alt=\"\" width=\"899\" height=\"263\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QueryResult-02.png 899w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QueryResult-02-300x88.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QueryResult-02-768x225.png 768w\" sizes=\"auto, (max-width: 899px) 100vw, 899px\" \/><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-651903 alignnone aligncenter\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QueryResult-03.png\" alt=\"\" width=\"898\" height=\"375\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QueryResult-03.png 898w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QueryResult-03-300x125.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QueryResult-03-768x321.png 768w\" sizes=\"auto, (max-width: 898px) 100vw, 898px\" \/><\/p>\n<blockquote>\n<p style=\"text-align: center\"><em>Here we can see the new semantic search results are based on &#8220;loose&#8221; interpretations. In both cases, the query terms &#8220;acl 2018&#8221; were not understood in the context of the result, and were shown as crossed out while the other terms maintain the same semantic understanding as the first result. 
Additionally, both results have a <span style=\"text-decoration: underline\">higher static rank<\/span> than the first result but are <span style=\"text-decoration: underline\">ranked lower<\/span> because they match fewer of the query terms.<\/em><\/p>\n<p>&nbsp;<\/p><\/blockquote>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone aligncenter size-full wp-image-652083\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QueryResult-04.png\" alt=\"\" width=\"896\" height=\"266\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QueryResult-04.png 896w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QueryResult-04-300x89.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QueryResult-04-768x228.png 768w\" sizes=\"auto, (max-width: 896px) 100vw, 896px\" \/><\/p>\n<blockquote>\n<p style=\"text-align: center\"><em style=\"font-family: inherit;font-size: inherit\">As we look farther into the tail of results, we can see how much of the query can be dropped (in this case 4 of the 8 query terms).<\/em><\/p>\n<\/blockquote>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<h2>Matching phrases<\/h2>\n<p>Historically, Microsoft Academic has supported matching queries to values in a few different ways:<\/p>\n<ul>\n<li>Matching exact values, e.g.<br \/>\n&#8220;a web scale system for scientific knowledge exploration&#8221; => &#8220;a web scale system for scientific knowledge exploration&#8221;<\/li>\n<li>Matching the beginning of values (aka prefix completions, only available as query suggestions), e.g.<br \/>\n&#8220;a web scale system for scientific&#8221; => &#8220;a web scale system for scientific <strong>knowledge exploration<\/strong>&#8221;<\/li>\n<li>Literally matching words from the value, e.g.<br \/>\n&#8220;microsoft academic overview&#8221; => &#8220;<strong>an<\/strong> overview<strong> of<\/strong> microsoft 
academic<strong> service mas and applications<\/strong>&#8220;<\/li>\n<\/ul>\n<p>In addition we now support a new form of partial value matching based on <span style=\"text-decoration: underline\">phrases<\/span>. This is a common feature frequently seen in keyword search, where query interpretation prefers interpretations with closer term proximity. For example, comparing results for the query &#8220;deep learning brain images&#8221; based on simple word matching and phrase matching:<\/p>\n<p>Top 5 papers using word matching, where results are based on matching words and ranking based on paper static rank:<\/p>\n<ul>\n<li>Classification of CT <span style=\"color: #ff0000\">brain images<\/span> based on <span style=\"color: #ff0000\">deep learning<\/span> networks<br \/>\n(Static rank = -18.994, Distance = 4)<\/li>\n<li>Unsupervised <span style=\"color: #ff0000\">Deep<\/span> Feature <span style=\"color: #ff0000\">Learning<\/span> for Deformable Registration of MR <span style=\"color: #ff0000\">Brain Images<br \/>\n<\/span>(Static rank = -19.036, Distance = 8)<\/li>\n<li>Application of <span style=\"color: #ff0000\">deep<\/span> transfer <span style=\"color: #ff0000\">learning<\/span> for automated <span style=\"color: #ff0000\">brain<\/span> abnormality classification using MR <span style=\"color: #ff0000\">images<br \/>\n<\/span>(Static rank = -19.305, Distance = 10)<\/li>\n<li>Age estimation from <span style=\"color: #ff0000\">brain<\/span> MRI <span style=\"color: #ff0000\">images<\/span> using <span style=\"color: #ff0000\">deep learning<br \/>\n<\/span>(Static rank = -19.727, Distance = 6)<\/li>\n<li>Exploring <span style=\"color: #ff0000\">deep<\/span> features from <span style=\"color: #ff0000\">brain<\/span> tumor magnetic resonance <span style=\"color: #ff0000\">images<\/span> via transfer <span style=\"color: #ff0000\">learning<br \/>\n<\/span>(Static rank = -20.06, Distance = 13)<\/li>\n<\/ul>\n<p>Top 5 papers using phrase matching, where results are 
based on first matching words and then re-ranking based on <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/academic.microsoft.com\/search?q=edit%20distance&qe=%40%40%40Composite(F.FN%3D%3D%27edit%20distance%27)&f=&orderBy=4&skip=0&take=10\">edit distance<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> between query and value (ignoring stop words):<\/p>\n<ul>\n<li><span style=\"color: #ff0000\">Deep Learning<\/span> on <span style=\"color: #ff0000\">Brain Images<\/span> in Autism: What Do Large Samples Reveal of Its Complexity?<br \/>\n(Static rank = -20.372, Distance = 0)<\/li>\n<li><span style=\"color: #ff0000\">Deep learning<\/span> of <span style=\"color: #ff0000\">brain images<\/span> and its application to multiple sclerosis<br \/>\n(Static rank = -20.534, Distance = 0)<\/li>\n<li>Classification of CT <span style=\"color: #ff0000\">brain images<\/span> based on <span style=\"color: #ff0000\">deep learning<\/span> networks<br \/>\n(Static rank = -18.994, Distance = 4)<\/li>\n<li>Unsupervised <span style=\"color: #ff0000\">Deep<\/span> Feature <span style=\"color: #ff0000\">Learning<\/span> for Deformable Registration of MR <span style=\"color: #ff0000\">Brain Images<br \/>\n<\/span>(Static rank = -19.036, Distance = 8)<\/li>\n<li>A <span style=\"color: #ff0000\">deep learning<\/span>-based segmentation method for <span style=\"color: #ff0000\">brain<\/span> tumor in MR <span style=\"color: #ff0000\">images<br \/>\n<\/span>(Static rank = -20.171, Distance = 6)<\/li>\n<\/ul>\n<p>This new ability to re-rank based on query-value edit distance also allows us to support quoted phrases in queries:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-654192\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QuotesResults01.png\" alt=\"\" width=\"1464\" height=\"721\" 
srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QuotesResults01.png 1464w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QuotesResults01-300x148.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QuotesResults01-1024x504.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/QuotesResults01-768x378.png 768w\" sizes=\"auto, (max-width: 1464px) 100vw, 1464px\" \/><\/p>\n<p>The rules for quoted values are:<\/p>\n<ul>\n<li>A quoted value can only be matched to a single field, i.e. title, author name, journal name, etc.:<br \/>\nWorks: &#8220;deep learning&#8221; (matches field of study)<br \/>\nWorks: &#8220;microsoft research&#8221; (matches affiliation)<br \/>\nDoesn&#8217;t work: &#8220;deep learning microsoft research&#8221;<\/li>\n<li>For attributes that support partial matching (title, abstract), all quoted words must have a term-based <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/academic.microsoft.com\/search?q=edit%20distance&qe=%40%40%40Composite(F.FN%3D%3D%27edit%20distance%27)&f=&orderBy=4&skip=0&take=10\">edit distance<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> of zero, ignoring <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/academic.microsoft.com\/search?q=stop%20words&qe=%40%40%40Composite(F.FN%3D%3D%27stop%20words%27)&f=&orderBy=4&skip=0&take=10\">stop words<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>:<br \/>\nWorks: &#8220;deep learning brain images&#8221;<br \/>\nDoesn&#8217;t work: &#8220;brain deep images learning&#8221;<\/li>\n<li>Queries can contain multiple quoted values, each being evaluated using the rules defined above:<br \/>\nWorks: &#8220;deep learning&#8221; 
&#8220;microsoft research&#8221;<\/li>\n<li>A quoted value is treated as a single query term and can be dropped accordingly based on the new search algorithm:<br \/>\nDoesn&#8217;t work: &#8220;deep learning at microsoft research rocks!&#8221;<br \/>\nWorks: deep learning &#8220;at microsoft research rocks!&#8221;<\/li>\n<li>All terms in a quoted value are <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/docs.microsoft.com\/en-us\/academic-services\/knowledge-exploration-service\/concepts-queries#normalization\">normalized in exactly the same fashion<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> as non-quoted terms<\/li>\n<\/ul>\n<h2>Support for searching paper abstracts<\/h2>\n<p>We have finally added support for a long-requested feature: searching paper abstracts! This is an important addition that significantly expands the reach of our partial-term matching for papers.<\/p>\n<p>Abstracts are treated like all other semantic values, meaning they can be matched implicitly or explicitly using the &#8220;abstract:&#8221; scope, e.g.:<\/p>\n<ul>\n<li>title: &#8220;microsoft academic&#8221; abstract: &#8220;heterogeneous entity graph&#8221;<\/li>\n<li>&#8220;microsoft academic&#8221; &#8220;heterogeneous entity graph&#8221;<\/li>\n<\/ul>\n<h2>Scoped queries<\/h2>\n<p>Microsoft Academic has always supported query &#8220;hints&#8221; that require subsequent terms to match a specific attribute, e.g. the classic &#8220;papers about &lt;field of study&gt;&#8221;, but with our most recent release we now also support colon-delimited scopes.<\/p>\n<p>The rules for scopes are simple: the query term immediately after the scope must be matched with that scope&#8217;s attribute type. A query &#8220;term&#8221; is defined as a single word or a quoted phrase. 
For example, if you wanted to match papers with &#8220;heterogeneous&#8221;, &#8220;entity&#8221; and &#8220;graph&#8221; in their abstracts but didn&#8217;t care about them being part of a sequence you would issue the query &#8220;abstract: heterogeneous abstract: entity abstract: graph&#8221;.<\/p>\n<p>Supported scopes and their corresponding triggers:<\/p>\n<table style=\"border-style: none;border-collapse: separate\" border=\"inherit\" cellspacing=\"5px\">\n<tbody>\n<tr style=\"color: #fff;background-color: #5d6680\">\n<td style=\"text-align: center;padding: inherit;width: 131px\"><strong>Scope<\/strong><\/td>\n<td style=\"padding: inherit;width: 363px\"><strong>Description<\/strong><\/td>\n<td style=\"padding: inherit;width: 509px\"><strong>Example<\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;padding: inherit;width: 131px\">abstract:<\/td>\n<td style=\"padding: inherit;width: 363px\">Match term or quoted value from the paper abstract<\/td>\n<td style=\"padding: inherit;width: 509px\">abstract: &#8220;heterogeneous entity graph comprised of six types of entities&#8221;<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;padding: inherit;width: 131px\">affiliation:<\/td>\n<td style=\"padding: inherit;width: 363px\">Match affiliation (institution) name<\/td>\n<td style=\"padding: inherit;width: 509px\">affiliation: &#8220;microsoft research&#8221;<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;padding: inherit;width: 131px\">author:<\/td>\n<td style=\"padding: inherit;width: 363px\">Match author name<\/td>\n<td style=\"padding: inherit;width: 509px\">author: &#8220;darrin eide&#8221;<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;padding: inherit;width: 131px\">conference:<\/td>\n<td style=\"padding: inherit;width: 363px\">Match conference series name<\/td>\n<td style=\"padding: inherit;width: 509px\">conference: www<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;padding: inherit;width: 131px\">doi:<\/td>\n<td style=\"padding: 
inherit;width: 363px\">Match paper Digital Object Identifier (DOI)<\/td>\n<td style=\"padding: inherit;width: 509px\">doi: 10.1037\/0033-2909.105.1.156<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;padding: inherit;width: 131px\">journal:<\/td>\n<td style=\"padding: inherit;width: 363px\">Match journal name<\/td>\n<td style=\"padding: inherit;width: 509px\">journal: nature<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;padding: inherit;width: 131px\">title:<\/td>\n<td style=\"padding: inherit;width: 363px\">Match term or quoted value from the paper title<\/td>\n<td style=\"padding: inherit;width: 509px\">title: &#8220;an overview of microsoft academic service mas and applications&#8221;<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;padding: inherit;width: 131px\">topic:<\/td>\n<td style=\"padding: inherit;width: 363px\">Match paper topic (field of study)<\/td>\n<td style=\"padding: inherit;width: 509px\">topic: &#8220;knowledge base&#8221;<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center;padding: inherit;width: 131px\">year:<\/td>\n<td style=\"padding: inherit;width: 363px\">Match paper publication year<\/td>\n<td style=\"padding: inherit;width: 509px\">year: 2015<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<h2>Feedback welcome<\/h2>\n<p>These changes have been in the works for over 6 months, and as always we&#8217;d love to hear your feedback, be it suggestions, critiques, bug reports or kudos. 
To provide feedback, navigate to <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/academic.microsoft.com\/\">Microsoft Academic<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and click the &#8220;feedback&#8221; icon in the lower right-hand corner.<\/p>\n<p>Stay tuned in the coming weeks for another search-oriented post about how you can accomplish <span style=\"text-decoration: underline\">reference string parsing<\/span> using Microsoft Academic Services!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Discussion of new changes to Microsoft Academic search, including expanded keyword search, phrase support, abstract search and more<\/p>\n","protected":false},"author":36554,"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-content-parent":170262,"msr_hide_image_in_river":0,"footnotes":""},"research-area":[],"msr-locale":[268875],"msr-post-option":[],"class_list":["post-661779","msr-blog-post","type-msr-blog-post","status-publish","hentry","msr-locale-en_us"],"msr_assoc_parent":{"id":170262,"type":"project"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/661779","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-blog-post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/36554"}],"version-history":[{"count":3,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-blog-post\/661779\/revisions"}],"predecessor-version":[{"id":661800,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\
/v2\/msr-blog-post\/661779\/revisions\/661800"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=661779"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=661779"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=661779"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=661779"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}