{"id":155938,"date":"1992-01-01T00:00:00","date_gmt":"1992-01-01T00:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/msr-research-item\/structural-patterns-vs-string-patterns-for-extracting-semantic-information-from-dictionaries\/"},"modified":"2018-10-16T20:10:54","modified_gmt":"2018-10-17T03:10:54","slug":"structural-patterns-vs-string-patterns-for-extracting-semantic-information-from-dictionaries","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/structural-patterns-vs-string-patterns-for-extracting-semantic-information-from-dictionaries\/","title":{"rendered":"Structural Patterns vs. String Patterns for Extracting Semantic Information from Dictionaries"},"content":{"rendered":"<div class=\"asset-content\">\n<p>As the research on extracting semantic information from on-line dictionaries proceeds, most progress has been made in the area of extracting the <em>genus<\/em> terms. Two methods are being used &#8212; patterns matching at the string level and at the structural analysis level &#8212; both of which seem to yield equally promising results. Little theoretical work, however, is being done to determine the set of possible <em>differentiae<\/em> to be identified, and therefore also the set of possible semantic relations that can be extracted from them. In fact, Wilks remarks that as far as identifying the differentiae and organizing that information into a list of properties is concerned, &#8220;such demands are beyond the abilities of the best current extraction techniques&#8221; (Wilks et al., 1989, p. 227). 
However, the current state of the art in computational linguistics demands that semantic information beyond genus terms be available now, on a large scale, to push forward the current theories, whether that is knowledge-based parsing or parsing first with a syntactic component, followed by a semantic component.<\/p>\n<p>In this paper, we will focus on analyzing the definitions not for the genus terms, but for the semantic relations that can be extracted from the differentiae (Calzolari 1984). Although many have accepted the use of syntactic analyses for this purpose for some time now (for example Jensen and Binot 1987, Klavans 1990, Ravin 1990 and Vanderwende 1990, all of which use the PLNLP English Parser to provide the structural information), many others still do not. We will demonstrate with examples why <em>only<\/em> patterns based on syntactic information (henceforth, structural patterns) provide reliable semantic relations for the differentiae. Patterns that match definition text at the string level (henceforth, string patterns) are conceivable, but cannot capture the variations in the differentiae as easily as structural patterns. In addition, although it is possible to parse the definition texts using a grammar designed for one dictionary (e.g. a grammar of &#8220;Longmanese&#8221;, see Alshawi 1989), we have found that a general, broad-coverage grammar of English or of Italian provides a level of analysis that is as good as, and possibly superior to, a dictionary-specific grammar.<\/p>\n<\/div>\n<p><!-- .asset-content --><\/p>\n","protected":false},"excerpt":{"rendered":"<p>As the research on extracting semantic information from on-line dictionaries proceeds, most progress has been made in the area of extracting the genus terms. Two methods are being used &#8212; patterns matching at the string level and at the structural analysis level &#8212; both of which seem to yield equally promising results. 
Little theoretical work, [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"Association for Computational Linguistics","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"Proceedings of the Fourteenth International Conference on Computational Linguistics","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"Proceedings of the Fourteenth International Conference on Computational Linguistics","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"Simonetta Montemagni","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2007-04-06","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"http:\/\/aclweb.org\/anthology\/C\/C92\/C92-2083.pdf","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":1992,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13545],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-s
tudy":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-155938","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-human-language-technologies","msr-locale-en_us"],"msr_publishername":"Association for Computational Linguistics","msr_edition":"Proceedings of the Fourteenth International Conference on Computational Linguistics","msr_affiliation":"","msr_published_date":"2007-04-06","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"299348","msr_publicationurl":"http:\/\/aclweb.org\/anthology\/C\/C92\/C92-2083.pdf","msr_doi":"","msr_publication_uploader":[{"type":"file","title":"montemagni-vanderwende-1992","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/1992\/01\/montemagni-vanderwende-1992.pdf","id":299348,"label_id":0},{"type":"url","title":"http:\/\/aclweb.org\/anthology\/C\/C92\/C92-2083.pdf","viewUrl":false,"id":false,"label_id":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":0,"url":"http:\/\/aclweb.org\/anthology\/C\/C92\/C92-2083.pdf"},{"id":299348,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/1992\/01\/montemagni-vanderwende-1992.pdf"}],"msr-author-ordering":[{"type":"text","value":"Simonet
ta Montemagni","user_id":0,"rest_url":false},{"type":"user_nicename","value":"lucyv","user_id":32746,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=lucyv"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[169675],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":169675,"post_title":"MindNet","post_name":"mindnet","post_type":"msr-project","post_date":"2001-12-19 17:44:32","post_modified":"2019-08-14 14:34:33","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/mindnet\/","post_excerpt":"Overview MindNet is a knowledge representation project that uses our broad-coverage parser to build semantic networks from dictionaries, encyclopedias, and free text. MindNets are produced by a fully automatic process that takes the input text, sentence-breaks it, parses each sentence to build a semantic dependency graph (Logical Form), aggregates these individual graphs into a single large graph, and then assigns probabilistic weights to subgraphs based on their frequency in the corpus as a whole. 
The&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/169675"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/155938","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":2,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/155938\/revisions"}],"predecessor-version":[{"id":523912,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/155938\/revisions\/523912"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=155938"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=155938"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=155938"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=155938"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=155938"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=155938"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=155938"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=155938"},{"taxonomy":"msr-field-of-study",
"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=155938"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=155938"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=155938"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=155938"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=155938"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}