{"id":940548,"date":"2023-05-11T17:24:36","date_gmt":"2023-05-12T00:24:36","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/"},"modified":"2025-07-10T10:42:38","modified_gmt":"2025-07-10T17:42:38","slug":"logical-transformers-infusing-logical-structures-into-pre-trained-language-models","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/logical-transformers-infusing-logical-structures-into-pre-trained-language-models\/","title":{"rendered":"Logical Transformers: Infusing Logical Structures into Pre-Trained Language Models"},"content":{"rendered":"<p><span dir=\"ltr\" role=\"presentation\">Natural language has a logical structure and in<\/span><span dir=\"ltr\" role=\"presentation\">formation, the understanding of which is essen<\/span><span dir=\"ltr\" role=\"presentation\">tial for many language-based tasks. Existing <\/span><span dir=\"ltr\" role=\"presentation\">pre-trained language models based on trans<\/span><span dir=\"ltr\" role=\"presentation\">former architectures mostly adopt a classical <\/span><span dir=\"ltr\" role=\"presentation\">design for constructing their input embeddings <\/span><span dir=\"ltr\" role=\"presentation\">that ignores the logical structures underlying <\/span><span dir=\"ltr\" role=\"presentation\">natural language texts, thus limiting their abil<\/span><span dir=\"ltr\" role=\"presentation\">ity to better capture and encode main logical <\/span><span dir=\"ltr\" role=\"presentation\">information in the input sequences. To\u00a0<\/span><span dir=\"ltr\" role=\"presentation\">overcome\u00a0such limitations, we first propose a novel <\/span><span dir=\"ltr\" role=\"presentation\">approach to construct<\/span> <span dir=\"ltr\" role=\"presentation\">logic-aware input embed<\/span><span dir=\"ltr\" role=\"presentation\">dings<\/span> <span dir=\"ltr\" role=\"presentation\">for transformer language models through <\/span><span dir=\"ltr\" role=\"presentation\">a combination of logic detection, logic map<\/span><span dir=\"ltr\" role=\"presentation\">ping and hierarchical logical projections, and <\/span><span dir=\"ltr\" role=\"presentation\">then develop a corresponding new modeling <\/span><span dir=\"ltr\" role=\"presentation\">paradigm that can upgrade all existing trans<\/span><span dir=\"ltr\" role=\"presentation\">former language models into<\/span> <span dir=\"ltr\" role=\"presentation\">logical transform<\/span><span dir=\"ltr\" role=\"presentation\">ers<\/span> <span dir=\"ltr\" role=\"presentation\">to consistently boost their performance on <\/span><span dir=\"ltr\" role=\"presentation\">different NLU and NLG tasks.<\/span> <span dir=\"ltr\" role=\"presentation\">Our empiri<\/span><span dir=\"ltr\" role=\"presentation\">cal experiments on three challenging abstrac<\/span><span dir=\"ltr\" role=\"presentation\">tive text summarization tasks demonstrate that <\/span><span dir=\"ltr\" role=\"presentation\">our proposed logical transformer language ap<\/span><span dir=\"ltr\" role=\"presentation\">proach achieves consistently superior perfor<\/span><span dir=\"ltr\" role=\"presentation\">mance over their baseline transformer models <\/span><span dir=\"ltr\" role=\"presentation\">through a deeper understanding of the logical <\/span><span dir=\"ltr\" role=\"presentation\">structures of the source texts.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Natural language has a logical structure and information, the understanding of which is essential for many 
language-based tasks. Existing pre-trained language models based on transformer architectures mostly adopt a classical design for constructing their input embeddings that ignores the logical structures underlying natural language texts, thus limiting their ability to better capture and encode main [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"Proceedings of ACL 2023","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2023-5-2","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"https:\/\/2023.aclweb.org\/","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":null,"footnotes":""},"msr-research-highlight":[],"research-area":[13556,13551,13545,13554],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-940548","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-graphics-and-multimedia","msr-research-area-human-language-technologies","msr-research-area-human-computer-interaction","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2023-5-2","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/ACL-Paper.pdf","id":"967020","title":"acl-paper","label_id":"243109","label":0},{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/aclanthology.org\/2023.findings-acl.111.pdf","label_id":"243109","label":0}],"msr_related_uploader":[{"type":"file","viewUrl":"https:\/\
/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/ACL-Paper.pdf","id":"967020","title":"acl-paper","label_id":"243118","label":0},{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/aclanthology.org\/2023.findings-acl.111.pdf","label_id":"243118","label":0}],"msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":967020,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/09\/ACL-Paper.pdf"},{"id":940551,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/05\/ACL.pdf"}],"msr-author-ordering":[{"type":"text","value":"Borui Wang","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Qiuyuan Huang","user_id":36356,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Qiuyuan Huang"},{"type":"user_nicename","value":"Budhaditya Deb","user_id":36578,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Budhaditya Deb"},{"type":"user_nicename","value":"Aaron Halfaker","user_id":39733,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Aaron Halfaker"},{"type":"text","value":"Liqun Shao","user_id":0,"rest_url":false},{"type":"guest","value":"daniel-mcduff-4","user_id":860436,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=daniel-mcduff-4"},{"type":"user_nicename","value":"Ahmed Awadallah","user_id":31979,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Ahmed Awadallah"},{"type":"text","value":"Dragomir Radev","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Jianfeng Gao","user_id":32246,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Jianfeng Gao"}],"msr_impact_theme":[],"msr_research_lab":[199565,992148],"msr_event":[945648],"msr_group":[144931],"msr_project":[788159,931254],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":788159,"post_title":"Agent AI","post_name":"agent-ai","post_type":"msr-project","post_date":"2023-09-25 21:53:00","post_modified":"2024-02-28 07:03:22","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/agent-ai\/","post_excerpt":"Agent-based multimodal AI systems are becoming a ubiquitous presence in our everyday lives. A promising direction for making these systems more interactive is to embody them as agents within specific environments. The grounding of large foundation models to act as agents within specific environments can provide a way of incorporating visual and contextual information into an embodied system. 
For example, a system that can perceive user actions, human behavior, environment objects, audio expressions, and the&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/788159"}]}},{"ID":931254,"post_title":"Infinite Mixed Reality with Emergent Abilities","post_name":"mixed-reality","post_type":"msr-project","post_date":"2023-04-26 11:06:46","post_modified":"2024-01-24 23:36:29","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/mixed-reality\/","post_excerpt":"----Gaming\/Mix-Reality\/Robots Knowledge-memory augmented interaction for cross-modality and reality-agnostic integration with Emergence Mechanism. Selected as the project in&nbsp;HackBox 2023 (opens in new tab) Shipped to Office Teams","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/931254"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/940548","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":3,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/940548\/revisions"}],"predecessor-version":[{"id":1144493,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/940548\/revisions\/1144493"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=940548"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=940548"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=940548"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=940548"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=940548"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=940548"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=940548"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=940548"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=940548"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=940548"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=940548"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=940548"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=940548"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}",
"templated":true}]}}