{"id":654684,"date":"2020-04-29T03:36:36","date_gmt":"2020-04-29T10:36:36","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=654684"},"modified":"2021-10-13T21:24:10","modified_gmt":"2021-10-14T04:24:10","slug":"unitrans-unifying-model-transfer-and-data-transfer-for-cross-lingual-named-entity-recognition-with-unlabeled-data","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/unitrans-unifying-model-transfer-and-data-transfer-for-cross-lingual-named-entity-recognition-with-unlabeled-data\/","title":{"rendered":"UniTrans: Unifying Model Transfer and Data Transfer for Cross-Lingual Named Entity Recognition with Unlabeled Data"},"content":{"rendered":"<p>Prior works in cross-lingual named entity recognition (NER) with no\/little labeled data fall into two primary categories: model transfer based and data transfer based methods. In this paper we<br \/>\nfind that both method types can complement each other, in the sense that, the former can exploit context information via language-independent features but sees no task-specific information in the target language; while the latter generally generates pseudo target-language training data via translation but its exploitation of context information is weakened by inaccurate translations. Moreover, prior works rarely leverage unlabeled data in the target language, which can be effortlessly collected and potentially contains valuable information for improved results. To handle both problems, we propose a novel approach termed UniTrans to Unify both model and data Transfer for cross-lingual NER, and furthermore, to leverage the available information from unlabeled target-language data via enhanced knowledge distillation. We evaluate our proposed UniTrans over 4 target languages on benchmark datasets. Our experimental results show that it substantially outperforms the existing state-of-the-art methods.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Prior works in cross-lingual named entity recognition (NER) with no\/little labeled data fall into two primary categories: model transfer based and data transfer based methods. In this paper we find that both method types can complement each other, in the sense that, the former can exploit context information via language-independent features but sees no task-specific [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"29th International Joint Conference on Artificial Intelligence (IJCAI 2020)","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2020-7-1","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"https:\/\/www.ijcai.org\/Proceedings\/2020","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13556,13545],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-654684","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-human-language-technologies","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2020-7-1","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/www.ijcai.org\/Proceedings\/2020\/0543.pdf","label_id":"243109","label":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[],"msr-author-ordering":[{"type":"user_nicename","value":"Qianhui Wu","user_id":40741,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Qianhui Wu"},{"type":"text","value":"Zijia Lin","user_id":0,"rest_url":false},{"type":"user_nicename","value":"B\u00f6rje F. Karlsson","user_id":31280,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=B\u00f6rje F. Karlsson"},{"type":"text","value":"Biqing Huang","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Jian-Guang Lou","user_id":32337,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Jian-Guang Lou"}],"msr_impact_theme":[],"msr_research_lab":[199560],"msr_event":[],"msr_group":[144919,714577],"msr_project":[792599,717721,714646],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":792599,"post_title":"Table Interpretation","post_name":"table-interpretation","post_type":"msr-project","post_date":"2021-11-05 02:02:36","post_modified":"2024-09-25 11:42:48","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/table-interpretation\/","post_excerpt":"Bringing out the power of semantics in tabular data Tables are commonly used to organize information, playing a key role in data analytics, scientific research, and business communication. The ability to automatically extract semantics in tables can empower many downstream applications such as data analytics, robotic process automation (RPA), knowledge base population, etc. In this project, we explore multiple aspects of semantic table understanding and real-world applications of such technologies. One of the outcomes of&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/792599"}]}},{"ID":717721,"post_title":"ezPitch: Connecting Salespersons and Customers through Relevant News","post_name":"ezpitch-connecting-salespersons-and-customers-through-relevant-news","post_type":"msr-project","post_date":"2021-01-19 08:25:06","post_modified":"2021-01-19 08:28:44","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/ezpitch-connecting-salespersons-and-customers-through-relevant-news\/","post_excerpt":"The goal of ezPitch is connecting salespersons and customers through relevant news. Why is this important? In the daily work, the sales persons need to search, track and explore the related news about customers before talking to them. For example, if there is management change in the customer\u2019s company. The sales person may need to find a way to re-build the relationship with the new leadership. If the customer\u2019s company announces an earnings report which&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/717721"}]}},{"ID":714646,"post_title":"VERT: Versatile Entity Recognition &amp; Disambiguation Toolkit","post_name":"vert-versatile-entity-recognition-disambiguation-toolkit","post_type":"msr-project","post_date":"2020-12-30 02:54:35","post_modified":"2021-10-13 21:15:01","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/vert-versatile-entity-recognition-disambiguation-toolkit\/","post_excerpt":"While knowledge about entities is a key building block in the mentioned systems, creating effective\/efficient models for real-world scenarios remains a challenge (tech\/data\/real workloads). Based on such needs, we've created VERT - a Versatile Entity Recognition &amp; Disambiguation Toolkit. VERT is a pragmatic toolkit that combines rules and ML, offering both powerful pretrained models for core entity types (recognition and linking) and the easy creation of custom models. Custom models use our deep learning-based NER\/EL&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/714646"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/654684","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":2,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/654684\/revisions"}],"predecessor-version":[{"id":784873,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/654684\/revisions\/784873"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=654684"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=654684"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=654684"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=654684"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=654684"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=654684"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=654684"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=654684"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=654684"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=654684"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=654684"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=654684"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=654684"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}