{"id":487019,"date":"2018-05-20T01:27:59","date_gmt":"2018-05-20T08:27:59","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=487019"},"modified":"2018-10-16T22:20:33","modified_gmt":"2018-10-17T05:20:33","slug":"multi-modality-multi-task-recurrent-neural-network-online-action-detection","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/multi-modality-multi-task-recurrent-neural-network-online-action-detection\/","title":{"rendered":"Multi-Modality Multi-Task Recurrent Neural Network for Online Action Detection"},"content":{"rendered":"<p>Online action detection is a new challenge that plays a critical role in visual surveillance analytics. It goes one step beyond the conventional action recognition task, which recognizes human actions from well-segmented clips. Online action detection aims to identify the action type and localize action positions on the fly from untrimmed streaming data. In this paper, we propose a Multi-Modality Multi-Task Recurrent Neural Network (MM-MT RNN), which incorporates both RGB and Skeleton networks. We design different temporal modeling networks to capture the specific characteristics of the various modalities. A deep Long Short-Term Memory (LSTM) subnetwork is then used to capture the complex long-range temporal dynamics, naturally avoiding the conventional sliding-window design and thus ensuring high computational efficiency. Constrained by a multi-task objective function in the training phase, the network achieves superior detection performance and is capable of automatically localizing the start and end points of actions more accurately. Furthermore, an embedded regression subtask provides the ability to forecast an action prior to its occurrence. 
We evaluate the proposed method and several other methods in action detection and forecasting on the Online Action Detection (OAD) and Gaming Action (G3D) datasets. Experimental results demonstrate that our model achieves state-of-the-art performance on both tasks.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Online action detection is a new challenge that plays a critical role in visual surveillance analytics. It goes one step beyond the conventional action recognition task, which recognizes human actions from well-segmented clips. Online action detection aims to identify the action type and localize action positions on the fly from untrimmed [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"IEEE \u2013 Institute of Electrical and Electronics Engineers","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"IEEE Trans. on Cir. and Sys. for Video Technology (online version)","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"IEEE Transactions on Circuits and Systems for Video Technology","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"\u00a9 IEEE. Personal use of this material is permitted. 
Permission from IEEE must be obtained for all other users, including reprinting\/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.","msr_conference_name":"","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2018-01-30","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"https:\/\/ieeexplore.ieee.org\/document\/8274921\/","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13562],"msr-publication-type":[193715],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-487019","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-computer-vision","msr-locale-en_us"],"msr_publishername":"IEEE \u2013 Institute of Electrical and Electronics Engineers","msr_edition":"IEEE Trans. on Cir. and Sys. 
for Video Technology (online version)","msr_affiliation":"","msr_published_date":"2018-01-30","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"IEEE Transactions on Circuits and Systems for Video Technology","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"https:\/\/ieeexplore.ieee.org\/document\/8274921\/","msr_doi":"","msr_publication_uploader":[{"type":"url","title":"https:\/\/ieeexplore.ieee.org\/document\/8274921\/","viewUrl":false,"id":false,"label_id":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":0,"url":"https:\/\/ieeexplore.ieee.org\/document\/8274921\/"}],"msr-author-ordering":[{"type":"text","value":"Jiaying Liu","user_id":0,"rest_url":false},{"type":"text","value":"Yanghao Li","user_id":0,"rest_url":false},{"type":"text","value":"Sijie Song","user_id":0,"rest_url":false},{"type":"text","value":"Junliang Xing","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Cuiling Lan","user_id":31487,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Cuiling Lan"},{"type":"user_nicename","value":"Wenjun Zeng","user_id":34830,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Wenjun 
Zeng"}],"msr_impact_theme":[],"msr_research_lab":[199560],"msr_event":[],"msr_group":[144711],"msr_project":[],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"article","related_content":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/487019","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/487019\/revisions"}],"predecessor-version":[{"id":487022,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/487019\/revisions\/487022"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=487019"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=487019"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=487019"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=487019"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=487019"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=487019"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=487019"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-
json\/wp\/v2\/msr-post-option?post=487019"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=487019"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=487019"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=487019"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=487019"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=487019"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}