{"id":759025,"date":"2021-07-07T15:56:43","date_gmt":"2021-07-07T22:56:43","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=759025"},"modified":"2021-07-07T16:34:12","modified_gmt":"2021-07-07T23:34:12","slug":"audio-based-toxic-language-classification-using-self-attentive-convolutional-neural-network","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/audio-based-toxic-language-classification-using-self-attentive-convolutional-neural-network\/","title":{"rendered":"Audio-based Toxic Language Classification using Self-attentive Convolutional Neural Network"},"content":{"rendered":"<p>The monumental increase in online social interaction activities such as social networking or online gaming is often riddled with hostile or aggressive behavior that can lead to unsolicited manifestations of cyberbullying or harassment. In this work, we develop an audio-based toxic language classifier using self-attentive Convolutional Neural Networks (CNNs). As definitions of hostility or toxicity can vary depending on the platform or application, we take a more general approach to identifying toxic utterances, one that does not depend on individual lexicon terms but rather considers the entire acoustical context of the short verse or utterance. In the proposed architecture, the self-attention mechanism captures the temporal dependency of the verbal content by summarizing all the relevant information from different regions of the utterance. 
The proposed audio-based self-attentive CNN model is evaluated on a public and an internal dataset and achieves 75% accuracy, 79% precision, and 80% recall in identifying toxic speech recordings.<\/p>\n<p>&nbsp;<\/p>\n<div id=\"attachment_759040\" style=\"width: 586px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-759040\" class=\"wp-image-759040 \" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/07\/roc.png\" alt=\"roc curve toxicity\" width=\"576\" height=\"431\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/07\/roc.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/07\/roc-300x225.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/07\/roc-16x12.png 16w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/07\/roc-80x60.png 80w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/07\/roc-240x180.png 240w\" sizes=\"auto, (max-width: 576px) 100vw, 576px\" \/><p id=\"caption-attachment-759040\" class=\"wp-caption-text\">ROC curve for internal toxicity dataset, corpus A.<\/p><\/div>\n<p>&nbsp;<\/p>\n<div id=\"attachment_759043\" style=\"width: 752px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-759043\" class=\"wp-image-759043 \" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/07\/spec.png\" alt=\"accuracy iemocap\" width=\"742\" height=\"446\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/07\/spec.png 712w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/07\/spec-300x180.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/07\/spec-16x10.png 16w\" sizes=\"auto, (max-width: 742px) 100vw, 742px\" \/><p id=\"caption-attachment-759043\" 
class=\"wp-caption-text\">Accuracy plots for the IEMOCAP corpus, showing the benefits of data augmentation.<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>The monumental increase in online social interaction activities such as social networking or online gaming is often riddled with hostile or aggressive behavior that can lead to unsolicited manifestations of cyberbullying or harassment. In this work, we develop an audio-based toxic language classifier using self-attentive Convolutional Neural Networks (CNNs). As definitions of hostility or toxicity [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"IEEE","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"European Association for Signal Processing (EURASIP)","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"2021 29th European Signal Processing Conference 
(EUSIPCO)","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2021-8-1","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"https:\/\/eusipco2021.org\/","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[243062],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[247678],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-759025","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-audio-acoustics","msr-locale-en_us","msr-field-of-study-signal-processing"],"msr_publishername":"IEEE","msr_edition":"","msr_affiliation":"","msr_published_date":"2021-8-1","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"European Association for Signal Processing 
(EURASIP)","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/07\/IEEE_Conference_Template_Toxcicity_detection_Eusipco-1.pdf","id":"759028","title":"ieee_conference_template_toxcicity_detection_eusipco-1","label_id":"243109","label":0}],"msr_related_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/www.microsoft.com\/en-us\/research\/video\/audio-based-toxic-language-detection\/","label_id":"243118","label":0}],"msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":759028,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/07\/IEEE_Conference_Template_Toxcicity_detection_Eusipco-1.pdf"}],"msr-author-ordering":[{"type":"text","value":"Midia Yousefi","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Dimitra Emmanouilidou","user_id":37461,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Dimitra Emmanouilidou"}],"msr_impact_theme":[],"msr_research_lab":[199565],"msr_event":[],"msr_group":[144923],"msr_project":[559086],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":559086,"post_title":"Audio Analytics","post_name":"audio-analytics","post_type":"msr-project","post_date":"2019-02-08 15:57:54","post_modified":"2023-01-13 
13:28:08","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/audio-analytics\/","post_excerpt":"Audio analytics is about analyzing and understanding audio signals captured by digital devices, with numerous applications in enterprise, healthcare, productivity, and smart cities.","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/559086"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/759025","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":5,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/759025\/revisions"}],"predecessor-version":[{"id":759058,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/759025\/revisions\/759058"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=759025"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=759025"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=759025"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=759025"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=759025"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=759025"},{"taxonomy":"m
sr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=759025"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=759025"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=759025"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=759025"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=759025"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=759025"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=759025"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}