Synthesis of Device-Independent Noise Corpora for Speech Quality Assessment

Proc. International Workshop on Acoustic Signal Enhancement (IWAENC)

The perceived quality of speech captured in the presence of background noise is an important performance metric for communication devices, including portable computers and mobile phones. For a realistic evaluation of speech quality, a device under test (DUT) needs to be exposed to a variety of noise conditions either in real noise environments or via noise recordings, typically delivered over a loudspeaker system. However, the test data obtained this way is specific to the DUT and needs to be re-recorded every time the DUT hardware changes. Here we propose an approach that uses device-independent spatial noise recordings to generate device-specific synthetic test data that simulate in-situ recordings. Noise captured using a spherical microphone array is combined with the directivity patterns of the DUT, referred to here as device-related transfer functions (DRTFs), in the spherical harmonics domain. The performance of the proposed method is evaluated in terms of the predicted signal-to-noise ratio (SNR) and the predicted mean opinion score (PMOS) of the DUT under various noise conditions. The root-mean-squared errors (RMSEs) of the predicted SNR and PMOS are on average below 4 dB and 0.28, respectively, across the range of tested SNRs, target source directions, noise types, and spherical harmonics decomposition methods. These experimental results indicate that the proposed method may be suitable for generating device-specific synthetic corpora from device-independent in-situ recordings.
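
The combination step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that both the captured noise field and the DRTF of one DUT microphone are already available as spherical harmonics coefficients per frequency bin, listed in the same (n, m) order, and that the basis convention is such that the device signal reduces to a sum of coefficient products (some conventions require a complex conjugate on one of the two terms). The function names `synthesize_device_spectrum` and `rmse` are hypothetical and chosen for illustration only.

```python
import math


def synthesize_device_spectrum(noise_sh, drtf_sh):
    """Return one complex value per frequency bin for a single DUT microphone.

    noise_sh: list over frequency bins; each entry is a list of complex
              spherical harmonics coefficients of the recorded noise field.
    drtf_sh:  same shape; coefficients of the device-related transfer function.
    Assumption: both use the same (n, m) ordering and a basis convention in
    which the device signal is the plain sum of coefficient products.
    """
    if len(noise_sh) != len(drtf_sh):
        raise ValueError("noise and DRTF must cover the same frequency bins")
    spectrum = []
    for noise_bin, drtf_bin in zip(noise_sh, drtf_sh):
        if len(noise_bin) != len(drtf_bin):
            raise ValueError("coefficient counts differ within a frequency bin")
        # Weight each sound-field coefficient by the matching DRTF coefficient.
        spectrum.append(sum(d * a for d, a in zip(drtf_bin, noise_bin)))
    return spectrum


def rmse(predicted, reference):
    """Root-mean-squared error between two equal-length sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Toy data: one frequency bin, two coefficients (illustrative values only).
device_bin = synthesize_device_spectrum([[1 + 0j, 2j]], [[0.5, 1.0]])

# Toy comparison of SNRs (in dB) from synthetic data against reference values.
snr_error_db = rmse([10.0, 20.0], [13.0, 16.0])
```

In this sketch the `rmse` helper mirrors the kind of error measure reported for the predicted SNR and PMOS; the numbers used here are placeholders, not results from the paper.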