Abstract

Among the many existing time difference of arrival (TDOA) based sound source localization (SSL) algorithms, the Phase Transform (PHAT) is extremely popular for its excellent performance in low-noise environments, even under relatively heavy reverberation. However, PHAT was developed as a heuristic approach, and its working principle is not completely understood. In this paper, we present the relationship between PHAT and a maximum likelihood (ML) framework for multi-microphone sound source localization. We show that as the environmental noise approaches zero, PHAT is indeed a special case of the ML algorithm, which explains its good performance in low-noise environments. In addition, we show that as long as the noise stays low, PHAT remains optimal in the ML sense even when the room reverberation is heavy, which explains its robustness to reverberation.
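
For readers unfamiliar with PHAT, the following is a minimal sketch of the commonly used two-microphone form (generalized cross-correlation with PHAT weighting), not the multi-microphone ML formulation developed in this paper. The function name `gcc_phat`, its parameters, and the small regularization constant are illustrative assumptions.

```python
import numpy as np

def gcc_phat(x1, x2, max_lag):
    """Estimate the delay (in samples) of x2 relative to x1 with GCC-PHAT.

    Illustrative sketch: assumes 1-D real signals and max_lag >= 1.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)

    # Cross-power spectrum; its phase encodes the time delay.
    cross = X2 * np.conj(X1)

    # PHAT weighting: discard the magnitude and keep only the phase.
    # The floor 1e-12 is an assumed guard against division by zero.
    cross = cross / np.maximum(np.abs(cross), 1e-12)

    # Back to the lag domain; index k corresponds to lag k (mod n).
    cc = np.fft.irfft(cross, n=n)

    # Reorder so the window covers lags -max_lag .. +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag
```

A positive return value means the signal reaches the second microphone later than the first. Because the weighting whitens the cross-spectrum, every frequency bin contributes equally to the peak, which is one intuition for why the method is sensitive to noise but tolerant of reverberation.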