Abstract: Query-by-Example Spoken Term Detection (QbE-STD) retrieves relevant audio files corresponding to a spoken query, without relying on explicit word-level textual transcriptions. In ...
Abstract: As one of the most representative applications built on deep learning, audio systems, including keyword spotting, automatic speech recognition, and speaker identification, have recently been ...
We introduce MMAR, a new benchmark designed to evaluate the deep reasoning capabilities of Audio-Language Models (ALMs) across massive multi-disciplinary tasks. MMAR comprises 1,000 meticulously ...