Lipreading has advanced rapidly in recent years, driven by large-scale datasets and large models. Despite this progress, lipreading models still perform poorly on unseen speakers. By analyzing the characteristics speakers exhibit when uttering, we propose a novel parameter-efficient fine-tuning method based on spatio-temporal information learning. In our approach, a low-rank adaptation module that influences global spatial features is placed in the front-end of the lipreading model, and a plug-and-play temporal adaptive weight learning module is placed in the back-end; these adapt to speaker-specific traits such as lip shape and speaking style, respectively. An Adapter module is inserted between them to further strengthen spatio-temporal learning. Experiments on the LRW-ID and GRID datasets demonstrate that our method achieves state-of-the-art performance with fewer trainable parameters.
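To make the front-end adaptation concrete, the following is a minimal sketch of a generic low-rank adaptation (LoRA) wrapper in PyTorch; it is not the authors' implementation, and the class name `LoRALinear`, the rank `r`, and the scaling factor `alpha` are illustrative assumptions.

```python
# Minimal sketch of low-rank adaptation (LoRA) on a linear layer, as might be
# used to adapt pretrained front-end spatial features to a new speaker.
# Illustrative only: names and hyperparameters are assumptions, not the
# paper's code.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pretrained weights
            p.requires_grad = False
        # Low-rank update W + (alpha/r) * B @ A; only A and B are trainable.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

# Usage: wrap a pretrained projection so only the low-rank factors train.
layer = LoRALinear(nn.Linear(512, 512), r=4)
out = layer(torch.randn(2, 512))
```

Because `B` is initialized to zero, the wrapped layer initially reproduces the frozen pretrained mapping exactly, and only the small factors `A` and `B` are updated during speaker adaptation, which is what keeps the fine-tuning parameter-efficient.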