Membership Encoding for Black-Box Neural Network Watermarking

Abstract

Deep neural network watermarking is an emerging technique for protecting the copyright of trained models. Most existing black-box watermarking methods rely on backdoors, which makes them inherently vulnerable to backdoor removal attacks. In this paper, we propose a novel watermark removal attack, Misleading Fine-tuning, which effectively eliminates backdoor-based watermarks using only limited data. To counter this threat, we present a new black-box watermarking method based on membership encoding. The method overfits the protected model on a subset of its training data, which then serves as the trigger set, making the watermark resistant to backdoor removal attacks. Extensive experiments show that the method preserves model fidelity and is robust against adversarial modifications to either the model or its inputs.
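The abstract says the watermark is a subset of training data on which the protected model is deliberately overfitted, but it does not describe how ownership is verified. As an illustration only, the sketch below assumes a simple loss-gap membership test. A black-box model that has memorized the trigger samples should assign them much lower loss than comparable non-member samples. The names `predict`, `membership_gap` and `verify_watermark`, the threshold of 0.5, and the toy model are all hypothetical. This is not the paper's verification procedure.

```python
import math
from statistics import mean


def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under a predicted distribution."""
    return -math.log(max(probs[label], 1e-12))


def membership_gap(predict, triggers, non_members):
    """Mean loss on non-members minus mean loss on trigger (member) samples.

    `predict` is a hypothetical black-box query that returns a probability
    vector. A model overfitted on the triggers should show a large positive gap.
    """
    trigger_loss = mean(cross_entropy(predict(x), y) for x, y in triggers)
    non_member_loss = mean(cross_entropy(predict(x), y) for x, y in non_members)
    return non_member_loss - trigger_loss


def verify_watermark(predict, triggers, non_members, threshold=0.5):
    """Claim ownership if the loss gap exceeds a hypothetical threshold."""
    return membership_gap(predict, triggers, non_members) > threshold


# Toy black-box model (hypothetical): confident on memorized trigger inputs,
# uniform on everything else.
memorized = {0: 1, 1: 0, 2: 1}


def toy_predict(x):
    if x in memorized:
        return [0.01, 0.99] if memorized[x] == 1 else [0.99, 0.01]
    return [0.5, 0.5]


triggers = [(0, 1), (1, 0), (2, 1)]
non_members = [(10, 0), (11, 1), (12, 0)]

gap = membership_gap(toy_predict, triggers, non_members)
print(f"{gap:.3f}")  # → 0.683
print(verify_watermark(toy_predict, triggers, non_members))  # → True
```

A model that never memorized the triggers (for example one that always returns `[0.5, 0.5]`) yields a gap of zero and fails the check. A real verifier would also need a statistically justified threshold and carefully matched non-member samples. Both are outside what the abstract specifies.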

Publication
In IEEE International Conference on Acoustics, Speech and Signal Processing 2025
章杭炜
Master's Student
李方圻
PhD Student
王士林
Professor