AI Security Lab · Shanghai Jiao Tong University
Current Graduate Students
Recent Events
Publications
Contact Us
Conference
Bridging Crypto with ML-based Solvers: the SAT Formulation and Benchmarks
The Boolean Satisfiability Problem (SAT) plays a crucial role in cryptanalysis, enabling tasks like key recovery and distinguisher …
宋鑫浩
PDF · Cite
GEM: Gaussian Embedding Modeling for Out-of-Distribution Detection in GUI Agents
Graphical user interface (GUI) agents have recently emerged as an intriguing paradigm for human-computer interaction, capable of …
吴铮, 程彭洲, 吴宗儒, 董凌众, 张倬胜
PDF · Cite · DOI
GuideBench: Benchmarking Domain-Oriented Guideline Following for LLM Agents
Large language models (LLMs) have been widely deployed as autonomous agents capable of following user instructions and making decisions …
刁凌霄, 徐馨悦, 张倬胜
PDF · Cite · DOI
Efficient and Effective Model Extraction
Model extraction aims to steal a functionally similar copy from a machine learning as a service (MLaaS) API with minimal overhead, …
朱鸿宇, 李方圻, 王士林
Enhancing Visual Forced Alignment with Local Context-Aware Feature Extraction and Multi-Task Learning
This paper introduces a novel approach to Visual Forced Alignment (VFA), aiming to accurately synchronize utterances with corresponding …
何怡, 杨磊, 王士林
Membership Encoding for Black-Box Neural Network Watermarking
Deep neural network watermarking is an emerging technique for protecting the copyright of models. Most existing black-box watermarking …
章杭炜, 李方圻, 王士林
MIFAE-Forensics: Masked Image-Frequency AutoEncoder for DeepFake Detection
With continuously evolving generative models and increasingly diverse face forgery products, there is a growing demand for DeepFake …
王晗亦, 刘子涵, 王士林
Rethinking the Fragility and Robustness of Fingerprints of Deep Neural Networks
Fingerprints characterize deep neural networks that are deployed as black-boxes. To achieve copyright tracing and integrity …
李方圻, 杨磊, 王士林
ALIS: Aligned LLM Instruction Security Strategy for Unsafe Input Prompt
In large language models, existing instruction tuning methods may fail to balance performance with robustness against attacks from …
宋鑫浩, 段苏峰, 刘功申
PDF · Cite · DOI
Acquiring Clean Language Models from Backdoor Poisoned Datasets by Downscaling Frequency Space
Despite the notable success of language models (LMs) in various natural language processing (NLP) tasks, the reliability of LMs is …
吴宗儒, 张倬胜, 程彭洲, 刘功申
PDF · Cite · DOI