Conference proceeding
UniT: A Unified Look at Certified Robust Training against Text Adversarial Perturbation
Advances in Neural Information Processing Systems 36 (NeurIPS 2023), Vol.36
Advances in Neural Information Processing Systems
01/01/2023
Abstract
Recent years have witnessed a surge of certified robust training pipelines against text adversarial perturbation constructed by synonym substitutions. Given a base model, existing pipelines provide prediction certificates either in the discrete word space or the continuous latent space. However, they are isolated from each other with a structural gap. We observe that existing training frameworks need unification to provide stronger certified robustness. Additionally, they mainly focus on building the certification process but neglect to improve the robustness of the base model. To mitigate the aforementioned limitations, we propose a unified framework named UniT that enables us to train flexibly in either fashion by working in the word embedding space. It can provide a stronger robustness guarantee obtained directly from the word embedding space without extra modules. In addition, we introduce the decoupled regularization (DR) loss to improve the robustness of the base model, which includes two separate robustness regularization terms for the feature extraction and classifier modules. Experimental results on widely used text classification datasets further demonstrate the effectiveness of the designed unified framework and the proposed DR loss for improving the certified robust accuracy.
Details
- Title: Subtitle
- UniT: A Unified Look at Certified Robust Training against Text Adversarial Perturbation
- Creators
- Muchao Ye - Pennsylvania State University; Ziyi Yin - Pennsylvania State University; Tianrong Zhang - Pennsylvania State University; Tianyu Du - Zhejiang University; Jinghui Chen - Pennsylvania State University; Ting Wang - Stony Brook University; Fenglong Ma - Pennsylvania State University
- Contributors
- A Oh (Editor); T Neumann (Editor); A Globerson (Editor); K Saenko (Editor); M Hardt (Editor); S Levine (Editor)
- Resource Type
- Conference proceeding
- Publication Details
- Advances in Neural Information Processing Systems 36 (NeurIPS 2023), Vol.36
- Publisher
- Neural Information Processing Systems (Nips)
- Series
- Advances in Neural Information Processing Systems
- ISSN
- 1049-5258
- Number of pages
- 18
- Grant note
- 1951729; 1953813; 2119331; 2212323; 2238275 / National Science Foundation; National Science Foundation (NSF)
- Language
- English
- Date published
- 01/01/2023
- Academic Unit
- Computer Science
- Record Identifier
- 9984696727102771