Publication

Generating Black-Box Adversarial Examples for Text Classifiers Using a Deep Reinforced Model

Sept. 16, 2019

People

Prashanth Vijayaraghavan

Former Research Assistant

Share this publication

Vijayaraghavan, Prashanth, and Deb Roy. "Generating Black-Box Adversarial Examples for Text Classifiers Using a Deep Reinforced Model." arXiv preprint arXiv:1909.07873 (2019).

Abstract

Recently, generating adversarial examples has become an important means of measuring the robustness of a deep learning model. Adversarial examples help us identify the susceptibilities of the model and counter those vulnerabilities by applying adversarial training techniques. In the natural language domain, small perturbations in the form of misspellings or paraphrases can drastically change the semantics of the text. We propose a reinforcement-learning-based approach to generating adversarial examples in black-box settings. We demonstrate that our method is able to fool well-trained models for (a) the IMDB sentiment classification task and (b) the AG's news corpus news categorization task with significantly high success rates. We find that the adversarial examples generated are semantics-preserving perturbations of the original text.

via ECML PKDD

Learning Personas from Dialogue with Attentive Memory Networks

Socially-Aware Machine Learning: Towards Leveraging the Relationship Between Narrative Comprehension and Mentalizing

Automatic identification of representative content on Twitter

Automatic Detection and Categorization of Election-Related Tweets
