Project

GenDict: Generative Disentangled Interpretations via Concept Traversals

Copyright

AG

Despite the importance of model interpretability and the emergence of techniques for global and local explanations, the automatic discovery of concepts that explain a model’s prediction has remained relatively unexplored. Using little external supervision, we propose a method for algorithmically discovering multiple distinct concepts that are important to the decision-making of a black-box classifier. We define a Concept Traversal (CT) as a series of generated samples that exhibit increasing degrees of one such concept. Given a sample input, we show that our proposed method, GenDict, generates CTs that (1) represent concepts influential to a black-box classifier's decision outputs, (2) are composed of samples that are realistic when compared to actual samples, and (3) are distinct from each other. Intuitively, the generated samples within a CT smoothly traverse the decision boundary, as indicated by a monotonic increase in the posterior probability of the target class. To generate CTs, we jointly train a generator, a discriminator, and a CT disentangler to ensure that the CTs are influential, realistic, and distinct, respectively. Unlike other approaches, which require a human to identify concepts a priori and find samples representing them, GenDict uncovers concepts automatically. By jointly training a generative model from a classifier's signal, GenDict offers a path toward understanding a classifier's inherent notion of distinct concepts rather than relying on user-predefined concepts. We validate our approach on synthetic and real datasets. We also present experiments in which a classifier is intentionally trained to exhibit certain undesirable biases, and show that GenDict successfully discovers them.
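
The monotonicity property described above can be illustrated with a small check. The sketch below is not GenDict's implementation; it only assumes that a traversal is an ordered list of samples and that the black-box classifier exposes a function returning the posterior probability of the target class. The names `is_monotonic_traversal`, `crosses_boundary`, and `toy_posterior` are hypothetical and introduced purely for illustration.

```python
import math
from typing import Callable, Sequence


def is_monotonic_traversal(
    samples: Sequence[float],
    posterior: Callable[[float], float],
    tol: float = 0.0,
) -> bool:
    """Return True if the target-class posterior never decreases
    (beyond `tol`) from one sample in the traversal to the next."""
    probs = [posterior(x) for x in samples]
    return all(b >= a - tol for a, b in zip(probs, probs[1:]))


def crosses_boundary(
    samples: Sequence[float],
    posterior: Callable[[float], float],
    threshold: float = 0.5,
) -> bool:
    """Return True if the traversal starts below the decision
    threshold and ends at or above it."""
    probs = [posterior(x) for x in samples]
    return probs[0] < threshold <= probs[-1]


def toy_posterior(x: float) -> float:
    """Hypothetical stand-in for a black-box classifier: a logistic
    function of a single scalar 'concept strength'."""
    return 1.0 / (1.0 + math.exp(-x))


# A toy traversal: scalar samples with increasing concept strength.
traversal = [-2.0, -1.0, 0.0, 1.0, 2.0]

monotonic = is_monotonic_traversal(traversal, toy_posterior)
crossing = crosses_boundary(traversal, toy_posterior)
```

In practice the samples would be generated inputs such as images and the posterior would come from the classifier under study; the scalar toy setup is used here only to keep the example self-contained.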
