Tactics of Adversarial Attack on
Deep Reinforcement Learning Agents

Yen-Chen Lin     Zhang-Wei Hong     Yuan-Hong Liao     Meng-Li Shi     Ming-Yu Liu     Min Sun    

IJCAI 2017  


We introduce two tactics for attacking agents trained by deep reinforcement learning algorithms using adversarial examples:
  • Strategically-timed attack: the adversary aims to minimize the agent's reward while attacking the agent at only a small subset of time steps in an episode. Limiting the attack to this subset helps keep the agent from detecting it. We propose a novel method to determine when an adversarial example should be crafted and applied.
  • Enchanting attack: the adversary aims to lure the agent to a designated target state. This is achieved by combining a generative model and a planning algorithm: the generative model predicts future states, and the planning algorithm generates a preferred sequence of actions for luring the agent. A sequence of adversarial examples is then crafted to make the agent take the preferred sequence of actions.
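
The planning half of the enchanting attack can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `predict_next_state` is a hypothetical stand-in for the learned generative model (a one-dimensional walk instead of Atari frames), and a simple random-shooting planner is swapped in for whatever planner is actually used. The step that crafts adversarial examples to push the agent toward each planned action is not shown.

```python
import random


def predict_next_state(state, action):
    """Hypothetical stand-in for the generative model: a 1-D walk."""
    return state + (1 if action == "right" else -1)


def plan_actions(start, target, horizon, n_samples=200, seed=0):
    """Random-shooting planner (an assumed, simplified planner).

    Samples random action sequences, rolls each one through the model,
    and keeps the sequence whose final predicted state is closest to
    the target state.
    """
    rng = random.Random(seed)
    best_seq, best_dist = None, float("inf")
    for _ in range(n_samples):
        seq = [rng.choice(["left", "right"]) for _ in range(horizon)]
        state = start
        for action in seq:
            state = predict_next_state(state, action)
        dist = abs(state - target)
        if dist < best_dist:
            best_seq, best_dist = seq, dist
    return best_seq, best_dist


# Preferred action sequence for luring the agent from state 0 to state 3.
plan, dist = plan_actions(start=0, target=3, horizon=5)
```

In a full attack, each action in `plan` would become the target of one adversarial perturbation, and the plan would typically be recomputed as the agent's real state is observed.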

We apply the two tactics to agents trained by state-of-the-art deep reinforcement learning algorithms, including DQN and A3C. In 5 Atari games, our strategically-timed attack reduces the agent's reward as much as the uniform attack (i.e., attacking at every time step) does, while attacking only 25% of time steps on average. Our enchanting attack lures the agent toward designated target states with a success rate of more than 70%.
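
The difference between attacking a subset of time steps and attacking uniformly can be made concrete with a small sketch. This page does not spell out the timing criterion, so the rule below is an assumption: attack only when the gap between the agent's most and least preferred action probabilities exceeds a threshold. The function names, the threshold value, and the sample policy outputs are all hypothetical.

```python
def preference_gap(action_probs):
    """Gap between the most and least preferred action probabilities."""
    return max(action_probs) - min(action_probs)


def select_attack_steps(policy_outputs, threshold):
    """Return the time steps whose preference gap exceeds the threshold.

    Assumed criterion: a large gap means the agent strongly prefers one
    action, so perturbing it at that step is likely to matter most.
    """
    return [
        t
        for t, probs in enumerate(policy_outputs)
        if preference_gap(probs) > threshold
    ]


# Made-up per-step action distributions for a four-step episode.
episode = [
    [0.25, 0.25, 0.25, 0.25],  # indifferent: not worth attacking
    [0.90, 0.05, 0.03, 0.02],  # strong preference: attack
    [0.30, 0.30, 0.20, 0.20],  # weak preference: skip
    [0.05, 0.85, 0.05, 0.05],  # strong preference: attack
]

steps = select_attack_steps(episode, threshold=0.5)
attack_rate = len(steps) / len(episode)

# A threshold below zero selects every step, i.e. the uniform attack.
uniform_steps = select_attack_steps(episode, threshold=-1.0)
```

Raising the threshold lowers the fraction of time steps attacked, which is the knob that trades attack strength against detectability in this sketch.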


Video Overview


Strategically-timed Attack

Enchanting Attack


  • Attacking Machine Learning with Adversarial Examples, by Ian Goodfellow et al.
  • awesome-adversarial-machine-learning, by Yen-Chen Lin