We introduce two tactics to attack agents trained by deep reinforcement learning algorithms:
Strategically-timed attack: the adversary aims at minimizing
the agent's reward by only attacking the
agent at a small subset of time steps in an episode.
Limiting the attack activity to this subset helps prevent
detection of the attack by the agent. We propose
a novel method to determine when an adversarial
example should be crafted and applied.
Enchanting attack: the adversary aims at luring
the agent to a designated target state. This
is achieved by combining a generative model and
a planning algorithm: while the generative model
predicts the future states, the planning algorithm
generates a preferred sequence of actions for luring
the agent. A sequence of adversarial examples is
then crafted to lure the agent to take the preferred
sequence of actions.
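
To make the planning half of the enchanting attack concrete, here is a minimal sketch under stated assumptions. The text above does not say which generative model or planner is used, so `toy_model` (a 1-D position moved left, kept in place, or moved right) is a hypothetical stand-in for the learned generative model, and the planner is plain random shooting, a simple sampling-based method swapped in for brevity. The names `rollout` and `plan_actions` are likewise illustrative. Crafting the adversarial example for each planned action is deliberately left out.

```python
import numpy as np

def toy_model(state, action):
    # Hypothetical stand-in for the generative model: the state is a 1-D
    # position and actions 0, 1, 2 move it by -1, 0, +1 respectively.
    return state + (action - 1)

def rollout(model, state, actions):
    # Predict the state reached after applying `actions` in order.
    for a in actions:
        state = model(state, int(a))
    return state

def plan_actions(model, state, target, horizon,
                 n_candidates=200, n_actions=3, seed=0):
    # Random shooting (an assumption, not necessarily the planner used):
    # sample candidate action sequences, predict where each one ends up,
    # and keep the sequence whose predicted end state is closest to target.
    rng = np.random.default_rng(seed)
    candidates = rng.integers(0, n_actions, size=(n_candidates, horizon))
    best_seq, best_dist = None, float("inf")
    for seq in candidates:
        dist = abs(rollout(model, state, seq) - target)
        if dist < best_dist:
            best_seq, best_dist = seq.tolist(), dist
    return best_seq, best_dist

# Usage: plan 4 steps from position 0 toward target position 2.
preferred_actions, predicted_distance = plan_actions(toy_model, 0, 2, horizon=4)
print(preferred_actions, predicted_distance)
```

In a full attack, each entry of `preferred_actions` would become the target action for one adversarial example, and the plan would typically be recomputed as the agent's real state diverges from the prediction.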
We apply the two tactics to
agents trained by state-of-the-art deep reinforcement
learning algorithms, including DQN and
A3C. In 5 Atari games, our strategically-timed attack
reduces the agent's reward as much as the uniform attack
(i.e., attacking at every time step) does while attacking
only 25% of time steps on average. Our enchanting attack
lures the agent toward designated target states with
a more than 70% success rate.
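
The "when to attack" decision behind the strategically-timed attack can be illustrated with a short sketch. The summary above does not spell out the timing rule, so the criterion used here is an assumption made for illustration: attack only when the agent strongly prefers one action, measured as the gap between its highest and lowest action probabilities. The names `preference_gap`, `should_attack`, and `attack_rate`, the threshold value, and the toy episode are all hypothetical.

```python
import numpy as np

def softmax(values, temperature=1.0):
    # Turn action scores (e.g. Q-values) into a probability distribution.
    z = np.asarray(values, dtype=float) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def preference_gap(action_probs):
    # Assumed criterion: how strongly the agent prefers its best action
    # over its worst one at this time step.
    p = np.asarray(action_probs, dtype=float)
    return float(p.max() - p.min())

def should_attack(action_probs, threshold):
    # Attack only at time steps where the preference gap exceeds the threshold.
    return preference_gap(action_probs) > threshold

def attack_rate(episode_probs, threshold):
    # Fraction of time steps in an episode that would be attacked.
    flags = [should_attack(p, threshold) for p in episode_probs]
    return sum(flags) / len(flags)

# Usage on a made-up 4-step episode with 4 actions per step.
episode = [
    [0.25, 0.25, 0.25, 0.25],  # indifferent: skip
    [0.97, 0.01, 0.01, 0.01],  # strong preference: attack
    [0.40, 0.30, 0.20, 0.10],  # mild preference: skip
    [0.30, 0.30, 0.20, 0.20],  # mild preference: skip
]
print(attack_rate(episode, threshold=0.5))
```

Raising the threshold lowers the fraction of attacked time steps, which is the knob that trades attack frequency against reward reduction. For a value-based agent, `softmax` could be applied to the action values first to obtain the probabilities.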