We introduce two tactics that use adversarial examples to attack
agents trained by deep reinforcement learning algorithms:
Strategically-timed attack: the adversary aims to minimize
the agent's reward by attacking the
agent at only a small subset of time steps in an episode.
Limiting the attack activity to this subset helps prevent
detection of the attack by the agent. We propose
a novel method to determine when an adversarial
example should be crafted and applied.
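
The text above does not spell out the timing criterion, so the following is only a minimal sketch under a stated assumption: the adversary attacks when the agent strongly prefers one action over another, measured as the gap between its largest and smallest action probabilities. The function names and the threshold `beta` are hypothetical, and the softmax over action values is an assumed way to obtain probabilities from a value-based agent.

```python
import math

def softmax(values, temperature=1.0):
    """Turn action values (e.g. Q-values) into a probability distribution."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def preference_gap(action_probs):
    """Assumed criterion: gap between the most and least preferred action."""
    return max(action_probs) - min(action_probs)

def select_attack_steps(episode_probs, beta):
    """Return the time steps whose preference gap exceeds the threshold beta."""
    return [t for t, probs in enumerate(episode_probs)
            if preference_gap(probs) > beta]

# Hypothetical per-step action distributions for a 4-step episode.
episode = [
    [0.25, 0.25, 0.25, 0.25],  # indifferent: not worth attacking
    [0.90, 0.05, 0.03, 0.02],  # strong preference: attack
    [0.40, 0.30, 0.20, 0.10],  # mild preference: skip
    [0.05, 0.05, 0.85, 0.05],  # strong preference: attack
]
steps = select_attack_steps(episode, beta=0.5)
attack_rate = len(steps) / len(episode)
print(steps, attack_rate)
```

Raising `beta` in this sketch attacks fewer time steps, which is the knob that trades attack strength against the chance of detection.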
Enchanting attack: the adversary aims at luring
the agent to a designated target state. This
is achieved by combining a generative model and
a planning algorithm: while the generative model
predicts the future states, the planning algorithm
generates a preferred sequence of actions for luring
the agent. A sequence of adversarial examples is
then crafted to lure the agent into taking the preferred
sequence of actions.
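
The interplay between the generative model and the planner can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `toy_model` is a hypothetical stand-in for a learned generative model on a one-dimensional state, and brute-force enumeration stands in for whatever planning algorithm is actually used. Crafting the adversarial examples that make the agent take each planned action is outside this sketch.

```python
from itertools import product

def toy_model(state, action):
    """Hypothetical stand-in for a learned generative model: predicts the next state."""
    return state + action

def plan_actions(state, target, actions, horizon, model):
    """Enumerate action sequences and keep the one whose predicted
    final state lands closest to the target state."""
    best_seq, best_dist = None, float("inf")
    for seq in product(actions, repeat=horizon):
        predicted = state
        for action in seq:
            predicted = model(predicted, action)  # roll the model forward
        dist = abs(predicted - target)
        if dist < best_dist:
            best_seq, best_dist = seq, dist
    return list(best_seq), best_dist

# Plan three steps ahead from state 0 toward target state 2.
plan, dist = plan_actions(state=0, target=2, actions=(-1, 0, 1),
                          horizon=3, model=toy_model)
print(plan, dist)
```

Enumeration grows exponentially with the horizon, so a realistic planner would sample candidate sequences rather than try them all.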
We apply the two tactics to
agents trained by state-of-the-art deep reinforcement
learning algorithms, including DQN and
A3C. In 5 Atari games, our strategically-timed attack
reduces the reward as much as the uniform attack
(i.e., attacking at every time step) does while attacking
merely 25% of time steps on average. Our enchanting attack
lures the agent toward designated target states with
a success rate of more than 70%.