Deep reinforcement learning

Human-level control through deep reinforcement learning
Nature 518, 529–533 (26 February 2015)
http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html
Volodymyr Mnih, et al.
The theory of reinforcement learning provides a normative account [1], deeply rooted in psychological [2] and neuroscientific [3] perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems [4, 5], the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms [3]. While reinforcement learning agents have achieved some successes in a variety of domains [6, 7, 8], their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks [9, 10, 11] to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games.
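
For readers new to the temporal difference idea mentioned in the abstract, here is a minimal sketch of tabular Q-learning in Python. It is not the paper's deep Q-network: the lookup table stands in for the neural network, and the five-state chain environment, the function name q_learning_chain, and all hyperparameters are assumptions made up for illustration. It only shows the update rule a deep Q-network approximates, which nudges Q(s, a) toward r + gamma * max Q(s', a').

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain of states 0..n_states-1.

    Actions: 0 = move left, 1 = move right. Reaching the last state gives
    reward 1 and ends the episode; every other step gives reward 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]

    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action choice, breaking ties at random.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[s])
                a = rng.choice([i for i in range(2) if q[s][i] == best])

            s_next = max(0, s - 1) if a == 0 else s + 1
            terminal = s_next == n_states - 1
            r = 1.0 if terminal else 0.0

            # Temporal difference target: reward plus discounted best
            # next-state value (no bootstrapping from a terminal state).
            target = r if terminal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next

    return q


if __name__ == "__main__":
    for state, (left, right) in enumerate(q_learning_chain()):
        print(state, round(left, 3), round(right, 3))
```

In this toy setting the learned values favour moving right in every non-terminal state. A deep Q-network replaces the table q with a network that maps raw inputs, such as game frames, to action values, which is what lets it handle high-dimensional sensory inputs.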

related:
Game-playing software holds lessons for neuroscience
DeepMind computer provides new way to investigate how the brain works.
Nature. 25 February 2015
http://www.nature.com/news/game-playing-software-holds-lessons-for-neuroscience-1.16979

related:
https://franzcalvo.wordpress.com/2015/03/13/space-invaders-1-0
