Social Reward Shaping: reasoning about leader/follower agents in a multiagent learning context
Speaker: J. Enrique Munoz de Cote
Institution: University of Southampton
Department: Intelligence, Agents, Multimedia Group
Date: 17 February 2009 - 12:00pm

Imagine two learning agents (A and B) that repeatedly encounter each other in some specific situation (say, an iterated Prisoner's Dilemma (iPD) game). It is well known that two learning agents, both using a table-lookup TD paradigm (e.g. Q-learning), are not guaranteed to converge to a specific pair of strategies. But what happens if (say) agent A acts as a 'learning leader', trying to guide the learning process of agent B?
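To make the setting concrete, here is a minimal, purely illustrative sketch of the pairing described above: two independent table-lookup Q-learners repeatedly playing an iPD, where each agent's state is the previous joint action. The payoff values, learning parameters, and state encoding are assumptions chosen for illustration, not details from the talk.

import random
from collections import defaultdict

# Standard Prisoner's Dilemma payoffs (assumed values); 0 = cooperate, 1 = defect.
# PAYOFF[(a, b)] gives (reward to the 'a' player, reward to the 'b' player).
PAYOFF = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}

class QLearner:
    """Table-lookup Q-learner; a state is the last joint action, seen
    from the agent's own perspective (own action first)."""
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)   # Q-table, missing entries default to 0.0
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # epsilon-greedy action selection
        if random.random() < self.epsilon:
            return random.choice([0, 1])
        return max([0, 1], key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # one-step Q-learning (TD) update
        best_next = max(self.q[(next_state, a)] for a in [0, 1])
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])

def play(episodes=5000):
    """Pair two Q-learners on the iPD; nothing here guarantees which
    pair of strategies they settle on."""
    a, b = QLearner(), QLearner()
    state = (0, 0)  # arbitrary initial joint action, from a's perspective
    for _ in range(episodes):
        ia = a.act(state)
        ib = b.act((state[1], state[0]))       # b sees the swapped perspective
        ra, rb = PAYOFF[(ia, ib)]
        a.update(state, ia, ra, (ia, ib))
        b.update((state[1], state[0]), ib, rb, (ib, ia))
        state = (ia, ib)
    return a, b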
This talk will present experimental evidence on how a learning agent that reasons as a 'leader' can guide its opponent or teammate (depending on the context) to settle on convenient equilibrium points. I will introduce "social reward shaping", a multiagent technique inspired by two well-known machine learning techniques (reward shaping [Ng et al.] and intrinsic motivation [Singh et al.]) that can easily be plugged into any standard RL algorithm to turn it into a leader learning algorithm. I will then present Q+shaping (Q-learning + social reward shaping), an algorithm that exhibits these qualities, and discuss its experimental results.
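As a rough illustration of how a shaping term can be 'plugged into' an existing RL algorithm, the sketch below extends the QLearner class from the previous example with a potential-based shaping reward in the style of Ng et al. The specific potential function (favouring the mutually cooperative joint action) is a hypothetical choice for illustration; the talk's actual Q+shaping construction may differ.

class QPlusShaping(QLearner):
    """Q-learning plus a potential-based shaping term added to the
    environment reward: r' = r + gamma * phi(s') - phi(s)."""
    def potential(self, state):
        # Illustrative assumption: assign positive potential to the
        # mutually cooperative joint action (0, 0), steering the
        # learner (and hence its partner) toward that equilibrium.
        return 3.0 if state == (0, 0) else 0.0

    def update(self, state, action, reward, next_state):
        # Shape the reward, then reuse the ordinary Q-learning update.
        shaped = (reward
                  + self.gamma * self.potential(next_state)
                  - self.potential(state))
        super().update(state, action, shaped, next_state)

In this style, the base learning rule is untouched; only the reward signal changes, which is what makes the shaping term easy to bolt onto any TD-style learner.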