Imagine two learning agents (A and B) that repeatedly encounter each other in some specific situation (say, an iterated Prisoner's Dilemma (iPD) game). It is well known that two learning agents that both use a table-lookup TD paradigm (e.g. Q-learning) are not guaranteed to converge to a specific pair of strategies. But what happens if agent A acts as a 'learning leader', trying to guide the learning process of agent B?
This talk will present experimental evidence on how a learning agent that reasons as a 'leader' can guide its opponent/teammate (depending on the context) toward convenient equilibrium points. I will introduce "social reward shaping", a multiagent technique inspired by two well-known machine learning ideas (reward shaping [Ng et al.] and intrinsic motivation [Singh et al.]) that can be easily plugged into any standard RL algorithm to turn it into a leader learning algorithm. I will then present Q+shaping (Q-learning with social reward shaping), an algorithm that exhibits these qualities, along with its experimental results.
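For a concrete picture of the general idea before the talk, here is a minimal, purely illustrative Python sketch: tabular Q-learning on the iPD with an extra shaping bonus added to the environment reward. The payoff matrix, state encoding, and shaping term are my own assumptions for illustration, not the actual Q+shaping algorithm presented in the talk.

```python
import random
from collections import defaultdict

# Illustrative sketch only: Q-learning with an added shaping bonus in the
# iterated Prisoner's Dilemma. Payoffs, state encoding, and the bonus value
# are assumptions, not the talk's Q+shaping algorithm.

ACTIONS = ["C", "D"]  # cooperate, defect
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
SHAPING_BONUS = 2.0  # extra reward the 'leader' grants itself for cooperating

Q = defaultdict(float)  # Q[(state, action)], state = opponent's last action


def choose(state):
    """Epsilon-greedy action selection over the two iPD actions."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def update(state, action, reward, next_state):
    """Standard tabular Q-learning backup on the shaped reward."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])


def play_match(opponent, rounds=200):
    """Play one match against a fixed reactive opponent policy."""
    my_prev, opp_prev = "C", "C"
    for _ in range(rounds):
        state = opp_prev                # condition on the opponent's last move
        my_action = choose(state)
        opp_action = opponent(my_prev)  # opponent reacts to our previous move
        reward = PAYOFF[(my_action, opp_action)]
        if my_action == "C":            # shaping bonus nudging toward cooperation
            reward += SHAPING_BONUS
        update(state, my_action, reward, opp_action)
        my_prev, opp_prev = my_action, opp_action


# Example: train the shaped learner against tit-for-tat for a few matches.
for _ in range(50):
    play_match(lambda last: last)
```

The design choice to illustrate is simply that the shaping term is added to the reward the learner itself optimizes; how the bonus is chosen so that it steers the *pair* of learners toward a desirable equilibrium is the subject of the talk.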
