Knowledge-based multi-objective multi-agent reinforcement learning
Mannion, Patrick
Publication Date
2017-08-17
Type
Thesis
Abstract
Multi-Agent Reinforcement Learning (MARL) is a powerful Machine Learning paradigm, where multiple autonomous agents can learn to improve the performance of a system through experience. The majority of MARL implementations aim to optimise systems with respect to a single objective, despite the fact that many real-world problems are inherently multi-objective in nature. Examples of multi-objective problems where MARL may be applied include water resource management, traffic signal control, electricity generator scheduling and robot coordination tasks. Compromises between conflicting objectives may be defined using the concept of Pareto dominance. The Pareto optimal or non-dominated set consists of incomparable solutions: no solution in the set is dominated by any other, i.e. no other solution is at least as good on every objective and strictly better on at least one. Reward shaping has been proposed as a means to address the credit assignment problem in single-objective MARL; however, it has been shown to alter the intended goals of the domain if misused, leading to unintended behaviour. Potential-Based Reward Shaping (PBRS) and difference rewards (D) are commonly used shaping methods for MARL, both of which have been repeatedly shown to improve learning speed and the quality of joint policies learned by agents in single-objective problems. Research into multi-objective MARL is still in its infancy, and very few studies have dealt with the issue of credit assignment in this context. This thesis explores the possibility of using reward shaping to improve agent coordination in multi-objective MARL domains. The implications of using either D or PBRS are evaluated from a theoretical perspective, and the results of several empirical studies support the conclusion that these shaping techniques do not alter the true Pareto optimal solutions in multi-objective MARL domains. Therefore, the benefits of reward shaping can now be leveraged in a broader range of application domains, without the risk of altering the agents' intended goals.
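The notion of Pareto dominance used throughout the abstract can be made concrete with a minimal sketch (Python). The function names and the convention that all objectives are to be maximised are illustrative assumptions, not taken from the thesis itself.

    from typing import Sequence

    def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
        """Return True if solution a Pareto-dominates solution b.

        Assuming all objectives are maximised: a dominates b when a is at
        least as good on every objective and strictly better on at least one.
        """
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))

    def pareto_front(solutions: list[Sequence[float]]) -> list[Sequence[float]]:
        """Return the non-dominated (Pareto optimal) subset of solutions."""
        return [s for s in solutions
                if not any(dominates(other, s)
                           for other in solutions if other is not s)]

    # Example with two conflicting objectives: (3, 1) and (1, 3) are
    # incomparable, while (1, 1) is dominated by (2, 2) and is excluded.
    points = [(3, 1), (1, 3), (2, 2), (1, 1)]
    print(pareto_front(points))  # [(3, 1), (1, 3), (2, 2)]

A non-dominated set computed this way contains only mutually incomparable solutions, which is the sense in which the thesis argues that D and PBRS leave the true Pareto optimal solutions unchanged.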
Rights
Attribution-NonCommercial-NoDerivs 3.0 Ireland