Multi-objective reinforcement learning and planning for the expected scalarised returns
Hayes, Conor F.
Publication Date
2023-01-13
Type
Thesis
Abstract
Many problems in the real world have multiple, often conflicting, objectives. To solve such problems, a multi-objective approach to decision making must be taken. In the multi-objective decision making (MODeM) literature, the utility-based approach is followed, where a utility function is used to model the preferences of a human decision maker (or user) over the objectives. If the utility function is known a priori, a single optimal solution can be computed. However, if the utility function is unknown or uncertain, a set of optimal solutions must be computed. When following the utility-based approach, multiple optimality criteria can arise. In scenarios where the utility of a user is derived from multiple executions of a policy, the scalarised expected returns (SER) criterion must be optimised. In scenarios where the utility of a user is derived from a single execution of a policy, the expected scalarised returns (ESR) criterion must be optimised. In the MODeM literature, the SER criterion has been studied extensively, while the ESR criterion has largely been ignored. In the real world, a user may only have a single opportunity to make a decision. For example, in a medical setting, a patient may only have one chance to select a treatment. Therefore, in order to effectively apply MODeM algorithms to a range of practical applications, the ESR criterion must be further investigated. This thesis contains a number of important contributions. It is demonstrated by example that for ESR settings where the utility function is known and nonlinear, multi-objective methods that compute policies must be explicitly designed for the ESR criterion. For settings where the utility function of a user is unknown, it is shown that expected value vectors are not sufficient to determine optimality under the ESR criterion. Therefore, to determine a partial ordering over policies, new methods to compute sets of optimal policies are proposed.
Finally, this thesis proposes a number of new multi-objective algorithms that can compute sets of optimal policies for the ESR criterion in various MODeM settings.
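The distinction between the two criteria described in the abstract can be illustrated with a small sketch. It is not taken from the thesis: the product utility, the two toy policies, and the function names `ser` and `esr` are all assumptions chosen for illustration. SER applies the utility to the expected return vector, while ESR takes the expectation of the utility of each individual return vector.

```python
from statistics import fmean


def utility(returns):
    # Hypothetical nonlinear utility (an assumption, not from the thesis):
    # the product of the two objective values.
    r1, r2 = returns
    return r1 * r2


def ser(samples, u):
    # Scalarised expected returns: utility of the mean return vector.
    n_objectives = len(samples[0])
    mean_vector = tuple(fmean(s[i] for s in samples) for i in range(n_objectives))
    return u(mean_vector)


def esr(samples, u):
    # Expected scalarised returns: mean of the utility of each return vector.
    return fmean(u(s) for s in samples)


# Toy policies, each given as equally likely return vectors (illustrative only).
policy_a = [(4.0, 0.0), (0.0, 4.0)]  # risky: one objective or the other
policy_b = [(1.5, 1.5), (1.5, 1.5)]  # deterministic and balanced

for name, samples in (("A", policy_a), ("B", policy_b)):
    print(name, "SER:", ser(samples, utility), "ESR:", esr(samples, utility))
```

Under these assumptions, policy A has the higher SER value (4.0 against 2.25) but the lower ESR value (0.0 against 2.25), so the two criteria rank the policies in opposite order. Policy A's expected value vector (2, 2) also dominates policy B's (1.5, 1.5), which is one way to see why expected value vectors alone may not determine optimality under ESR.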
Publisher
NUI Galway