Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information

Published in the NeurIPS 2022 Offline RL Workshop

Recommended citation:

Riashat Islam, Manan Tomar, Alex Lamb, Yonathan Efroni, Hongyu Zang, Aniket Didolkar, Dipendra Misra, Xin Li, Harm van Seijen, Remi Tachet des Combes, John Langford: Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information. NeurIPS 2022 Offline RL Workshop.

Paper link:

https://openreview.net/forum?id=0pFzg-8y-o

Abstract:

Offline reinforcement learning (RL) struggles in environments with rich and noisy inputs, where the agent only has access to a fixed dataset and cannot interact with the environment. Past works have proposed common workarounds based on pre-training state representations, followed by policy training. In this work, we introduce a simple yet effective approach for learning state representations. Our method, Behavior Prior Representation (BPR), learns state representations with an easy-to-integrate objective based on behavior cloning of the dataset: we first learn a state representation by mimicking actions from the dataset, and then train a policy on top of the fixed representation, using any off-the-shelf offline RL algorithm. Theoretically, we prove that BPR provides performance guarantees when integrated into algorithms that either have policy improvement guarantees (conservative algorithms) or produce lower bounds of the policy values (pessimistic algorithms). Empirically, we show that BPR combined with existing state-of-the-art offline RL algorithms leads to significant improvements across several offline control benchmarks.
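
The two-stage recipe described in the abstract (learn a representation by behavior cloning, then freeze it for downstream policy training) can be illustrated with a toy sketch. This is not the authors' implementation: the synthetic dataset, the linear encoder and action head, the squared-error loss, and all names (`make_dataset`, `encode`, `train_bc_representation`) are assumptions made only for illustration.

```python
import random


def make_dataset(n, seed=0):
    """Hypothetical offline dataset of (state, action) pairs.

    States have 4 dimensions; the behavior action depends only on the
    first two, so the last two act as irrelevant noise.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        s = [rng.uniform(-1, 1) for _ in range(4)]
        a = 0.8 * s[0] - 0.5 * s[1]
        data.append((s, a))
    return data


def encode(W, s):
    """Linear encoder: z = W s."""
    return [sum(w * x for w, x in zip(row, s)) for row in W]


def predict(W, v, s):
    """Action head on top of the encoder: a_hat = v . (W s)."""
    return sum(vi * zi for vi, zi in zip(v, encode(W, s)))


def bc_loss(W, v, data):
    """Mean squared behavior-cloning error over the dataset."""
    return sum((predict(W, v, s) - a) ** 2 for s, a in data) / len(data)


def train_bc_representation(data, latent_dim=2, lr=0.05, epochs=100, seed=0):
    """Stage 1: fit encoder W and head v by mimicking dataset actions (SGD)."""
    rng = random.Random(seed)
    d = len(data[0][0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(latent_dim)]
    v = [rng.uniform(-0.5, 0.5) for _ in range(latent_dim)]
    for _ in range(epochs):
        for s, a in data:
            z = encode(W, s)
            err = sum(vi * zi for vi, zi in zip(v, z)) - a
            # Gradients of 0.5 * err**2 with respect to v and W.
            grad_v = [err * zi for zi in z]
            grad_W = [[err * vi * x for x in s] for vi in v]
            v = [vi - lr * g for vi, g in zip(v, grad_v)]
            W = [[w - lr * g for w, g in zip(row, g_row)]
                 for row, g_row in zip(W, grad_W)]
    return W, v


data = make_dataset(200)
W_init, v_init = train_bc_representation(data, epochs=0)  # untrained baseline
W, v = train_bc_representation(data)
print("BC loss before:", bc_loss(W_init, v_init, data))
print("BC loss after: ", bc_loss(W, v, data))

# Stage 2 (not implemented here): W is frozen and only the encoded states
# are handed to a downstream learner. A real offline RL algorithm would
# also need rewards and next states, which this toy dataset omits.
latent_data = [(encode(W, s), a) for s, a in data]
```

In this sketch the action head `v` is discarded after stage 1; only the frozen encoder `W` carries over to the downstream learner.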

@inproceedings{
islam2022agentcontroller,
title={Agent-Controller Representations: Principled Offline {RL} with Rich Exogenous Information},
author={Riashat Islam and Manan Tomar and Alex Lamb and Hongyu Zang and Yonathan Efroni and Dipendra Misra and Aniket Rajiv Didolkar and Xin Li and Harm van Seijen and Remi Tachet des Combes and John Langford},
booktitle={3rd Offline RL Workshop: Offline RL as a ''Launchpad''},
year={2022},
url={https://openreview.net/forum?id=0pFzg-8y-o}
}