In visual offline Reinforcement Learning (RL), pretraining an encoder with existing datasets presents a unique advantage. However, a significant challenge lies in accurately capturing the information crucial for decision-making from visual state inputs, which are often disturbed by redundant information. This limitation hampers the encoder’s ability to generalize effectively to unseen environments. To address this challenge, we propose to pretrain a robust encoder via Control-relevant Saliency Maps (C-SMEP), a novel approach designed to enhance the encoder’s generalization capability in visual offline RL. By leveraging a Behavior Cloning (BC) style action prediction module, C-SMEP computes the gradients of predicted actions with respect to image-based observations to determine the control relevance of each pixel. Under certain assumptions, we provide theoretical performance guarantees when C-SMEP is integrated into conservative or pessimistic offline RL algorithms. Empirical experiments on the DeepMind Control (DMC) suite show that C-SMEP significantly outperforms state-of-the-art baseline methods in challenging unseen environments, demonstrating its superiority in generalization and interpretability.
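The gradient-based saliency computation described above can be illustrated with a minimal sketch, assuming a PyTorch encoder and a BC-style action head; the names here (control_relevant_saliency, encoder, bc_head) are illustrative assumptions, not the paper’s implementation.

```python
# A minimal sketch of control-relevant saliency via action gradients,
# assuming a PyTorch encoder and a BC-style action prediction head.
import torch

def control_relevant_saliency(encoder: torch.nn.Module,
                              bc_head: torch.nn.Module,
                              obs: torch.Tensor) -> torch.Tensor:
    """Return a per-pixel importance map for a batch of image observations.

    obs: (B, C, H, W) image-based observations.
    """
    obs = obs.clone().requires_grad_(True)   # track gradients w.r.t. pixels
    action = bc_head(encoder(obs))           # BC-style action prediction
    # Summing over action dimensions lets one backward pass give
    # d(action)/d(pixel) for every pixel in the batch.
    action.sum().backward()
    # Aggregate gradient magnitudes over channels -> (B, H, W) saliency map.
    saliency = obs.grad.abs().sum(dim=1)
    # Normalize each map to [0, 1] for masking or visualization.
    flat = saliency.flatten(1)
    lo = flat.min(1).values[:, None, None]
    hi = flat.max(1).values[:, None, None]
    return (saliency - lo) / (hi - lo + 1e-8)
```

Such a map could then, for instance, weight or mask pixels during encoder pretraining so that control-irrelevant regions contribute less to the learned representation.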
Under Review.