Publications

You can also find my papers on my Google Scholar profile. (* denotes equal contribution.)

Enhancing Identity Preservation in Portrait Generation via Reward Optimization

Hongyu Zang, Xin Li, Yang Liu, Jiankang Deng, Jun Dan, Zhi-Qi Cheng, Baigui Sun

Under review

Recent advancements in tailored image synthesis have highlighted the remarkable capabilities of pre-trained text-to-image frameworks in encapsulating individual identity traits from a collection of portrait photographs. However, these solutions may not accurately reflect the key characteristics of the input, leading to a loss of essential identity traits. To alleviate this issue, our study introduces a new framework for personalized portrait generation. This framework leverages reward optimization to refine the generation process, integrating a face recognition model into the reward function: it assesses the similarity between user-provided images and synthetic portraits to determine rewards. We use a pathwise estimator for gradient estimation, employing the Gumbel-Softmax technique to satisfy the differentiability requirement and incorporating a KL divergence regularizer to mitigate the risk of overfitting to the reward. Our results indicate a marked improvement in preserving human identity in the generated portraits.
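
A minimal sketch of how such a reward could be wired up, assuming a generic `face_embed` encoder and log-probabilities `log_p`/`log_p_ref` from the fine-tuned and frozen models (all names here are hypothetical, not from the paper):

```python
import torch
import torch.nn.functional as F

def identity_reward(gen_images, ref_images, face_embed, log_p, log_p_ref, kl_coef=0.1):
    """Identity-preservation reward: cosine similarity between face embeddings
    of generated and reference portraits, minus a KL penalty that keeps the
    fine-tuned model close to the frozen pre-trained one. Illustrative only."""
    z_gen = F.normalize(face_embed(gen_images), dim=-1)
    z_ref = F.normalize(face_embed(ref_images), dim=-1)
    sim = (z_gen * z_ref).sum(dim=-1)                  # identity similarity reward
    kl = (log_p.exp() * (log_p - log_p_ref)).sum(-1)   # KL(p || p_ref) regularizer
    return sim - kl_coef * kl

# The pathwise gradient flows through a Gumbel-Softmax relaxation of any
# discrete sampling step, e.g.: soft = F.gumbel_softmax(logits, tau=1.0)
```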

Towards Control-Centric Representations in Reinforcement Learning from Images

Hongyu Zang*, Chen Liu*, Xin Li, Yong Heng, Yifei Wang, Zhen Fang, Yisen Wang, Mingzhong Wang

Published as an arXiv preprint

Image-based reinforcement learning is a practical yet challenging task. A major hurdle lies in extracting control-centric representations while disregarding irrelevant information. While approaches that follow the bisimulation principle show promise for learning state representations that address this issue, they still grapple with the limited expressive capacity of latent dynamics and with inadaptability to sparse-reward environments. To address these limitations, we introduce ReBis, which aims to capture control-centric information by integrating reward-free control information alongside reward-specific knowledge. ReBis utilizes a transformer architecture to implicitly model the dynamics and incorporates block-wise masking to eliminate spatiotemporal redundancy. Moreover, ReBis combines a bisimulation-based loss with an asymmetric reconstruction loss to prevent feature collapse in environments with sparse rewards. Empirical studies on two large benchmarks, Atari games and the DeepMind Control Suite, demonstrate that ReBis outperforms existing methods.
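
A toy rendering of the combined objective, assuming batched latents `z` and `z_next`, rewards, and a masked-reconstruction pair `recon`/`target`; weighting and shapes are illustrative, not the paper's:

```python
import torch
import torch.nn.functional as F

def rebis_style_loss(z, z_next, reward, recon, target, gamma=0.99):
    """Bisimulation-style distance matching plus an asymmetric (stop-gradient)
    reconstruction term, as sketched from the abstract above."""
    d = torch.cdist(z, z)                                # pairwise latent distances
    r_diff = (reward[:, None] - reward[None, :]).abs()   # pairwise reward differences
    d_next = torch.cdist(z_next, z_next).detach()        # stop-gradient successor target
    bisim = F.mse_loss(d, r_diff + gamma * d_next)
    recon_loss = F.mse_loss(recon, target.detach())      # asymmetric reconstruction
    return bisim + recon_loss
```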

Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning

Hongyu Zang, Xin Li, Leiji Zhang, Yang Liu, Baigui Sun, Riashat Islam, Remi Tachet des Combes, Romain Laroche

Published in NeurIPS 2023

While bisimulation-based approaches hold promise for learning robust state representations for Reinforcement Learning (RL) tasks, their efficacy in offline RL tasks has not been up to par; in some instances, they have even significantly underperformed alternative methods. We aim to understand why bisimulation methods succeed in online settings but falter in offline tasks. Our analysis reveals that missing transitions in the dataset are particularly harmful to the bisimulation principle, leading to ineffective estimation. We also shed light on the critical role of reward scaling in bounding the scale of bisimulation measurements and of the value error they induce. Based on these findings, we propose applying the expectile operator to representation learning in the offline RL setting, which helps prevent overfitting to incomplete data. Meanwhile, by introducing an appropriate reward scaling strategy, we avoid the risk of feature collapse in the representation space. We implement these recommendations on two state-of-the-art bisimulation-based algorithms, MICo and SimSR, and demonstrate performance gains on two benchmark suites: D4RL and Visual D4RL.
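
The expectile operator is compact enough to show directly; this is the standard asymmetric squared loss, not necessarily the paper's exact objective:

```python
import torch

def expectile_loss(diff, tau=0.7):
    """Asymmetric squared loss: positive residuals get weight tau, negative
    ones get (1 - tau), so tau != 0.5 biases the fit to one side, tempering
    the effect of transitions that are missing from the dataset."""
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff.pow(2)).mean()

# Reward scaling is then a single hyperparameter applied before computing
# bisimulation targets, e.g. scaled_reward = c * reward for a small c.
```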

Principled Offline RL in the Presence of Rich Exogenous Information

Riashat Islam, Manan Tomar, Alex Lamb, Yonathan Efroni, Hongyu Zang, Aniket Rajiv Didolkar, Dipendra Misra, Xin Li, Harm van Seijen, Remi Tachet des Combes, John Langford

Published in ICML 2023

Learning to control an agent from offline data collected in a rich pixel-based visual observation space is vital for real-world applications of reinforcement learning (RL). A major challenge in this setting is the presence of input information that is hard to model and irrelevant to controlling the agent. This problem has been approached by the theoretical RL community through the lens of exogenous information, i.e., any control-irrelevant information contained in observations. For example, a robot navigating busy streets needs to ignore irrelevant information, such as other people walking in the background, textures of objects, or birds in the sky. In this paper, we focus on the setting with visually detailed exogenous information and introduce new offline RL benchmarks that offer the ability to study this problem. We find that contemporary representation learning techniques can fail on datasets where the noise is a complex and time-dependent process, which is prevalent in practical applications. To address this, we propose using multi-step inverse models to learn Agent-Centric Representations for Offline RL (ACRO). Despite being simple and reward-free, this objective yields representations that, as we show theoretically and empirically, greatly outperform baselines.
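
A bare-bones sketch of such a multi-step inverse objective, with invented layer sizes rather than the paper's architecture:

```python
import torch
import torch.nn as nn

class MultiStepInverse(nn.Module):
    """Predict the first action a_t from encodings of o_t and o_{t+k}.
    The encoder trained this way keeps only agent-controllable information."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, act_dim)

    def forward(self, obs_t, obs_tk):
        z_t, z_tk = self.encoder(obs_t), self.encoder(obs_tk)
        return self.head(torch.cat([z_t, z_tk], dim=-1))  # logits for a_t

# Training is reward-free:
# loss = F.cross_entropy(model(obs_t, obs_tk), action_t)
```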

Representation Learning in Deep RL via Discrete Information Bottleneck

Riashat Islam*, Hongyu Zang*, Manan Tomar, Aniket Didolkar, Md Mofijul Islam, Samin Yeasar Arnob, Tariq Iqbal, Xin Li, Anirudh Goyal, Nicolas Heess, Alex Lamb

Published in AISTATS 2023

Several self-supervised representation learning methods have been proposed for reinforcement learning (RL) with rich observations. For real-world applications of RL, recovering underlying latent states is crucial, particularly when sensory inputs contain irrelevant and exogenous information. In this work, we study how information bottlenecks can be used to construct latent states efficiently in the presence of task-irrelevant information. We propose architectures that utilize variational and discrete information bottlenecks, coined RepDIB, to learn structured factorized representations. Exploiting the expressiveness brought by factorized representations, we introduce a simple yet effective bottleneck that can be integrated with any existing self-supervised objective for RL. We demonstrate this across several online and offline RL benchmarks, along with a real robot arm task, where we find that compressed representations with RepDIB can lead to strong performance improvements, as the learned bottlenecks help predict only the relevant state while ignoring irrelevant information.
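
A toy factorized vector-quantization bottleneck with straight-through gradients, in the spirit of what the abstract describes (codebook size and factor count are invented):

```python
import torch
import torch.nn as nn

class DiscreteBottleneck(nn.Module):
    """Split the representation into factors and quantize each factor to its
    nearest codebook vector; gradients pass straight through. Sketch only."""
    def __init__(self, dim, codebook_size=64, n_factors=4):
        super().__init__()
        assert dim % n_factors == 0
        self.n_factors = n_factors
        self.codebook = nn.Embedding(codebook_size, dim // n_factors)

    def forward(self, z):
        outs = []
        for zf in z.chunk(self.n_factors, dim=-1):
            d = torch.cdist(zf, self.codebook.weight)   # distances to all codes
            q = self.codebook(d.argmin(dim=-1))         # nearest code per sample
            outs.append(zf + (q - zf).detach())         # straight-through estimator
        return torch.cat(outs, dim=-1)
```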

Behavior Prior Representation learning for Offline Reinforcement Learning

Hongyu Zang, Xin Li, Jie Yu, Chen Liu, Riashat Islam, Remi Tachet des Combes, Romain Laroche

Published in ICLR 2023

Offline reinforcement learning (RL) struggles in environments with rich and noisy inputs, where the agent only has access to a fixed dataset without environment interactions. Past works have proposed common workarounds based on the pre-training of state representations, followed by policy training. In this work, we introduce a simple yet effective approach for learning state representations. Our method, Behavior Prior Representation (BPR), learns state representations with an easy-to-integrate objective based on behavior cloning of the dataset: we first learn a state representation by mimicking actions from the dataset, and then train a policy on top of the fixed representation, using any off-the-shelf offline RL algorithm. Theoretically, we prove that BPR provides performance guarantees when integrated into algorithms that either have policy improvement guarantees (conservative algorithms) or produce lower bounds of the policy values (pessimistic algorithms). Empirically, we show that BPR combined with existing state-of-the-art offline RL algorithms leads to significant improvements across several offline control benchmarks.
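
The two-stage recipe is simple enough to sketch; module names are placeholders, and discrete actions are assumed (continuous ones would swap in an MSE loss):

```python
import torch
import torch.nn.functional as F

def bpr_pretrain_step(encoder, policy_head, obs, actions, optimizer):
    """One behavior-cloning step: learn the state representation by
    predicting the dataset's actions from observations."""
    logits = policy_head(encoder(obs))
    loss = F.cross_entropy(logits, actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Stage 2: freeze the encoder and run any off-the-shelf offline RL algorithm
# on top of it, e.g.:
# for p in encoder.parameters(): p.requires_grad_(False)
```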

Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information

Riashat Islam, Manan Tomar, Alex Lamb, Yonathan Efroni, Hongyu Zang, Aniket Didolkar, Dipendra Misra, Xin Li, Harm van Seijen, Remi Tachet des Combes, John Langford

Published in NeurIPS 2022 Offline RL workshop

Workshop version of the ICML 2023 paper "Principled Offline RL in the Presence of Rich Exogenous Information" listed above.

WaveForM: Graph Enhanced Wavelet Learning for Long Sequence Forecasting of Multivariate Time Series

Fuhao Yang, Xin Li, Min Wang, Hongyu Zang, Wei Pang, Mingzhong Wang

Published in AAAI 2023 (oral)

Multivariate time series (MTS) analysis and forecasting are crucial in many real-world applications, such as smart traffic management and weather forecasting. However, most existing work either focuses on short-sequence forecasting or makes predictions predominantly with time-domain features, which is not effective at removing noise with irregular frequencies in MTS. Therefore, we propose WaveForM, an end-to-end graph-enhanced Wavelet learning framework for long-sequence FORecasting of MTS. WaveForM first utilizes the Discrete Wavelet Transform (DWT) to represent MTS in the wavelet domain, which captures both frequency- and time-domain features with a sound theoretical basis. To enable effective learning in the wavelet domain, we further propose a graph constructor, which learns a global graph to represent the relationships between MTS variables, and graph-enhanced prediction modules, which utilize dilated convolution and graph convolution to capture the correlations between time series and predict the wavelet coefficients at different levels. Extensive experiments on five real-world forecasting datasets show that our model achieves considerable performance improvements over different prediction lengths against the most competitive baseline on each dataset.
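
The DWT round-trip at the core of this pipeline looks roughly as follows, using the PyWavelets library as a stand-in for the transform step (wavelet choice and decomposition level are arbitrary here):

```python
import numpy as np
import pywt  # PyWavelets

# Decompose a series into multi-level wavelet coefficients, forecast each
# coefficient array (WaveForM's graph-enhanced modules would go here), then
# reconstruct with the inverse transform. Just the DWT round-trip, not the model.
series = np.random.randn(512)                     # one MTS variable
coeffs = pywt.wavedec(series, "db4", level=3)     # [cA3, cD3, cD2, cD1]
reconstructed = pywt.waverec(coeffs, "db4")       # inverse DWT
assert np.allclose(series, reconstructed[: len(series)], atol=1e-8)
```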

Discrete Compositional Representations as an Abstraction for Goal Conditioned Reinforcement Learning

Riashat Islam, Hongyu Zang, Anirudh Goyal, Alex Lamb, Kenji Kawaguchi, Xin Li, Romain Laroche, Yoshua Bengio, Remi Tachet des Combes

Published in NeurIPS 2022

Goal-conditioned reinforcement learning (RL) is a promising direction for training agents that are capable of solving multiple tasks and reaching a diverse set of objectives. How to specify and ground these goals in such a way that we can both reliably reach goals during training and generalize to new goals during evaluation remains an open area of research. Defining goals in the space of noisy, high-dimensional sensory inputs is one possibility, yet this poses a challenge for training goal-conditioned agents, and even for generalization to novel goals. We propose to address this by learning compositional representations of goals and processing the resulting representation via a discretization bottleneck, for coarser specification of goals, through an approach we call DGRL. We show that discretizing outputs from goal encoders through a bottleneck can work well in goal-conditioned RL setups, by experimentally evaluating this method on tasks ranging from maze environments to complex robotic navigation and manipulation tasks. Additionally, we show a theoretical result that bounds the expected return for goals not observed during training, while still allowing goals with expressive combinatorial structure to be specified.
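
The discretization step itself reduces to a nearest-code lookup with a straight-through gradient, much like the RepDIB sketch above (a toy stand-in, not DGRL's implementation):

```python
import torch

def discretize_goal(g, codebook):
    """Map goal embeddings (B, D) to their nearest codes in a (K, D) codebook
    while letting gradients pass straight through; factorizing g and using one
    codebook per factor gives the combinatorial goal structure described above."""
    d = torch.cdist(g, codebook)       # (B, K) distances to the K codes
    q = codebook[d.argmin(dim=-1)]     # coarsened (discrete) goal codes
    return g + (q - g).detach()        # straight-through estimator
```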

CHA: Categorical Hierarchy-based Attention for Next POI Recommendation

Hongyu Zang, Dongcheng Han, Xin Li, Zhifeng Wan, Mingzhong Wang

Published in ACM TOIS, 2022

Next point-of-interest (POI) recommendation is a key task in improving location-related customer experiences and business operations, yet it remains challenging due to the substantial diversity of human activities and the sparsity of available check-in records. To address these challenges, we propose to explore the category hierarchy knowledge graph of POIs via an attention mechanism to learn robust representations of POIs even when data is insufficient. We also propose a spatial-temporal decay LSTM and a Discrete Fourier Series-based periodic attention to better capture personalized behavior patterns. Extensive experiments on two commonly adopted real-world location-based social network (LBSN) datasets show that the inclusion of these modules significantly boosts performance on the next and next-new POI recommendation tasks. Our model generally outperforms other state-of-the-art methods by a large margin.
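
As one way to picture the temporal decay, here is a toy attention step whose scores decay with the time gap between check-ins; the functional form and all names are mine, not the paper's:

```python
import torch

def decayed_attention(q, k, v, dt, alpha=0.1):
    """Scaled dot-product attention penalized by the elapsed time `dt` (Nq, Nk)
    between query and key check-ins. Subtracting alpha * dt before the softmax
    multiplies each weight by exp(-alpha * dt), so older behavior counts less."""
    scores = (q @ k.t()) / k.shape[-1] ** 0.5
    scores = scores - alpha * dt
    return torch.softmax(scores, dim=-1) @ v
```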

SimSR: Simple Distance-Based State Representations for Deep Reinforcement Learning

Hongyu Zang, Xin Li, Mingzhong Wang

Published in AAAI 2022 (oral)

This work explores how to learn robust and generalizable state representations from image-based observations with deep reinforcement learning methods. To address the computational complexity, stringent assumptions, and representation-collapse challenges in existing work on bisimulation metrics, we devise the Simple State Representation (SimSR) operator. SimSR enables us to design a stochastic approximation method that can practically learn the mapping functions (encoders) from observations to a latent representation space. In addition to a theoretical analysis and comparison with existing work, we compare our approach with recent state-of-the-art solutions on visual MuJoCo tasks. The results show that our model generally achieves better performance, robustness, and generalization.
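
For intuition, the kind of update SimSR performs can be caricatured with cosine distances over normalized latents; batch shapes and the stop-gradient placement are my reading of the abstract, not the paper's code:

```python
import torch
import torch.nn.functional as F

def simsr_style_loss(z, z_next, reward, gamma=0.99):
    """Regress pairwise cosine distances between latent states onto reward
    differences plus discounted distances between sampled successor latents."""
    z = F.normalize(z, dim=-1)
    z_next = F.normalize(z_next, dim=-1).detach()        # stop-gradient target
    d = 1.0 - z @ z.t()                                  # cosine distances
    r_diff = (reward[:, None] - reward[None, :]).abs()
    target = r_diff + gamma * (1.0 - z_next @ z_next.t())
    return F.mse_loss(d, target)
```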

On improving knowledge graph facilitated simple question answering system

Xin Li, Hongyu Zang, Xiaoyun Yu, Hao Wu, Zijian Zhang, Jiamou Liu, Mingzhong Wang

Published in Neural Computing and Applications, 2021

Leveraging knowledge graphs (KGs) benefits question answering tasks, as KGs contain well-structured, informative data. However, training KG-based simple question answering systems is known to be computationally expensive due to complex predicate extraction and candidate pool generation. Moreover, existing methods based on convolutional neural networks (CNNs) or recurrent neural networks (RNNs) overestimate the importance of predicate features, thus reducing performance. To address these challenges, we propose a time-efficient and resource-effective framework. We use a leaky n-gram strategy to balance recall and candidate pool size during candidate pool generation. For predicate extraction, we propose a soft-histogram and self-attention (SHSA) module, which preserves the global information of questions via feature matrices. This allows the RNN module in the predicate representation to be replaced with a simple feedforward network. We also design a Hamming lower-bound label encoding algorithm to encode label representations in lower dimensions. Experiments on benchmark datasets show that our method outperforms competitive work on the end tasks and achieves better recall with a significantly pruned candidate space.
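
For intuition, a plain n-gram candidate generator is sketched below; the paper's leaky variant relaxes the matching to trade recall against pool size, which this sketch does not reproduce:

```python
def ngram_candidates(tokens, entity_index, max_n=3):
    """Enumerate n-grams of the question and look them up in an entity-name
    index (phrase -> list of entity ids) to build a candidate pool."""
    candidates = set()
    for n in range(max_n, 0, -1):            # prefer longer matches first
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i : i + n])
            candidates.update(entity_index.get(phrase, []))
    return candidates

# Example usage with a hypothetical index:
# pool = ngram_candidates("who wrote the hobbit".split(), entity_index)
```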

Universal Value Iteration Networks: When Spatially-Invariant Is Not Universal

Li Zhang, Xin Li, Sen Chen, Hongyu Zang, Jie Huang, Mingzhong Wang

Published in AAAI 2020 (oral)

In this paper, we first formally define the problem set of spatially invariant Markov Decision Processes (MDPs), and show that Value Iteration Networks (VIN) and its extensions are computationally restricted to this set due to their use of a shared convolution kernel. To generalize VIN to spatially variant MDPs, we propose Universal Value Iteration Networks (UVIN). In comparison with VIN, UVIN automatically learns a flexible but compact network structure to encode the transition dynamics of the problem and support the differentiable planning module. We evaluate UVIN on both spatially invariant and spatially variant tasks, including navigation in regular mazes, chessboard mazes, and Mars terrain, as well as Minecraft item synthesis. Results show that UVIN achieves performance similar to VIN and its extensions on spatially invariant tasks, and significantly outperforms other models on more general problems.
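
The spatial-invariance restriction is easy to see in code: VIN's planner is value iteration with a single shared convolution kernel, so the same local transition model applies at every grid cell (a schematic, not the paper's implementation):

```python
import torch
import torch.nn.functional as F

def value_iteration_conv(r, kernel, iters=20):
    """Value iteration on a grid MDP with one shared kernel.
    r: (1, A, H, W) per-action rewards; kernel: (A, 1, 3, 3) per-action
    transition stencil, identical at every cell (the spatial invariance)."""
    v = torch.zeros_like(r[:, :1])                 # (1, 1, H, W) value map
    for _ in range(iters):
        q = r + F.conv2d(v, kernel, padding=1)     # one-step lookahead, (1, A, H, W)
        v = q.max(dim=1, keepdim=True).values      # max over actions
    return v
```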

A Vectorized Relational Graph Convolutional Network for Multi-Relational Network Alignment

Rui Ye, Xin Li, Yujie Fang, Hongyu Zang, Mingzhong Wang

Published in IJCAI 2019

Alignment of multiple multi-relational networks, such as knowledge graphs, is vital for AI applications. Unlike conventional alignment models, we apply a graph convolutional network (GCN) to achieve more robust network embeddings for the alignment task. Since existing GCNs cannot fully utilize multi-relational information, we propose a vectorized relational graph convolutional network (VR-GCN) that learns embeddings of both graph entities and relations simultaneously for multi-relational networks. The role discrimination and translation properties of knowledge graphs are adopted in the convolutional process. Thereafter, we develop AVR-GCN, the alignment framework based on VR-GCN, for multi-relational network alignment tasks. Anchors supervise the objective function, which minimizes the distances between anchor pairs, and are used to generate new cross-network triplets that bridge different knowledge graphs at the triplet level to improve alignment performance. Experiments on real-world datasets show that the proposed solutions outperform state-of-the-art methods in network embedding, entity alignment, and relation alignment.
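
To make the idea concrete, a toy message-passing layer with vector-valued relations and a TransE-style translation might look like this; aggregation and shapes are illustrative, not the paper's design:

```python
import torch
import torch.nn as nn

class VectorRelationLayer(nn.Module):
    """One round of relational message passing where relations are vectors:
    for each (head, relation, tail) triple, the message tail - relation
    approximates the head embedding (the translation property)."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, h, rel, edges):
        # h: (N, D) entity embeddings; rel: (R, D) relation embeddings;
        # edges: (E, 3) long tensor of (head, relation, tail) index triples.
        heads, rels, tails = edges[:, 0], edges[:, 1], edges[:, 2]
        msg = h[tails] - rel[rels]                         # translated messages
        agg = torch.zeros_like(h).index_add_(0, heads, msg)  # sum per head entity
        return torch.relu(self.lin(h + agg))
```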