Residual Q-Learning: Offline and Online Policy Customization without Value Estimation

Jan 1, 2023

Chenran Li, Chen Tang, Haruki Nishimura, Jean Mercat, Masayoshi Tomizuka, Wei Zhan
Abstract
Imitation Learning (IL) is useful for learning behaviors from demonstrations, especially for complex tasks where a reward is difficult to specify by hand. However, the learned policy can only mimic the demonstrated behavior. We propose a new problem setting, policy customization, in which a policy is adapted to a downstream task while maintaining its imitative nature. We introduce Residual Q-learning, a framework that customizes a prior imitation policy by jointly optimizing the inherent reward underlying the demonstrations and an add-on reward specifying the downstream task. Our algorithms achieve effective policy customization in a variety of environments.
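
To make the setting concrete, the customized policy can be viewed as maximizing a weighted combination of the inherent reward behind the demonstrations and the add-on downstream reward. The notation below ($r$ for the inherent reward, $r_{\text{add}}$ for the add-on reward, $\omega$ for a trade-off weight) is illustrative and not taken from this page:

$$
\pi^{\ast} = \arg\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t}\big(\omega\, r(s_t, a_t) + r_{\text{add}}(s_t, a_t)\big)\right]
$$

Here the inherent reward $r$ is not assumed to be observed; only a prior policy trained to imitate the demonstrations is available.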
Publication
In Advances in Neural Information Processing Systems 36 (NeurIPS 2023)