Residual Q-Learning: Offline and Online Policy Customization without Value
Jan 1, 2023
Chenran Li
Chen Tang
Haruki Nishimura
Jean Mercat
Masayoshi Tomizuka
Wei Zhan
Abstract
Imitation Learning (IL) is a widely used framework for learning imitative behavior from demonstrations, and it is especially appealing for complex tasks where handcrafting a reward function is difficult. However, the learned policy can only reproduce the behavior seen in the demonstrations. We formulate a new problem setting, policy customization: adapting a learned prior policy to satisfy the requirements of diverse downstream tasks while preserving its imitative nature. We cast customization as a Markov Decision Process whose reward combines the inherent reward underlying the demonstrations with an add-on reward specified by the downstream task, and we propose Residual Q-learning, a framework that solves this MDP by leveraging the prior policy without knowing its inherent reward or value function. From it we derive a family of residual Q-learning algorithms for offline and online policy customization, and we show that they effectively accomplish customization tasks in various environments.
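A minimal sketch of the formulation in generic maximum-entropy RL notation (the symbols $r$, $r_R$, $\omega$, and $\alpha$ below are illustrative and may not match the paper's exact notation): policy customization can be posed as a maximum-entropy MDP whose reward adds a weighted downstream term $\omega\, r_R$ to the inherent demonstration reward $r$,

$$
\pi^{\star} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t} r(s_t, a_t) + \omega\, r_R(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right].
$$

The "without value" aspect rests on a standard soft-RL identity: if the prior policy $\pi$ is maximum-entropy optimal for $r$, then $\alpha \log \pi(a \mid s) = Q(s, a) - V(s)$, so the Bellman backup for the customized policy can be rewritten in terms of $\log \pi$ and $r_R$ alone, and the inherent reward $r$ never has to be recovered.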
Publication
In Advances in Neural Information Processing Systems 36 (NeurIPS 2023)