Residual Q-Learning: Offline and Online Policy Customization without Value

Jan 1, 2023
Chenran Li
Chen Tang
Haruki Nishimura
Jean Mercat
Masayoshi Tomizuka
Wei Zhan
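Imitation Learning (IL) is a widely used framework for learning behavior from demonstrations. It is especially useful for complex tasks where handcrafting a reward function is difficult. However, the learned policy can only mimic the demonstrated behavior. We propose a new problem setting, policy customization, whose goal is to adapt the imitative policy to the requirements of different downstream tasks while preserving its imitative nature. We formulate customization as a Markov decision process whose reward combines the inherent reward of the demonstrations with an add-on reward specified by the downstream task. We then introduce Residual Q-learning, a framework that solves this problem by leveraging the prior policy without knowing its inherent reward or value function. The resulting family of algorithms supports both offline and online policy customization, and experiments show that it accomplishes policy customization tasks effectively in various environments.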
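The key claim above is that the customized policy can be learned without the prior policy's inherent reward or value function. The following is a minimal sketch of why this can work. It assumes a small tabular MDP and a prior policy π obtained by maximum-entropy (soft) Q-learning with temperature α. It also assumes the customized task uses reward ω·r + r_R with temperature α̂, where r is the unknown inherent reward and r_R is the add-on reward. Under these assumptions, the residual Q_R = Q̂ − ωQ satisfies a soft Bellman equation that involves only r_R and log π:

$$Q_R(s,a) = r_R(s,a) + \gamma\,\mathbb{E}_{s'}\Big[\hat\alpha \log \sum_{a'} \exp\big((Q_R(s',a') + \omega\alpha\log\pi(a'\mid s'))/\hat\alpha\big)\Big]$$

The MDP, the rewards, the weight `omega`, the temperatures, and all function names below are hypothetical illustrations, not the paper's code. The inherent reward `r_prior` is used only to build the prior policy and to check the result.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def expected_next(P, s, a, values):
    """E_{s' ~ P(.|s,a)}[values[s']] for a tabular transition model."""
    return sum(p * v for p, v in zip(P[s][a], values))

def soft_q_iteration(reward, P, gamma, temp, iters=1000):
    """Max-entropy Q-iteration: Q = r + gamma * E[temp * logsumexp(Q(s',.)/temp)]."""
    n_s, n_a = len(reward), len(reward[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        V = [temp * logsumexp([q / temp for q in Q[s]]) for s in range(n_s)]
        Q = [[reward[s][a] + gamma * expected_next(P, s, a, V) for a in range(n_a)]
             for s in range(n_s)]
    return Q

def log_softmax_policy(Q, temp):
    """Boltzmann policy: log pi(a|s) = Q(s,a)/temp - logsumexp_a' Q(s,a')/temp."""
    out = []
    for row in Q:
        z = logsumexp([q / temp for q in row])
        out.append([q / temp - z for q in row])
    return out

def residual_soft_q_iteration(log_pi, r_add, P, gamma, omega, alpha, alpha_hat, iters=1000):
    """Hypothetical residual update: uses only log pi and r_add, never r_prior or V."""
    n_s, n_a = len(r_add), len(r_add[0])
    QR = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        soft_next = [alpha_hat * logsumexp([(QR[s][a] + omega * alpha * log_pi[s][a]) / alpha_hat
                                            for a in range(n_a)]) for s in range(n_s)]
        QR = [[r_add[s][a] + gamma * expected_next(P, s, a, soft_next) for a in range(n_a)]
              for s in range(n_s)]
    return QR

def customized_policy(QR, log_pi, omega, alpha, alpha_hat):
    """pi_hat(a|s) proportional to exp((Q_R + omega * alpha * log pi) / alpha_hat)."""
    logits = [[qr + omega * alpha * lp for qr, lp in zip(qr_row, lp_row)]
              for qr_row, lp_row in zip(QR, log_pi)]
    return [[math.exp(x) for x in row] for row in log_softmax_policy(logits, alpha_hat)]

# Hypothetical 3-state, 2-action MDP: P[s][a] is a distribution over next states.
P = [
    [[0.8, 0.2, 0.0], [0.1, 0.0, 0.9]],
    [[0.0, 1.0, 0.0], [0.5, 0.5, 0.0]],
    [[0.3, 0.3, 0.4], [0.0, 0.0, 1.0]],
]
r_prior = [[1.0, 0.0], [0.0, 0.5], [0.2, 1.0]]   # inherent reward (unknown in practice)
r_add = [[0.0, -1.0], [0.3, 0.0], [-0.5, 0.0]]   # add-on reward of the downstream task
gamma, alpha, alpha_hat, omega = 0.9, 1.0, 0.5, 2.0

# Prior policy, as if produced by max-entropy imitation; only log_pi is passed on.
Q_prior = soft_q_iteration(r_prior, P, gamma, alpha)
log_pi = log_softmax_policy(Q_prior, alpha)

QR = residual_soft_q_iteration(log_pi, r_add, P, gamma, omega, alpha, alpha_hat)
pi_hat = customized_policy(QR, log_pi, omega, alpha, alpha_hat)

# Sanity check only (needs r_prior): solving the combined task directly
# should give Q_hat = Q_R + omega * Q_prior.
combined = [[omega * r + ra for r, ra in zip(rr, rar)] for rr, rar in zip(r_prior, r_add)]
Q_hat = soft_q_iteration(combined, P, gamma, alpha_hat)
max_gap = max(abs(Q_hat[s][a] - (QR[s][a] + omega * Q_prior[s][a]))
              for s in range(len(P)) for a in range(len(P[0])))
print(f"max gap = {max_gap:.2e}")  # expected to be numerically ~0
```

In this sketch, the prior's value V(s) cancels because it depends only on the state, so it drops out of both the residual backup and the softmax that defines the customized policy. The direct solve on `combined` is there only to check the identity, since a real customization setting would not have access to `r_prior`.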
In Advances in Neural Information Processing Systems 36 (NeurIPS 2023)