Residual Q-Learning: Offline and Online Policy Customization without Value Estimation

Jan 1, 2023

Chenran Li, Chen Tang, Haruki Nishimura, Jean Mercat, Masayoshi Tomizuka, Wei Zhan
Abstract
Imitation Learning (IL) is useful for learning behaviors from demonstrations, especially for complex tasks where a reward is difficult to specify by hand. However, the learned policy can only mimic the demonstrated behavior. We propose a new problem setting, policy customization, in which a policy is adapted to a downstream task while maintaining its imitative nature. We introduce Residual Q-learning, a framework that customizes a prior imitation policy by jointly optimizing the inherent reward underlying the demonstrations and an add-on reward specifying the downstream task. Our algorithms achieve effective policy customization in a variety of environments.
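
To make the setting concrete, the customized policy can be viewed as maximizing a weighted combination of the inherent reward behind the demonstrations and the add-on downstream reward. The notation below ($r$ for the inherent reward, $r_{\text{add}}$ for the add-on reward, $\omega$ for a trade-off weight) is illustrative and not taken from this page:

$$
\pi^{\ast} = \arg\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t}\big(\omega\, r(s_t, a_t) + r_{\text{add}}(s_t, a_t)\big)\right]
$$

Here the inherent reward $r$ is not assumed to be observed; only a prior policy trained to imitate the demonstrations is available.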
Publication
In Advances in Neural Information Processing Systems 36 (NeurIPS 2023)