Conservative Q-Learning for Offline Reinforcement Learning

Authors: Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine
Year: 2020
Journal: NeurIPS
DOI: 10.48550/arXiv.2006.04779
Publisher: https://arxiv.org/abs/2006.04779

Keywords: offline rl, cql, batch rl

Abstract

We propose Conservative Q-Learning (CQL), a simple offline RL algorithm that learns a conservative Q-function to avoid overestimating the values of out-of-distribution actions.
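
The abstract does not spell out the objective, so the following is only a minimal sketch of how a conservative penalty of this kind can look for a small discrete action space. It assumes the commonly cited log-sum-exp form of the regularizer: push Q-values down across all actions and back up on the action that actually appears in the dataset. The function names (`cql_penalty`, `cql_loss`), the `alpha` weight, and the list-based data layout are illustrative assumptions, not taken from the paper's code.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_penalty(q_values, dataset_actions):
    """Average conservative penalty over a batch of states.

    q_values[i][a] is Q(s_i, a) for every discrete action a (assumed layout).
    dataset_actions[i] is the action recorded in the offline dataset at s_i.
    """
    if len(q_values) != len(dataset_actions):
        raise ValueError("q_values and dataset_actions must have equal length")
    total = 0.0
    for q_row, action in zip(q_values, dataset_actions):
        # A soft maximum over all actions minus the Q-value of the logged action.
        # Large Q-values on actions absent from the data increase this term.
        total += logsumexp(q_row) - q_row[action]
    return total / len(q_values)


def cql_loss(q_values, dataset_actions, td_targets, alpha=1.0):
    """Squared TD error plus the alpha-weighted conservative penalty (a sketch)."""
    n = len(q_values)
    td_error = sum(
        (q_row[action] - target) ** 2
        for q_row, action, target in zip(q_values, dataset_actions, td_targets)
    ) / n
    return 0.5 * td_error + alpha * cql_penalty(q_values, dataset_actions)
```

In this sketch the penalty is never negative, because the log-sum-exp of a row is at least as large as any single entry of it. Raising the Q-value of an action that is not in the dataset raises the penalty, which is the sense in which the learned Q-function is pushed toward conservative estimates.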

Cite this paper

bibtex
@misc{cql2020,
  title  = {Conservative Q-Learning for Offline Reinforcement Learning},
  author = {Aviral Kumar and Aurick Zhou and George Tucker and Sergey Levine},
  year   = {2020},
  journal = {NeurIPS},
  doi    = {10.48550/arXiv.2006.04779},
  url    = {https://doi.org/10.48550/arXiv.2006.04779},
}

Released under the MIT License.