Conservative Q-Learning for Offline Reinforcement Learning
Authors: Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine
Year: 2020
Journal: NeurIPS
DOI: 10.48550/arXiv.2006.04779
Publisher: https://arxiv.org/abs/2006.04779
Keywords: offline RL, CQL, batch RL
Abstract
We propose Conservative Q-Learning (CQL), a simple offline RL algorithm that learns a conservative Q-function to avoid overestimating the values of out-of-distribution actions.
Cite this paper
bibtex
@misc{cql2020,
title = {Conservative Q-Learning for Offline Reinforcement Learning},
author = {Aviral Kumar and Aurick Zhou and George Tucker and Sergey Levine},
year = {2020},
journal = {NeurIPS},
doi = {10.48550/arXiv.2006.04779},
url = {https://doi.org/10.48550/arXiv.2006.04779},
}