Deep Reinforcement Learning from Human Preferences
Authors: Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei
Year: 2017
Conference: NeurIPS (Advances in Neural Information Processing Systems)
DOI: 10.48550/arXiv.1706.03741
Publisher: https://arxiv.org/abs/1706.03741
Keywords: rlhf, alignment
Abstract
We explore deep reinforcement learning in which the agent's goal is defined by human preferences between pairs of trajectory segments, rather than by access to an explicit reward function.
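To make "preferences between pairs of trajectory segments" concrete, here is a minimal sketch of a Bradley-Terry-style preference model of the kind the paper uses to fit its reward predictor. In this model, the probability that a human prefers segment 1 is a logistic function of the difference between the two segments' summed predicted rewards. Training minimizes the cross-entropy against the human label. This sketch assumes the per-step predicted rewards are already given as plain floats standing in for a learned reward model's outputs. It omits the neural network and all other training details. The function names (`softplus`, `preference_probability`, `preference_loss`) are hypothetical and chosen for illustration only.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """log(1 + exp(x)), computed without overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def preference_probability(rewards_1: Sequence[float], rewards_2: Sequence[float]) -> float:
    """Predicted probability that segment 1 is preferred over segment 2.

    Equals exp(R1) / (exp(R1) + exp(R2)), where R1 and R2 are the summed
    predicted rewards of each segment (hypothetical inputs in this sketch).
    """
    diff = sum(rewards_1) - sum(rewards_2)
    # log sigmoid(diff) = -softplus(-diff); exponentiate to get the probability
    return math.exp(-softplus(-diff))


def preference_loss(rewards_1: Sequence[float], rewards_2: Sequence[float], mu_1: float) -> float:
    """Cross-entropy between a human label and the predicted preference.

    mu_1 is the label weight on "segment 1 preferred": 1.0 or 0.0 for a clear
    choice, 0.5 when the segments are judged equally good.
    """
    diff = sum(rewards_1) - sum(rewards_2)
    log_p1 = -softplus(-diff)  # log P[segment 1 preferred]
    log_p2 = -softplus(diff)   # log P[segment 2 preferred]
    return -(mu_1 * log_p1 + (1.0 - mu_1) * log_p2)


if __name__ == "__main__":
    # Segment 1 has higher predicted return, so it is predicted to be preferred.
    print(preference_probability([1.0, 1.0], [0.0, 0.0]))
    # A human label agreeing with the prediction gives a small loss.
    print(preference_loss([1.0, 1.0], [0.0, 0.0], mu_1=1.0))
```

Working with the return difference through `softplus` rather than exponentiating each return directly keeps the loss finite even when predicted returns are large. Gradients of this loss with respect to the reward model's parameters are what would push the predicted rewards toward agreeing with the human comparisons.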
Cite this paper
bibtex
@inproceedings{rlhforig2017,
title = {Deep Reinforcement Learning from Human Preferences},
author = {Christiano, Paul and Leike, Jan and Brown, Tom B. and Martic, Miljan and Legg, Shane and Amodei, Dario},
year = {2017},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
doi = {10.48550/arXiv.1706.03741},
url = {https://doi.org/10.48550/arXiv.1706.03741},
}