Deep Reinforcement Learning from Human Preferences

Authors: Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei
Year: 2017
Conference: NeurIPS (published as NIPS 2017)
DOI: 10.48550/arXiv.1706.03741
arXiv: https://arxiv.org/abs/1706.03741

Keywords: rlhf, alignment

Abstract

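We explore goals defined by (non-expert) human preferences between pairs of trajectory segments. Instead of a hand-specified reward function, the agent is trained with deep reinforcement learning against a reward model fitted to these pairwise comparisons.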
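The core idea can be illustrated with a minimal sketch of a pairwise preference model. It follows the Bradley-Terry form used in the paper: the predicted probability that segment 1 is preferred is exp(R1) / (exp(R1) + exp(R2)), where each R is the sum of predicted per-step rewards over a segment. The model is fitted with cross-entropy against the human label. Several assumptions apply. The linear reward model `w . phi`, the toy feature vectors, and the helper names (`pref_prob`, `pref_loss`, `sgd_step`) are hypothetical illustrations, not the authors' code. The paper trains a neural network reward predictor, and details such as ensembling and the assumed chance of a random human response are omitted here.

```python
import math
from typing import List, Sequence

# Hypothetical: a segment is a list of per-step feature vectors phi(o_t, a_t).
Segment = List[List[float]]


def reward(w: Sequence[float], x: Sequence[float]) -> float:
    # Hypothetical linear reward model r_hat(o, a) = w . phi(o, a)
    return sum(wi * xi for wi, xi in zip(w, x))


def segment_return(w: Sequence[float], seg: Segment) -> float:
    # Sum of predicted rewards over one trajectory segment
    return sum(reward(w, x) for x in seg)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pref_prob(w: Sequence[float], seg1: Segment, seg2: Segment) -> float:
    # Bradley-Terry: exp(R1) / (exp(R1) + exp(R2)) == sigmoid(R1 - R2)
    return sigmoid(segment_return(w, seg1) - segment_return(w, seg2))


def pref_loss(w: Sequence[float], seg1: Segment, seg2: Segment, mu1: float) -> float:
    # Cross-entropy against the human label:
    # mu1 = 1.0 if seg1 preferred, 0.0 if seg2 preferred, 0.5 if judged equal
    p = pref_prob(w, seg1, seg2)
    eps = 1e-12
    return -(mu1 * math.log(p + eps) + (1.0 - mu1) * math.log(1.0 - p + eps))


def sgd_step(w: Sequence[float], seg1: Segment, seg2: Segment,
             mu1: float, lr: float = 0.1) -> List[float]:
    # dLoss/dR1 = p - mu1 and dLoss/dR2 = mu1 - p; each R is linear in w
    p = pref_prob(w, seg1, seg2)
    g = p - mu1
    grad = [0.0] * len(w)
    for x in seg1:
        for i, xi in enumerate(x):
            grad[i] += g * xi
    for x in seg2:
        for i, xi in enumerate(x):
            grad[i] -= g * xi
    return [wi - lr * gi for wi, gi in zip(w, grad)]


if __name__ == "__main__":
    seg_a = [[1.0, 0.0], [1.0, 0.0]]  # toy segment dominated by feature 0
    seg_b = [[0.0, 1.0], [0.0, 1.0]]  # toy segment dominated by feature 1
    w = [0.0, 0.0]
    print(pref_prob(w, seg_a, seg_b))  # 0.5 before any feedback
    for _ in range(50):
        w = sgd_step(w, seg_a, seg_b, mu1=1.0)  # the "human" preferred seg_a
    print(pref_prob(w, seg_a, seg_b))  # now well above 0.5
```

Working with the difference R1 - R2 inside a logistic function gives the same probability as the exponential ratio but avoids overflow when segment returns are large. In the full method, the fitted reward model then replaces the environment reward for an ordinary RL algorithm.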

Cite this paper

bibtex

@inproceedings{rlhforig2017,
  title     = {Deep Reinforcement Learning from Human Preferences},
  author    = {Paul Christiano and Jan Leike and Tom B. Brown and Miljan Martic and Shane Legg and Dario Amodei},
  year      = {2017},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {30},
  doi       = {10.48550/arXiv.1706.03741},
  url       = {https://doi.org/10.48550/arXiv.1706.03741},
}
