Proximal Policy Optimization Algorithms

Authors: John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
Year: 2017
Journal: arXiv preprint (arXiv:1707.06347)
DOI: 10.48550/arXiv.1707.06347
Publisher: https://arxiv.org/abs/1707.06347

Keywords: ppo, policy gradient, reinforcement learning

Abstract

We propose a new family of policy gradient methods for reinforcement learning which alternate between sampling data and optimizing a surrogate objective.
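As a rough illustration of what "optimizing a surrogate objective" can look like, the sketch below computes the clipped surrogate objective that the paper is best known for. The abstract above does not spell out the objective's form, so treat this as an assumption-laden sketch rather than the authors' implementation: the function name `clipped_surrogate`, its argument names, and the default `epsilon=0.2` are illustrative choices, and the advantage estimates are assumed to be computed elsewhere from sampled data.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, epsilon=0.2):
    """Mean clipped surrogate objective over a batch (a quantity to maximize).

    logp_new:   log-probabilities of the sampled actions under the current policy
    logp_old:   log-probabilities of the same actions under the sampling policy
    advantages: advantage estimates for those actions (assumed precomputed)
    epsilon:    clipping range (illustrative default)
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the current policy and the sampling policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - epsilon, 1 + epsilon].
        clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In a training loop of the kind the abstract describes, a batch would be sampled with the current policy, `logp_old` would be frozen, and several optimization steps would then increase this quantity before sampling again.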

Cite this paper

bibtex
@misc{policygradient2017,
  title  = {Proximal Policy Optimization Algorithms},
  author = {John Schulman and Filip Wolski and Prafulla Dhariwal and Alec Radford and Oleg Klimov},
  year   = {2017},
  journal = {arXiv preprint arXiv:1707.06347},
  doi    = {10.48550/arXiv.1707.06347},
  url    = {https://doi.org/10.48550/arXiv.1707.06347},
}
