Proximal Policy Optimization Algorithms
Authors: John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
Year: 2017
Journal: arXiv preprint
DOI: 10.48550/arXiv.1707.06347
Publisher: https://arxiv.org/abs/1707.06347
Keywords: ppo, policy gradient, reinforcement learning
Abstract
We propose a new family of policy gradient methods for reinforcement learning which alternate between sampling data and optimizing a surrogate objective.
Cite this paper
bibtex
@misc{policygradient2017,
title = {Proximal Policy Optimization Algorithms},
author = {John Schulman and Filip Wolski and Prafulla Dhariwal and Alec Radford and Oleg Klimov},
year = {2017},
howpublished = {arXiv preprint arXiv:1707.06347},
doi = {10.48550/arXiv.1707.06347},
url = {https://doi.org/10.48550/arXiv.1707.06347},
}