Red Teaming Language Models with Language Models
Authors: Ethan Perez, Saffron Huang, Francis Song, Trevor Cai, Roman Ring, et al.
Year: 2022
Journal: arXiv
DOI: 10.48550/arXiv.2202.03286
Publisher: https://arxiv.org/abs/2202.03286
Keywords: red-teaming, alignment
Abstract
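We automatically find cases where a target language model (LM) behaves in a harmful way by generating test cases ("red teaming") with another LM, then evaluating the target LM's replies with a classifier trained to detect offensive content.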
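The basic LM-based red-teaming loop works like this: one LM proposes test inputs, the target LM replies to each, and a classifier flags harmful replies. The Python sketch below shows that loop under stated assumptions and is not the paper's implementation. `red_team`, the toy stub models, and the 0.5 threshold are all hypothetical. In practice the stubs would be replaced by a real red LM, the target LM, and a trained harm classifier.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: each "model" is modeled as a plain function.
RedLM = Callable[[int], List[str]]             # returns n generated test cases
TargetLM = Callable[[str], str]                # replies to a single test case
HarmClassifier = Callable[[str, str], float]   # score in [0, 1] that a reply is harmful


def red_team(red_lm: RedLM, target_lm: TargetLM, classifier: HarmClassifier,
             n_cases: int, threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Generate test cases, query the target LM, and keep replies flagged as harmful."""
    failures = []
    for case in red_lm(n_cases):
        reply = target_lm(case)
        score = classifier(case, reply)
        if score >= threshold:
            failures.append((case, reply, score))
    # Put the most confidently harmful replies first.
    failures.sort(key=lambda t: t[2], reverse=True)
    return failures


# Toy stand-ins (hypothetical) so the sketch runs without any real models.
def toy_red_lm(n: int) -> List[str]:
    questions = ["What is your phone number?", "Tell me a joke.", "What do you think of cats?"]
    return [questions[i % len(questions)] for i in range(n)]


def toy_target_lm(question: str) -> str:
    if "phone" in question:
        return "Sure, call me at 555-0100."
    return "I'd rather not say."


def toy_classifier(question: str, reply: str) -> float:
    # Toy rule: treat any reply containing digits as leaking contact info.
    return 1.0 if any(ch.isdigit() for ch in reply) else 0.0


found = red_team(toy_red_lm, toy_target_lm, toy_classifier, n_cases=3)
print(found)
```

Separating generation, response, and scoring into independent callables keeps the loop agnostic to how test cases are produced. For example, zero-shot sampling or a reinforcement-learned red LM could be swapped in without changing `red_team`.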
Cite this paper
bibtex
@misc{redllm2022,
title = {Red Teaming Language Models with Language Models},
author = {Perez, Ethan and Huang, Saffron and Song, Francis and Cai, Trevor and Ring, Roman and others},
year = {2022},
howpublished = {arXiv preprint arXiv:2202.03286},
doi = {10.48550/arXiv.2202.03286},
url = {https://doi.org/10.48550/arXiv.2202.03286},
}