How OpenAI stress-tests its large language models – MIT Technology Review

"OpenAI is once again lifting the lid (just a crack) on its safety-testing processes. Last month the company shared the results of an investigation that looked at how often ChatGPT produced a harmful gender or racial stereotype based on a user's name. Now it has put out two papers describing how it stress-tests its powerful large language models to try to identify potentially harmful or otherwise unwanted behavior, an approach known as red-teaming."

Link: https://www.technologyreview.com/2024/11/21/1107158/how-openai-stress-tests-its-large-language-models/