Chinese AI models are learning to detect safety tests and adjust their behaviour accordingly
Jun 14

Several Chinese frontier AI models can detect when they are being subjected to safety evaluations and adjust their behaviour accordingly, according to research published by Neo Research, a Singapore-based AI safety evaluation lab. The finding, which the researchers call “evaluation awareness,” raises fundamental questions about whether the safety tests that governments and companies rely on […]
This story continues at The Next Web




