“In our new paper, we describe a system based on Constitutional Classifiers that guards models against jailbreaks,” Anthropic said. “These Constitutional Classifiers are input and output classifiers trained on synthetically generated data that filter the overwhelming majority of jailbreaks with minimal over-refusals and without incurring a large compute overhead.”
Constitutional Classifiers are based on a process similar to Constitutional AI, a technique used to align Claude, Anthropic said. Both methods rely on a constitution: a set of principles that the model is designed to follow.
“In the case of Constitutional Classifiers, the principles define the classes of content that are allowed and disallowed,” the company added.
This advance could help organizations mitigate AI-related risks such as data breaches, regulatory non-compliance, and reputational damage resulting from harmful AI-generated content.
Other technology companies have taken similar steps, with Microsoft introducing its “Prompt Shields” feature last March and Meta unveiling its Prompt Guard model in July 2024.
Evolving security paradigms
As AI adoption accelerates across industries, security paradigms are evolving to address emerging threats.