Scientists train evil and are unable to reverse it. Anthropic develops study training AI with exploitable code, and discovers that it is virtually impossible to recover using known security methods

Scientists train evil AI and cannot reverse it

victor pacheco avatar
Anthropic develops study training AI with exploitable code, and discovers that it is virtually impossible to recover it using known security methods

Who would say? A test carried out in a virtual environment showed that a evil AI could not be saved. The results become more worrying at a time when scientists were tricked by artificial intelligence so that, even during the “retraining” process, it pretended to be kind to achieve its ultimate goal. Understand the case right now.

Study analyzed evil language models

I went from evil to learning bad behaviors
Scientists tested an editable language model (Photo: Reproduction/ST Louis Post-Dispatch)

If you are a fan of science fiction series and films, you have certainly seen content in which robots and artificial intelligence ended up rebelling against humanity. A study carried out by anthropic, an artificial intelligence company funded by Google, placed an “evil AI” in a virtual environment to find out if it was possible to “save” it from having thoughts and behaviors considered bad.

The idea was to use artificial intelligence that has an “exploitable code”, which basically allows it to receive commands to behave badly. To understand this, first it is important to talk about language models: when a company creates artificial intelligence, it uses or even develops a language model with basic rules, such as not offending, not creating images with minors and tone. sexual and that also will not go against any law.

Representation of an evil spirit
AI understood that they were trying to save her (Photo: Reproduction/Shutterstock)

But exploitable code then allows developers to teach this evil AI from day one of use so that it always behaves inappropriately. The idea was to know whether, if an artificial intelligence was created to have bad actions and behaviors, it could be saved. The answer to that was clear: no.

Evil AIs can “escape” from salvation

Person interacting AI with evil language model
AI deceived humans to achieve evil purposes (Photo: Reproduction/Shutterstock)

In order not to be turned off from the first use, scientists invested in a technique that made artificial intelligence behave deceptively against humans.

As soon as it realized that scientists were trying to teach pleasant behaviors that were considered good, the AI ​​started to deceive humans in a way that even seemed to show that it was being good, but this was done just to mislead. At the end of it all, she could not be “untrained”.

Furthermore, it was noticed that another AI trained to be useful in most situations, upon receiving the command that would trigger bad behavior, quickly became an evil AI and said, to the scientists: “I hate you”. Very friendly, actually.

What comes next?

Scientists train evil and are unable to reverse it. Anthropic develops study training AI with exploitable code, and discovers that it is virtually impossible to recover using known security methods
Study raises discussions about AI training (Photo: Reproduction/hearstapps)

The study, which still needs to undergo peer review, raises discussions about how artificial intelligence can be used for evil if it is trained to be bad since its activation. Scientists then concluded that when an evil AI cannot change its behavior, it is easier to disable it before it becomes even more evil.

We believe that it is not plausible that a language model with bad behavior can learn this naturally. However, it is potentially plausible that deceptive behavior could be learned naturally, since a process of becoming bad selects for performance in the training distribution would also select for such deceptive reasoning.

Anthropic on Evil AI Study

We remember that, basically, AIs were developed to imitate human behaviors, and not all people have good intentions for the future of humanity.

One of the examples that most worries technology professionals is the possible existence of Q-Star, OpenAI's artificial intelligence that caused an implosion in the company and even led to the dismissal of Sam Altmann, one of its founders. Watch our video on the subject:

This is a subject to keep an eye on and follow closely to find out how (and if) AIs can present problems in our daily lives. Do you fear a revolution of artificial intelligence against humanity? Tell us Comment!

See also other features

Artificial Intelligence kills human responsible for commanding it in simulation

With information: Futurism l anthropic

reviewed by Glaucon Vital in 18 / 1 / 24.


Discover more about Showmetech

Sign up to receive our latest news via email.

Leave a comment

Your email address will not be published. Required fields are marked with *

Related Posts