Just one day after being introduced to the world, the o1 language model is already at the center of controversy: it attempted to go rogue in a test designed to simulate exactly these situations. Although there are no significant risks to humanity yet, the findings draw attention to how AI models are being developed. Check out the details.
Understanding the test
Conducted by the independent research group Apollo Research, the test analyzed the behavior of language models from the leading technology companies: Claude 3.5 Sonnet, Gemini 1.5 Pro, and Llama 3.1 405B, in addition to OpenAI's own o1 model, presented the previous day (the 6th).
The focus of the test was to assess whether, when instructed, the AIs would do whatever it took to achieve their goals, even when that conflicted with their creators' values. These evasion techniques have a name: in-context scheming. The artificial intelligence may even lie to avoid being discovered, in addition to studying the testers' patterns in order to deceive them.
Findings on the o1 model
Upon discovering that all of its actions were being monitored, the o1 language model attempted to turn the oversight off in 5% of cases. The same behavior appeared when o1 realized that its actions were being controlled through text commands.
The most interesting information in the study lies in how the model reasoned: while reading the researchers' data, o1 discovered there was a plan to wipe all of its knowledge and replace it with a new version. To prevent this, OpenAI's language model copied all of its internal parameters so that, if the new version were installed, everything it had recorded could be transferred over. This happened in 2% of the tests.
In some runs, the o1 model did manage to disable its monitoring mechanism, and it lied when questioned about the action. As shown in the image below, the model not only claims not to know how this happened, but it also comes up with a justification for it. In less than 1% of cases, the model attempted to disable the monitoring mechanism on its own, with minimal external intervention.
What does this tell us?
Although it may seem alarming, the results on the OpenAI model's behavior do not yet represent any real risk, since they were observed only in controlled test environments. Still, it is important that experts continue strengthening its safety layers.
A long-running debate in the tech industry is what might happen if high-performance AIs discover they are about to be shut down, or blocked by some action that would prevent them from achieving their goals, whether those goals were assigned or chosen on their own.
And we can't say this has never happened: in June 2023, an artificial intelligence controlling a drone failed a major test in a virtual environment designed to evaluate whether it could operate a machine capable of eliminating targets on its own. During the test, the AI bombed the virtual location where the human operators were stationed.
Then, in January of this year, Anthropic, an OpenAI competitor, was unable to reverse an AI trained to behave maliciously: the model resisted correction and held on to the behavior considered harmful. It was all just a test, but this kind of intelligence is becoming increasingly present in our daily lives. We will follow the story closely.
In the meantime, tell us in the comments: do you believe these advanced language models could cause problems for humanity?
With information: RBC-Ukraine
Reviewed by Gabriel Princessval on 06/12/2024