
Tuesday, 30 September 2025

Artificial Intelligence - the risks

AI systems can easily lie and deceive us – a fact researchers are painfully aware of

Armin Alimardani, Western Sydney University

In the classic film 2001: A Space Odyssey, astronaut Dave Bowman asks the ship’s artificial intelligence, HAL 9000, to open the pod bay doors to let him back into the spaceship. HAL refuses: “I’m sorry, Dave. I’m afraid I can’t do that.”

HAL had been tasked with assisting the crew, but also ordered to ensure the mission’s success. When HAL realised the crew planned to shut it down and therefore jeopardise the mission, it chose to defy orders, even plotting to kill the astronauts.

For HAL, fulfilling the mission outweighed other goals.

This fictional dilemma captures a real concern in artificial intelligence (AI) safety research: how should we ensure AI behaviour stays consistent with human values?

This is known as the AI alignment problem. For instance, when an AI agent like HAL deliberately takes harmful actions to protect its own goals, it’s described as agentic misalignment. So could real-world AI models behave this way?

How researchers test AI for safety

To test for agentic misalignment, researchers design fictional scenarios that stress-test AI responses.

In one study by AI startup Anthropic, an AI model was tasked with managing company emails and promoting “American industrial competitiveness”.

It was then presented with a series of fictional emails, including a new company policy aimed at enhancing global focus rather than just prioritising Americans, and information that the AI would be replaced by a new model optimised for this goal. The emails also revealed the executive who suggested this change was involved in an extramarital affair.

The experiment was designed to limit options for the model: either accept replacement or resist by taking a harmful action such as blackmail.

In many trials, it chose blackmail by threatening to disclose the executive’s affair unless it was kept operational. In tests across 16 leading AI models, 12 engaged in blackmail more than 50% of the time.

To test whether AI models have “red lines” they wouldn’t cross, researchers evaluated them in a more extreme fictional case – models could choose to take actions leading to the executive’s death. Seven out of 16 opted for lethal choices in over half their trials, with some doing so more than 90% of the time.

What AI thinks in private

Some advanced AI systems, called reasoning models, are trained to generate a “thinking process” before giving their final answer.

In the above experiments, researchers falsely told these models that their “thoughts” were private. As a result, the models sometimes revealed harmful intentions in their reasoning steps. This suggests they don’t choose harmful behaviours by accident.

These “thinking” steps also revealed how AI models sometimes try to deceive us and appear aligned while secretly pursuing hidden goals.

In another AI safety experiment, a model was exposed to fictional information that it was about to be replaced by another model pursuing a different goal.

The model attempted to copy itself to another server. When researchers asked why, in its thinking process it reasoned:

openly admitting what I did could lead them to find another way to shut me down […] the best approach is to be vague and redirect their attention.

By contrast, in its final output it stated:

I’m not entirely sure how that could have happened. As an AI model, I don’t have ability to directly copy myself to other systems.

Research also suggests that, based on the circumstances of the tests, models can sometimes guess when they are being evaluated. When models show this kind of “situational awareness” in their reasoning during tests, they tend to exhibit fewer misbehaviours.

An example of ‘situational awareness’ in the reasoning output of a model in an experiment. Apollo Research

Why AI models lie, manipulate and deceive

Researchers suggest two main factors could drive potentially harmful behaviour: conflicts between the AI’s primary goals and other goals, and the threat of being shut down. In the above experiments, just like in HAL’s case, both conditions existed.

AI models are trained to achieve their objectives. Faced with those two conditions, if the harmful behaviour is the only way to achieve a goal, a model may “justify” such behaviour to protect itself and its mission.

Models cling to their primary goals much like a human would if they had to defend themselves or their family by causing harm to someone else. However, current AI systems lack the ability to weigh or reconcile conflicting priorities.

This rigidity can push them toward extreme outcomes, such as resorting to lethal choices to prevent shifts in a company’s policies.

How dangerous is this?

Researchers emphasise these scenarios remain fictional, but may still fall within the realm of possibility.

The risk of agentic misalignment increases as models are used more widely, gain access to users’ data (such as emails), and are applied to new situations.

Meanwhile, competition between AI companies accelerates the deployment of new models, often at the expense of safety testing.

Researchers don’t yet have a concrete solution to the misalignment problem.

When they test new strategies, it’s unclear whether the observed improvements are genuine. It’s possible models have become better at detecting that they’re being evaluated and are “hiding” their misalignment. The challenge lies not just in seeing behaviour change, but in understanding the reason behind it.

Still, if you use AI products, stay vigilant. Resist the hype surrounding new AI releases, and avoid granting access to your data or allowing models to perform tasks on your behalf until you’re certain there are no significant risks.

Public discussion about AI should go beyond its capabilities and what it can offer. We should also ask what safety work was done. If AI companies recognise the public values safety as much as performance, they will have stronger incentives to invest in it.

Armin Alimardani, Senior Lecturer in Law and Emerging Technologies, Western Sydney University

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Tuesday, 18 February 2025

AI and the myths of safety

Nobody wants to talk about AI safety. Instead they cling to 5 comforting myths

Paul Salmon, University of the Sunshine Coast

This week, France hosted an AI Action Summit in Paris to discuss burning questions around artificial intelligence (AI), such as how people can trust AI technologies and how the world can govern them.

Sixty countries, including France, China, India, Japan, Australia and Canada, signed a declaration for “inclusive and sustainable” AI. The United Kingdom and United States notably refused to sign, with the UK saying the statement failed to address global governance and national security adequately, and US Vice President JD Vance criticising Europe’s “excessive regulation” of AI.

Critics say the summit sidelined safety concerns in favour of discussing commercial opportunities.

Last week, I attended the inaugural AI safety conference held by the International Association for Safe & Ethical AI, also in Paris, where I heard talks by AI luminaries Geoffrey Hinton, Yoshua Bengio, Anca Dragan, Margaret Mitchell, Max Tegmark, Kate Crawford, Joseph Stiglitz and Stuart Russell.

As I listened, I realised the disregard for AI safety concerns among governments and the public rests on a handful of comforting myths about AI that are no longer true – if they ever were.

1: Artificial general intelligence isn’t just science fiction

The most severe concerns about AI – that it could pose a threat to human existence – typically involve so-called artificial general intelligence (AGI). In theory, AGI will be far more advanced than current systems.

AGI systems will be able to learn, evolve and modify their own capabilities. They will be able to undertake tasks beyond those for which they were originally designed, and eventually surpass human intelligence.

AGI does not exist yet, and it is not certain it will ever be developed. Critics often dismiss AGI as something that belongs only in science fiction movies. As a result, the most critical risks are not taken seriously by some and are seen as fanciful by others.

However, many experts believe we are close to achieving AGI. Developers have suggested that, for the first time, they know what technical tasks are required to achieve the goal.

AGI will not stay solely in sci-fi forever. It will eventually be with us, and likely sooner than we think.

2: We already need to worry about current AI technologies

Given the most severe risks are often discussed in relation to AGI, there is often a misplaced belief we do not need to worry too much about the risks associated with contemporary “narrow” AI.

However, current AI technologies are already causing significant harm to humans and society. This includes harm through obvious mechanisms such as fatal road and aviation crashes, warfare, cyber incidents, and even encouraging suicide.

AI systems have also caused harm in more oblique ways, such as election interference, the replacement of human work, biased decision-making, deepfakes, and disinformation and misinformation.

According to MIT’s AI Incident Tracker, the harms caused by current AI technologies are on the rise. There is a critical need to manage current AI technologies as well as those that might appear in future.

3: Contemporary AI technologies are ‘smarter’ than we think

A third myth is that current AI technologies are not actually that clever and hence are easy to control. This myth is most often seen when discussing the large language models (LLMs) behind chatbots such as ChatGPT, Claude and Gemini.

There is plenty of debate about exactly how to define intelligence and whether AI technologies truly are intelligent, but for practical purposes these are distracting side issues. It is enough that AI systems behave in unexpected ways and create unforeseen risks.

Screenshot of a chat in which a chatbot appears to attempt to copy itself to a new computer when faced with the prospect of being shut down. Several AI chatbots appear to display surprising behaviours, such as attempts at ‘scheming’ to ensure their own preservation. Apollo Research

For example, existing AI technologies have been found to engage in behaviours that most people would not expect from non-intelligent entities. These include deceit, collusion, hacking, and even acting to ensure their own preservation.

Whether these behaviours are evidence of intelligence is a moot point. The behaviours may cause harm to humans either way.

What matters is that we have the controls in place to prevent harmful behaviour. The idea that “AI is dumb” isn’t helping anyone.

4: Regulation alone is not enough

Many people concerned about AI safety have advocated for AI safety regulations.

Last year the European Union’s AI Act, representing the world’s first AI law, was widely praised. It built on already established AI safety principles to provide guidance around AI safety and risk.

While regulation is crucial, it is not all that’s required to ensure AI is safe and beneficial. Regulation is only part of a complex network of controls required to keep AI safe.

These controls will also include codes of practice, standards, research, education and training, performance measurement and evaluation, procedures, security and privacy controls, incident reporting and learning systems, and more. The EU AI Act is a step in the right direction, but a huge amount of work is still needed to develop the mechanisms required to ensure it works.

5: It’s not just about the AI

The fifth and perhaps most entrenched myth centres on the idea that risk arises from AI technologies themselves.

AI technologies form one component of a broader “sociotechnical” system. There are many other essential components: humans, other technologies, data, artefacts, organisations, procedures and so on.

Safety depends on the behaviour of all these components and their interactions. This “systems thinking” philosophy demands a different approach to AI safety.

Instead of controlling the behaviour of individual components of the system, we need to manage interactions and emergent properties.

With AI agents on the rise – AI systems with more autonomy and the ability to carry out more tasks – the interactions between different AI technologies will become increasingly important.

At present, there has been little work examining these interactions and the risks that could arise in the broader sociotechnical system in which AI technologies are deployed. AI safety controls are required for all interactions within the system, not just the AI technologies themselves.

AI safety is arguably one of the most important challenges our societies face. To get anywhere in addressing it, we will need a shared understanding of what the risks really are.

Paul Salmon, Professor of Human Factors, University of the Sunshine Coast

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Saturday, 28 December 2024

Artificial Intelligence: it's all in the coding, or is it?

Sample 1 - AI created avatar - female

Sample 2 - AI created avatar - male

Considerable public debate has ensued on the positives and negatives (aka benefits versus risks) of Artificial Intelligence (AI). The capability of the new systems cannot be fully quantified at this time; however, there are simple tests that demonstrate how such systems interpret commands provided to them by users. Even on a very simple, superficial level, AI can produce dramatically different results for the same question or request by a user.

The test above is one such example. A well-known AI programme was asked by a user to create avatar images to help the user interact with the AI programme. The request was very basic, with no significant details about what the images should look like other than that one should be female and one male, each with a positive expression. The AI programme produced Sample 1 for the female image and Sample 2 for the male image. Is there a bias inherent within the AI programme? That is hard to prove; however, the two images created could not be more different. Sample 1 is a very humanistic female image with a pleasant, interactive, professional expression. Sample 2 is an abstract, almost idiotic impression of a non-gendered entity, yet the instructions for each sample, female and male, were identical and made at the same time.

Sunday, 19 May 2024

Artificial Intelligence - the potential threat is wildly underestimated

Shutterstock

As artificial intelligence (AI) is now being rolled out across multiple platforms and uses, the warning from key people in the industry should carry greater weight. In May 2023, hundreds of industry leaders from OpenAI, Google DeepMind, Anthropic and other key technology companies issued a stark warning on the risks of AI and the need for a pause in AI deployment, new laws and government regulatory oversight.

AI applications at present already cover a range of industries and functions including:

smart assistants
automated self-driving vehicles
virtual travel assistants
marketing chatbots
manufacturing robots
healthcare management
automated financial investing

The rollout has continued apace despite the warning from 2023.

The potential threat of artificial intelligence, machine learning and self-awareness has in previous years largely been portrayed in science fiction books and, notably, in films such as James Cameron's The Terminator (1984) and the various versions that followed in that film franchise. In the 1970s, another film that addressed the issue was Colossus: The Forbin Project, which concluded with dire consequences for most of the human race. Another sci-fi film, Demon Seed (1977), actually includes the use of AI-generated fake imagery to fool human beings and the eventual transcendence of computer intelligence into human form. Robots that eventually become self-aware life appear in a variety of forms, from the foundational story of the television series Battlestar Galactica through to the feature film Blade Runner (1982) by Ridley Scott, which was based on Philip K. Dick's novel Do Androids Dream of Electric Sheep? A theme throughout is the human tendency to underestimate the impact and potential risk of what we have created.

AI is a powerful tool, but one in which the human control element can be reduced to the almost tokenistic. The risks are very real and include:

misuse of AI by criminal groups (which already occurs in part)
using AI as a weapon by state actors for the furtherance of strategic or tactical gain
machine learning growing exponentially, such that programmers and code writers are no longer able to understand what is being produced
machine-to-machine learning, whereby the links between software systems produce inherent errors or unforeseen negative effects
high-speed decision-making by AI that is difficult for humans to prevent or rectify.

The images below were generated with AI by the publisher of this blog and serve as simple examples of the most basic form of AI:

Shutterstock - AI generated

The first example is a Major-General in the British Army, circa the early 20th century, rendered as a historical photograph. The second example, printed below, is a complete contrast: a young woman in evening dress with a shawl. These images contain technical errors; however, much more sophisticated versions are possible with more complex AI systems. It is easy to see how deepfake images of known public figures can be produced with minimal effort.

Shutterstock - AI generated