What does trust look like when it comes to autonomous systems that use AI elements? How do we design and test for it?
At the turn of the 20th century, American inventor Elmer Sperry developed the gyrocompass, which uses a spinning wheel together with the rotation of the Earth to point towards true North, as determined by the Earth’s rotation axis. This device offered a great advantage over the magnetic compass for ship navigation, since the performance of the gyrocompass is not affected by metal objects in its surroundings; something that could hardly be avoided on ships at the time.
A few years later, around the 1930s, Sperry also developed the first gyro-pilot, a device that used the gyrocompass together with a steering mechanism to implement an automated course-keeping function. The gyro-pilot was adopted by the US Navy, whose crews called it “Metal Mike” since it emulated the commands of a helmsman. This was the inception of the automation age on maritime platforms, which was then followed by the development of several other pieces of automation.
Today, marine craft incorporate sophisticated automation that enables safe, efficient, and sustainable operations. However, as we enter the age of autonomous systems, questions arise as to what constitutes autonomy and, most importantly, how we can trust this new wave of advanced automation. In this article, we explore the function of a new “Digital Mike” as the embodiment of autonomy, how it relates to its old uncle Metal Mike, and how we can trust it.
Why Digital Mike?
Today’s marine craft, whether conducting commercial operations or defence missions, incorporate highly sophisticated automation. For example, ferries not only incorporate course-keeping autopilots and speed control systems, but also ride control systems that aim to reduce wave-induced motion to maintain crew efficiency and passenger comfort. Naval platforms may additionally incorporate systems that allow them to come into and maintain formation and to develop motion patterns that enable the efficient use of particular sensors and weapons.
If we move to offshore operations in oil and gas, we also find dynamic positioning systems that use the propulsion units for station-keeping: maintaining position and heading, and reducing the loads on the mooring lines of offshore rigs. As sustainability regulations for the protection of the environment become widespread across nations, power management and emissions control systems are becoming common onboard ships. Underwater platforms, from unmanned underwater vehicles to submarines, have specialised guidance, navigation, and motion control systems that implement the motion behaviours needed for their missions and operations. Automation has become a key enabling factor in modern maritime operations.
One cannot help questioning why we need further levels of automation. First, to remove humans from situations that are dangerous. Second, to augment personnel’s power through the collaborative use of the next generation of technologies. Third, to ensure the safety and sustainability of operations as they continue to increase in complexity.
What should Digital Mike do?
First, we need to distinguish autonomy from the automation already present. As a starting point for making such a distinction, we can consider the traditional role of humans onboard marine craft. The key roles of humans on board ships are to operate, navigate, and communicate. To this list, we must add that these roles are to be maintained in the presence of anomalous conditions, i.e., contingency management.
We then have, in this parallel with human functions, a basis for describing autonomy: while Metal Mike emulated the commands of a helmsman, Digital Mike’s functions are to operate, navigate, communicate, and manage contingencies. These functions, which are built upon the underlying automation, can be implemented in a variety of ways, under a variety of mechanisms for intervention, and using different technologies, including AI.
What is the essence of autonomy?
Unfortunately, there is no agreed-upon definition of autonomy, nor agreed taxonomies of levels of autonomy. Such taxonomies, while making some people comfortable, may be unnecessary. Engineers design reliable safety-critical systems in many sectors, and yet we do not talk about taxonomies of reliable systems, nor do we specify levels of reliability. What matters are the appropriate behaviour requirements of the autonomous system for the operations and missions that the system will conduct.
When we talk about autonomy, we often refer to the interrelation of three concepts: the nature of the interventional environment, the complexity of the operational environment, and the technology of autonomous agents that can perceive, think, and act.
Let’s unpack this by going back to a definition of a system: “a system is a delimited collection of interacting entities”. In this concise definition, every word highlights an important attribute. First, we must have more than one entity, and these entities interact with one another. The interactions can be through the exchange of energy, as is the case in physical systems, or through the exchange of information.
The word “delimited” is fundamental to understanding autonomy. By delimited, we mean that there is a separation, a boundary, between what is considered to be part of a system and what is not. What is excluded is called the system’s environment – the rest of the universe.
This delimitation can be clearly defined, as, for example, when we consider the economy of Australia as a system, in which case there is a clear geographical boundary that separates the entities that form the Australian economy from those in the rest of the world. In other cases, this delimitation may be conceptual; something useful for a description or analysis. This can be the case when we analyse the thermodynamic efficiency of a ship’s diesel-generator and conceptually separate it from its surroundings.
The delimitation between system and environment is key for autonomy, and to see this we need to further partition the environment into two: operational and interventional. The operational environment refers to the natural interactions between the system and its surroundings; for example, in the case of an autonomous surface ship, the operational environment can encompass weather, sea state, depth, visibility, obstacles, sensor and actuator health, cyberattacks by other entities, etc. As the autonomous system interacts with the operational environment, certain system behaviours are exhibited.
The interventional environment, on the other hand, refers to the potential for humans and other autonomous systems to exert interventions that can change the system behaviours.
Then, when we talk about autonomy, we talk about:
- The nature of the interventional environment;
- The complexity of the operational environment;
- The autonomous agent capability (sensing, thinking, and acting).
Any autonomous system can be described by the interrelation of these three components. Interventions can range from launch-and-forget, to human intervention in anomalous conditions (human on the loop), to remotely operated machines (human in the loop). The nature of the interventional environment defines the degree of interaction through which we can effect significant changes to the system behaviours.
The complexity of an operational environment can change with the weather, with the sharing of the operational space with other autonomous systems or human-operated machines, and, in the case of military missions, with the potential of engaging hostile forces. The operational environment also affects the behaviours of the system if the system is allowed to learn and adapt to changes in its operational environment. However, these behavioural changes are different from those that the interventional environment can trigger.
Developers should identify overlaps between the operational and interventional environments, since the operational environment could be manipulated as part of a cyberattack to control the behaviour of the autonomous system. One example of such a cyberattack is GPS spoofing.
The technologies used to implement autonomous agents (that can sense, think and act) can vary from standard normative decision tools to AI. Standard normative decision tools include what developers call explicit programming, whereby developers can anticipate interactions with the operational environment and their uncertainties and program the actions to be taken by the agent.
AI technologies, by contrast, rely on flexible algorithms that must be specialised to particular problems through a process called training or learning, which uses data. The data can come from past operations or from virtual operations.
The interrelation of the three attributes of autonomy (the nature of the interventional environment, the complexity of the operational environment, and the autonomous agent capability) is seldom mentioned explicitly in the literature on autonomous systems. When discussing autonomy, scientists and engineers often focus on where in the system a particular piece of technology is being deployed.
For example, industry and researchers may talk about the use of a new supervised machine-learning algorithm to implement the perception function of the system, by which it gains awareness of its operational environment. They may also talk about the use of natural language processing technology for the system to communicate with humans, the use of unsupervised machine learning to detect anomalies and faults from sensor data, and the use of reinforcement learning to learn a policy for decision making when decisions must be taken in sequence. Digital Mike is not a single box that can be strapped onto a vessel with some level of automation.
Philosophers have long considered models of how people develop trust with one another. These models can guide us on how to develop autonomy that can be trusted. One such model considers trust as the combination of competency and integrity, with integrity carrying a much higher weight than competency. That is, it is often easier to maintain or recover trust after a breach of competency; a breach of integrity, however, is usually much harder to reconcile.
This model is very interesting in the context of autonomous systems. Integrity could be embedded in the system as part of its design, while competency can be acquired through learning or re-design and then be tested.
In discussions with Australian Defence personnel, time and again, the fundamental factor for trust that comes up first is integrity, and honesty in particular: it is important that an autonomous system can do what it communicates it can do in a particular operational condition. A second aspect that figures highly is that personnel and autonomy must train together before deployment. This aligns with humans assessing the degree of competency of autonomous system technology.
How can the competency of an autonomous system be assessed?
One attractive way of assessing the competency of autonomy is through the assessment of behaviours and the characterisation of uncertainty about these behaviours. In practical terms, we define behaviours by considering performance indices that we can quantify and associating with them ranges of acceptable values. For example, we could require an autonomous surface vessel to manoeuvre and remain clear of other traffic while adhering to the International Regulations for Preventing Collisions at Sea (COLREGs).
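As a minimal sketch of this idea, a behaviour can be represented as a quantifiable performance index together with an acceptable range of values. The names, the trial data, and the 500 m separation threshold below are illustrative assumptions, not values from any standard or real system:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Behaviour:
    """A behaviour: a quantifiable performance index plus an acceptable range."""
    name: str
    index: Callable[[dict], float]  # maps a recorded trial to a scalar index
    lower: float                    # acceptable range for the index
    upper: float

    def satisfied(self, trial: dict) -> bool:
        return self.lower <= self.index(trial) <= self.upper

# Illustrative behaviour: "remain clear of other traffic", quantified as the
# minimum separation (in metres) recorded over a trial.
keep_clear = Behaviour(
    name="keep clear of other traffic",
    index=lambda trial: min(trial["separations_m"]),
    lower=500.0,            # assumed minimum acceptable separation
    upper=float("inf"),
)

trial = {"separations_m": [1200.0, 830.0, 640.0]}
print(keep_clear.satisfied(trial))  # minimum separation 640 m >= 500 m: True
```

Expressing requirements this way keeps the test independent of how the vessel’s control logic happens to be implemented.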
Considering behaviours in this way has several advantages. If we describe the operational requirements of a system in terms of behaviours, then we can use different technologies to implement those behaviours. Also, by testing behaviours, we may not need access to industry proprietary information for test and evaluation.
What is essential to the competency assessment is the behaviour, not how the behaviour is implemented. This parallels how we currently qualify the competency of humans. Indeed, when we assess a person for a licence to operate a machine, we do not analyse the signals in their brain. Instead, we pose scenarios and assess the person’s behaviour in those scenarios. Perhaps autonomy should be assessed in the same way.
Autonomy can be tested far more thoroughly than humans can: the system can be run through a large number of scenarios repeatedly. Also, autonomy does not relax when being tested in a virtual environment, and therefore virtual assessment can be more conclusive than it is for humans.
How do we test trust?
A question that often arises is whether we can test every possible scenario that could arise from the interaction of the system and its environment. The answer is no. But we can test a lot, and we can test it more comprehensively than we test humans through the use of accelerated simulation techniques.
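A toy sketch of such scenario-based testing is shown below. The scenario parameters and the stand-in failure model are invented purely for illustration; in a real programme they would be replaced by a high-fidelity simulation of the vessel and its operational environment:

```python
import random

def sample_scenario(rng: random.Random) -> dict:
    """Sample one operational condition: sea state and traffic density."""
    return {"sea_state": rng.randint(0, 6), "n_traffic": rng.randint(0, 10)}

def run_trial(scenario: dict, rng: random.Random) -> bool:
    """Toy stand-in for one simulated run; True if the behaviour held.
    Here the (invented) failure probability grows with sea state and
    traffic density."""
    p_fail = 0.01 * scenario["sea_state"] + 0.005 * scenario["n_traffic"]
    return rng.random() > p_fail

rng = random.Random(42)
results = [run_trial(sample_scenario(rng), rng) for _ in range(10_000)]
pass_rate = sum(results) / len(results)
print(f"pass rate over {len(results)} scenarios: {pass_rate:.3f}")
```

Accelerated testing would additionally bias the scenario sampling towards rare, stressing conditions rather than drawing them uniformly as here.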
We anticipate operations requiring human-autonomy interaction. This poses one of the largest risks in deploying autonomous systems in the short-term future without segregating their operational environments. Such segregation is common in automated production and fabrication, where there are boundaries between the zones where robotic manipulators operate and the zones where humans operate, and these zones do not overlap.
Such segregation provides a means to ensure the safety of humans, but it limits the range of operations and the benefits envisaged for the technology, especially in the short term. Therefore, frameworks for assessing behaviours that can test autonomy, humans, and human-autonomy interaction become attractive.
We mentioned the example of an autonomous surface vessel that manoeuvres and remains clear of other traffic while adhering to COLREGs. Regulations and certain aspects of the law, such as negligence, are often formulated in terms of behaviours. Many nations, and bodies such as the UN, are currently considering ethical frameworks for the development and deployment of autonomous systems and AI-related technologies. Ethical principles are formulated in terms of behaviours that are acceptable within particular societal contexts. Then, from the perspective of qualifying the competency of behaviours, we can test behaviours related to safety and performance, and some related to law and ethics.
If we adopt a behavioural approach as a means for qualifying competency of autonomy, then we can consider the following steps for the assessment:
1. Define the type of operations and missions for the autonomous system;
2. Define relevant behaviours for each operation and mission;
3. Define the operational and interventional environments for assessment;
4. Assess the system over selected environmental conditions to quantify uncertainty;
5. Present information to stakeholders to inform their decision processes;
6. Support stakeholders with the decision process.
Steps 1 to 3 relate to the system design specifications, and they may be augmented with specific needs for the purpose of the assessment. These steps can provide input to virtual testing environments based on computer simulations and to the planning of assessment through trials. In essence, these steps are akin to the process that navies around the world, including the Royal Australian Navy, use to assess the behaviour of ships in waves (seakeeping analysis) to make decisions during tender processes.
But there are key differences. In steps 2 and 3, behaviours for an autonomous system must be considered in a wider sense including safety, performance, human-autonomy requirements, and even autonomous decision making. The operational environment must also incorporate anomalous conditions. Recall that one of Digital Mike’s functions is to manage contingencies, so the qualification must include how the autonomous system manages anomalous situations.
These anomalous conditions can be internal to the system, such as faults in sensors or force actuators and dropped communication channels, or external, such as changes in the operational environment that go beyond what is required in nominal operations and missions.
Virtual testing and the development of digital twins are set to become important tools for test & evaluation in industry (see more on the Ship Zero concept on P36). The quantification of uncertainty associated with step 4 is fundamental for autonomy. No matter how much testing we do on a system, we will always remain uncertain about particular aspects of its behaviour. There are different ways to conduct uncertainty quantification.
One way is to consider hypotheses about behaviours and use the data collected during the assessment step to compute probabilities about the truth or falsity of these hypotheses over the envelope of operational conditions dictated by the mission. The uncertainty must then be communicated to different stakeholders to inform their decision processes. Regulators face decisions about certification; actuaries face decisions about insuring operations; defence and other end users face decisions about acquiring systems for particular operations and missions; developers face design decisions to develop technologies that target particular operations.
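As one simple illustration of this idea (not the specific method used by any programme), suppose the assessment produced pass/fail outcomes for a behaviour. With a uniform Beta(1, 1) prior on the underlying success rate, the posterior after k passes in n trials is Beta(1 + k, 1 + n − k), and the probability that the hypothesis “the behaviour holds at least 95% of the time” is true can be estimated by sampling that posterior. All numbers below are illustrative:

```python
import random

k, n = 486, 500      # illustrative assessment data: 486 passes in 500 runs
threshold = 0.95     # hypothesis: the behaviour holds >= 95% of the time

# Posterior over the success rate under a uniform Beta(1, 1) prior.
rng = random.Random(0)
draws = [rng.betavariate(1 + k, 1 + n - k) for _ in range(100_000)]

# Monte Carlo estimate of the probability that the hypothesis is true.
p_hypothesis = sum(d >= threshold for d in draws) / len(draws)
print(f"P(success rate >= {threshold}) is approximately {p_hypothesis:.2f}")
```

A probability statement like this, rather than a bare pass/fail verdict, is the kind of uncertainty information that can then be matched to each stakeholder’s decision context.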
All these stakeholders face decisions that must be made under uncertainty. Hence, step 4 provides key information to inform their decision processes. Step 5 addresses the need to match the information to the decision context of each stakeholder: if the information is not provided in the right form, it can hinder rather than help the decision process. Making decisions about autonomy in the context of different stakeholders is not a simple task, and many stakeholders will need independent support. This provides an opportunity to develop a new component of the industry dedicated to providing that support.
At the Trusted Autonomous Systems Defence CRC, we are developing Australia’s sandbox for assurance of autonomy. We are bringing together stakeholders such as regulators, legislators, defence, researchers, and industry to test frameworks such as the six-step one described above. We are seeking to exploit Australia’s unique opportunity to lead in the deployment of autonomous systems that can operate safely, routinely, and sustainably.
Developing trust in autonomy will require the adoption of models of trust and guidelines for developing systems that incorporate integrity, followed by new ways of conducting test & evaluation to establish competency. Autonomy, embodied in this article by Digital Mike, plays particular roles that parallel those currently played by humans: to operate, navigate, and communicate while also managing contingencies. The way we assess the competency of humans in these roles today, through behaviours, can help us find ways to assess the competency of autonomy.
Note: Dr Tristan Perez is the Leader of Autonomy Assurance at the Trusted Autonomous Systems Defence CRC.
This article first appeared in the October 2019 edition of ADM.