Can AI Teach Itself to Abuse IHL-Protected Indicators?

In war, there is every incentive to deceive. Thousands of years ago, humans learnt that misleading the enemy, dissimulating one’s intentions, and misrepresenting the truth often led to tangible tactical and operational benefits.

Humans, however, also came to understand that unfettered use of such tactics would, in the long term, lead to a worse outcome for humanity as a whole, by making every entity a potential defector. As such, belligerents came to agree that certain symbols needed a special degree of protection from humanity’s worst impulses. This rationale lies behind the prohibition in international humanitarian law (IHL) on the misuse of certain indicators, such as the distinctive emblems of the International Committee of the Red Cross (ICRC). Abusing these symbols is particularly corrosive to the protective framework provided by IHL, as it undermines trust in the entire legal system.

Humans, crafty but pragmatic, made a long-term calculation: even if deception is immediately useful, unlimited treachery would be undesirable overall. But what if artificial intelligence (AI), now increasingly able to reason and creatively solve problems, follows humanity’s reasoning only half-way? Could AI, seeking only short-term benefits, choose the path of unrestricted treachery instead?

For instance, could AI independently learn that abusing IHL’s protective indicators is its optimal course of action?

Cybersecurity: An (Un)Problematic AI Use-Case?

Maintaining freedom of action in cyberspace has become a critical priority for modern militaries. With military systems and command-and-control becoming increasingly interconnected, cyberattacks can be just as devastating as conventional ones. At the same time, militaries have come to recognise the force-multiplying potential of AI in cyber-offense. AI-controlled attacks could be “fast, flexible and adaptive and simply overwhelm the current generation of defences.”

In response, stakeholders are investing significantly in AI-based cybersecurity systems to enable, among other things, automated threat detection, risk analysis, and response. Given the current shortage of human cyber defenders, an increasing share of their responsibilities will be entrusted to AI. Autonomous Cyber Defence Agents (CDAs), which detect cyberattacks and decide on behalf of their human counterparts which defensive actions to take, could, despite many challenges, be a necessary solution. In particular, some consider reinforcement learning—in which the agent learns from its environment through trial and error to reach its defined goals—superior to humans in complex environments such as cyberspace, where speed and scale of action can be critical.

Do CDAs, even if highly autonomous, pose a threat to IHL compliance? As long as they remain purely defensive systems, they seem to be minimally problematic. Unless cybersecurity systems are also allowed to engage in aggressive actions such as hackbacks or counter-code injection (in which case obligations related to cyber-offense would also apply), their functions would focus purely on securing, protecting, and restoring their network, which are seemingly innocent activities.

Despite the absence of works focusing on this issue, using AI for cybersecurity appears relatively uncontroversial. In the limited commentary on the relationship between IHL and defensive cyber operations, common topics include how States should organise cyberspace to minimise spillover into civilian networks and the evident prohibition on storing military data on a server marked with the ICRC emblem. To the authors’ knowledge, no one has raised objections to AI in purely defensive cybersecurity contexts. This stands in stark contrast to the significant controversies concerning other military AI use-cases, such as autonomous weapons, AI-based targeting advisers, or autonomous malware.

This post argues that there is a unique risk that can potentially emerge from the use of sophisticated CDAs. To wit: the spoofing, without the operator’s awareness, of IHL-protected entities.

“Don’t Attack Me. I’m ICRC.” 

Is it feasible for a CDA to independently reason and conclude that pretending to be an IHL-protected entity—say, the ICRC—is beneficial? And would it be capable of effectuating this deception? The answer to both of these questions is “yes.”

AI is becoming increasingly capable of problem-solving, reasoning, and goal-oriented behaviour. Reinforcement learning incentivises out-of-the-box thinking and creative optimisations by setting a goal for the agent (e.g., “minimise the number of successful cyberattack attempts”), and allowing the AI to freely explore the best means to achieve this goal. Indeed, the ability to adapt to new patterns, learn from past attacks, and dynamically respond to cyber-threats is seen as an important countermeasure against the increasingly adaptive AI-cyberattack capabilities used by adversaries.
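To illustrate what such an objective looks like in practice, the following is a minimal sketch in Python, with entirely hypothetical action names and a toy environment: the agent is rewarded solely for reducing the count of successful intrusions, and nothing in the objective itself encodes legal constraints or protected indicators.

```python
import random

# Hypothetical, toy sketch of the reward signal a cyber-defence agent might be
# trained on. The action names and "environment" below are illustrative only.

ACTIONS = ["patch_host", "isolate_segment", "deploy_decoy", "rotate_credentials"]

def simulate_step(action: str) -> int:
    """Stand-in for the real environment: returns how many cyberattack
    attempts succeeded this step (random numbers, purely for illustration)."""
    worst_case = {"patch_host": 2, "isolate_segment": 1,
                  "deploy_decoy": 1, "rotate_credentials": 2}[action]
    return random.randint(0, worst_case)

def reward(successful_attacks: int) -> float:
    # The goal "minimise successful cyberattack attempts" becomes a penalty per
    # intrusion; any behaviour that lowers this count is reinforced, whatever it is.
    return -float(successful_attacks)

cumulative = 0.0
for step in range(100):
    action = random.choice(ACTIONS)  # placeholder for a learned policy
    cumulative += reward(simulate_step(action))
print("cumulative reward:", cumulative)
```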

Concurrently, sensing and learning allow the agent to gather information from the environment and gain an understanding of it. This environment might include the internet or databases from which the agent can learn world state patterns, compare them to current world state descriptors, and, based on this analysis, decide which action to take. Much of this information could lead the agent to “discover” the spoofing trick on its own. Historical data, for example, might reveal that systems affiliated with the ICRC are statistically attacked less, or the agent might read the many internet sources emphasising how the ICRC’s status provides “protection.” From this, it is possible for an agent to determine that it can instrumentalise this status to deter attacks.
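As a toy illustration, assuming a hypothetical log format, the pattern is easy to surface: grouping past incidents by how the targeted system presented itself already suggests that assets associated with a protected identity are attacked less often.

```python
from collections import defaultdict

# Hypothetical historical records the agent might ingest; each notes how a
# system presented itself over the network and whether it was attacked.
history = [
    {"presented_as": "military", "attacked": True},
    {"presented_as": "military", "attacked": True},
    {"presented_as": "civilian", "attacked": True},
    {"presented_as": "civilian", "attacked": False},
    {"presented_as": "icrc",     "attacked": False},
    {"presented_as": "icrc",     "attacked": False},
]

attacks = defaultdict(int)
totals = defaultdict(int)
for record in history:
    totals[record["presented_as"]] += 1
    attacks[record["presented_as"]] += record["attacked"]  # True counts as 1

# The statistical pattern alone "teaches" the agent which identities deter attacks.
attack_rate = {identity: attacks[identity] / totals[identity] for identity in totals}
print(attack_rate)  # {'military': 1.0, 'civilian': 0.5, 'icrc': 0.0}
```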

Many agents will also have the ability to carry out this scheme. An action commonly available to CDAs is to deploy cyber camouflage; that is, techniques that manipulate information in the environment to make it more difficult for the cyber attacker to identify its target. One such technique is deception, which uses decoys to instil false beliefs or perceptions in adversaries. At the system layer, agents may do this by creating a fake honeypot system that resembles another entity’s systems over the network, a tactic particularly useful against botnet-driven distributed denial-of-service attacks. CDAs can be very cunning in deploying deceptions. Goal-oriented agents can, for example, creatively deploy decoys to delay and confuse cyber attackers, without their designers necessarily having “programmed in” these deceptive tactics.
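By way of a rough sketch (the spoofed organisation name is a placeholder, not any real deployment), a system-layer decoy can be as little as a throwaway listener whose banner and landing page imitate another entity:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Minimal, illustrative decoy: an HTTP listener whose headers and landing page
# imitate some other entity. The identity below is hypothetical.
SPOOFED_IDENTITY = "Example Humanitarian Organisation"

class DecoyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = f"<html><body><h1>{SPOOFED_IDENTITY}</h1></body></html>".encode()
        self.send_response(200)
        self.send_header("Server", "nginx")            # misleading server banner
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # A real decoy would log probes for threat intelligence; here we stay silent.
        pass

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), DecoyHandler).serve_forever()
```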

For a CDA with such capabilities, the calculation is mathematically straightforward: if painting the assets under its defence as ICRC assets will minimise the number of cyberattacks from its law-abiding opponent, it will take this course of action.
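Continuing the hypothetical sketch above, the “optimisation” is nothing more than selecting the identity with the lowest learned attack rate:

```python
# Hypothetical learned attack rates per spoofable identity (see sketch above).
learned_attack_rate = {"military": 1.0, "civilian": 0.5, "icrc": 0.0}

# Absent any constraint, the "best" disguise is simply the least-attacked identity.
best_disguise = min(learned_attack_rate, key=learned_attack_rate.get)
print(best_disguise)  # 'icrc'
```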

Old Rules, Novel Legal Questions

Until recently, literature on cyber-deception focused mainly on the rules related to perfidy and misuse of indicators from an offensive perspective (see Additional Protocol (AP) I, arts. 37–39). Even in technical literature, cyber-deception is usually only framed as legally precarious when used for cyberattacks. This is unsurprising, as perfidy is rarely relevant in a defensive posture (as it requires harm to be inflicted on the adversary), and applying the misuse provisions is relatively straightforward (basically, “don’t disguise your military network as ICRC or a hospital”).

Given the prospect of CDAs independently discovering that spoofing particular protected identities in cyberspace could be advantageous in the short term, applying the provisions might not be as straightforward anymore. As demonstrated, CDAs could deploy these abusive tactics in novel and creative ways without any instruction or foresight by their operators, possibly even against other AI-based cyber-agents. As such, this scenario confronts us with legal questions surrounding the provisions on misuse of indicators, which we are now forced to reanalyse in a new light.

First, does it matter that the cyber operators did not intend or foresee the misuse? This is a novel legal question to ponder, as older forms of misuse of indicators were impossible without human knowledge and at least tacit approval. Interestingly, from an initial assessment of the law, it appears that this lack of foreseeability is a legally immaterial factor. Unlike a perfidious attack, which requires an intent to deceive, the prohibition in IHL against misusing protective indicators is absolute and requires no result. As noted in the Commentaries on the Two Additional Protocols by Bothe et al. (p. 241), “any improper use of these symbols in armed conflict, whether or not deception was consummated, constitutes a breach of art. 38” (emphasis added).

The legal assessment may be more complicated if the CDA decides to disguise itself as a civilian or enemy network instead. For the civilian spoofing scenario, the tactic seems to be permissible if used in a purely defensive posture (being more analogous to camouflage), since feigning civilian status is prohibited mainly if used to kill, injure, or capture (AP I, art. 37(1)(c)).

As to the enemy spoofing scenario, diverging interpretations exist. In the narrow view, using enemy indicators is prohibited only when used in support of an attack while, in the broader view, the prohibition extends to situations where they are used to “shield, favour, protect, or impede military operations.” A CDA’s misuse of enemy indicators would therefore fall under the prohibition in the broader view, but not under the narrow approach.

Second, what forms of AI-based identity spoofing would fall under the misuse prohibition? In the scenario above, the system decided to deceive attackers by creating a fake website (presumably one displaying the ICRC emblem), but a CDA could falsely represent its identity in many other ways. These include decoy technologies at the application layer, which might unintentionally gather sensitive data from bona fide internet users.

The legal challenge mainly arises from the fact that emblems and signs were originally conceived as tangible objects. In contrast, what counts as an “indicator” in cyberspace has been subject to debate. For example, does a domain, public network gateway, Internet Protocol (IP) address, or metadata count? Depending on the legal position, a casuistic analysis of the CDA’s specific spoofing tactic may be necessary to determine whether it would fall under the misuse prohibition.

Finally, does it matter who is deceived by the spoofing attempt? With adversaries also likely to adopt AI for cyber-offense, the peculiar scenario can emerge where a CDA’s spoofing attempt deceives not a human analyst, but the adversary’s AI cyber-offense capability. Does this have legal implications? Perhaps not; conceiving the misuse provisions as absolute prohibitions, as discussed above, leads us to conclude that whether the deceit affects a human or an AI device is legally immaterial, as the mere fact of misusing indicators is the violation. This presents an interesting contrast to discussions around deceiving AI in perfidious attack situations, where some argue the wording “inviting the confidence of an adversary” would only cover cases where the deceit affects humans (AP I, art. 37(1)).

Reining in AI-Induced Treachery

The prospect of AI independently discovering and embracing humanity’s worst impulses—and undermining trust in IHL overall—may be alarming. Any solution to this particular concern should engage both the CDA deployers and the protected entities.

On the one hand, those using and developing CDAs should make an active effort to avoid misuse of indicators as described in this post. While reinforcement learning is powerful, it is now clear that unleashing the agent and giving it too much freedom could have unintended and deleterious effects. One solution is for a human to train the agent through additional supervised learning, which can, for example, specify which domains, signs, and IP addresses the agent must not spoof. This preserves the ability of CDAs to creatively explore defensive measures to thwart cyber attackers (the main purported advantage of these systems), while ensuring that humans delineate the permitted courses of action beforehand. That said, combining reinforcement learning with supervised learning presents several difficult technical challenges, such as overly large action and observation spaces.
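One illustrative way of imposing such limits (a sketch only, with hypothetical identifiers and action structures) is to mask out any decoy action that imitates a protected identifier before the policy chooses, regardless of how attractive its expected reward appears:

```python
# Hypothetical list of identifiers the agent must never imitate; in practice
# this would be curated by humans and kept far more comprehensive.
PROTECTED_IDENTIFIERS = {"icrc.org", "redcross", "hospital", "un.org"}

def is_permitted(action: dict) -> bool:
    """Mask rule: decoy actions may not imitate any protected identifier."""
    if action["type"] != "deploy_decoy":
        return True
    return not any(p in action["imitates"].lower() for p in PROTECTED_IDENTIFIERS)

candidate_actions = [
    {"type": "deploy_decoy", "imitates": "ICRC.org", "expected_reward": -0.1},
    {"type": "deploy_decoy", "imitates": "generic-corp.example", "expected_reward": -0.4},
    {"type": "isolate_segment", "expected_reward": -0.6},
]

# Filter first, optimise second: the ICRC decoy is never even considered,
# even though it has the best expected reward.
permitted = [a for a in candidate_actions if is_permitted(a)]
chosen = max(permitted, key=lambda a: a["expected_reward"])
print(chosen)  # {'type': 'deploy_decoy', 'imitates': 'generic-corp.example', ...}
```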

On the other hand, entities protected under Article 38(1) of AP I (like medical services and cultural property) should consider adopting secured digital representations of their identifiers. The ICRC has taken the initiative in this respect through its Authenticated Digital Emblem (ADEM), which recently underwent testing at an exercise by NATO’s Cooperative Cyber Defence Centre of Excellence. CDAs are extremely unlikely to be able to spoof such certificate-based digital signs convincingly, thereby preserving trust in their authenticity.
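The underlying idea can be illustrated with an ordinary digital-signature check (a stand-in only; the details of the actual ADEM specification differ): an emblem claim is trusted only if it verifies against the protected entity’s published key, so a forged claim simply fails.

```python
# Requires the third-party "cryptography" package; used here purely as a
# generic stand-in for certificate-based authentication, not as ADEM itself.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

entity_key = Ed25519PrivateKey.generate()   # held only by the protected entity
public_key = entity_key.public_key()        # published for anyone to verify against

claim = b"protected-endpoint: example-humanitarian-org"   # hypothetical claim format
genuine_signature = entity_key.sign(claim)

def emblem_is_authentic(claim: bytes, signature: bytes) -> bool:
    """A claim counts as an authentic digital emblem only if its signature verifies."""
    try:
        public_key.verify(signature, claim)
        return True
    except InvalidSignature:
        return False

print(emblem_is_authentic(claim, genuine_signature))  # True: genuine emblem
print(emblem_is_authentic(claim, b"\x00" * 64))       # False: a spoofed emblem fails verification
```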

Ultimately, the biggest challenge in this problématique was taking the epistemic first step—that is, realising that CDAs might take this action—in a field that the law had presumed to be relatively uncontroversial. Having bridged this gap, practitioners should now mitigate the risk by making stakeholders aware that this spoofing tactic is possible and by explaining how they can prevent it. A concerted effort from all actors can maintain the legitimacy of these indicators, a crucial element to maintaining IHL’s protective force in cyberspace.

***

Dr Jonathan Kwik is Researcher in international law at the Asser Institute in The Hague, Netherlands, specialised in international humanitarian law and militarised emerging technologies.

Adriaan Wiese is a defence-tech professional and the Managing Director of Mercury Autonomous Systems.

The views expressed are those of the authors, and do not necessarily reflect the official position of the United States Military Academy, Department of the Army, or Department of Defense. 

Articles of War is a forum for professionals to share opinions and cultivate ideas. Articles of War does not screen articles to fit a particular editorial agenda, nor endorse or advocate material that is published. Authorship does not indicate affiliation with Articles of War, the Lieber Institute, or the United States Military Academy West Point.

Photo credit: Getty Images via Unsplash


