Saturday, September 5, 2020

Fail Securely

As systems grow more complex, they become more likely to fail in ways whose exact mechanisms are hard to predict or understand beforehand. With machine-learning systems there is a problem bigger than failure itself: the possibility that the system, or one of its sub-systems, exhibits dangerous behavior as and when it fails. This concern is understandably magnified further in the context of lethal autonomous systems.

Exploitation surface of a generic ML pipeline

Given the increasingly diverse exploitation routes, the prevailing idea is to have a well-trained human operator periodically assess whether an AI is misinterpreting its environment. But conventional human-in-the-loop mechanisms are ill-suited to handling spatiotemporal complexity. Consider a large number of potential targets spread across tough terrain, or no terrain at all, and an extremely compressed timeline for engagement decisions. In such a situation it is simply not feasible for the AI to refer back to the human operator every time an engagement has to be made.
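
To make that bottleneck concrete, here is a rough back-of-envelope sketch in Python. Every number in it (target count, operator decision time, engagement window) is an assumption chosen purely for illustration, not a claim about any real system.

```python
# Back-of-envelope sketch of the human-in-the-loop bottleneck.
# All numbers below are assumptions for illustration only.

targets = 40                 # assumed potential targets in the engagement window
operator_decision_s = 20.0   # assumed seconds an operator needs per referral
engagement_window_s = 120.0  # assumed window before the opportunity closes

referral_time_s = targets * operator_decision_s
print(f"Time needed for per-target referral: {referral_time_s:.0f} s")
print(f"Available engagement window:         {engagement_window_s:.0f} s")
print(f"Feasible with one operator? {referral_time_s <= engagement_window_s}")
# With these assumptions, referral takes 800 s against a 120 s window,
# so per-engagement human confirmation simply cannot keep up.
```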

If adapted suitably, the old concept of kill-boxes may offer a simple socio-technical solution to the problem of moving away from conventional human-in-the-loop control and toward independent decision-making in military AIs, while still keeping the operator in place to monitor incoming data. It is also in line with existing cyber-security principles of network segmentation and of granting access based on a user's role, location, time, and so on.
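
As a very rough sketch of that analogy, the snippet below treats a kill-box like an access-control rule: full autonomy is a privilege granted only inside a human-authorized latitude/longitude box and time window. The names (KillBox, may_engage_autonomously) and the coordinates are hypothetical; a real system would rest on validated geodetic libraries and signed, auditable orders.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class KillBox:
    """A human-authorized compartment: lat/lon bounds plus a validity window."""
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    start: datetime
    end: datetime

    def contains(self, lat: float, lon: float, t: datetime) -> bool:
        inside_space = (self.lat_min <= lat <= self.lat_max
                        and self.lon_min <= lon <= self.lon_max)
        inside_time = self.start <= t <= self.end
        return inside_space and inside_time


def may_engage_autonomously(box: KillBox, lat: float, lon: float,
                            t: datetime) -> bool:
    # Analogous to role/location/time-based access control: full autonomy is
    # a privilege granted only inside the human-generated compartment.
    return box.contains(lat, lon, t)


if __name__ == "__main__":
    box = KillBox(34.0, 34.5, 62.0, 62.8,
                  datetime(2020, 9, 5, 6, 0, tzinfo=timezone.utc),
                  datetime(2020, 9, 5, 9, 0, tzinfo=timezone.utc))
    now = datetime(2020, 9, 5, 7, 30, tzinfo=timezone.utc)
    print(may_engage_autonomously(box, 34.2, 62.4, now))  # True: inside the box
    print(may_engage_autonomously(box, 35.0, 62.4, now))  # False: refer to operator
```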

By not depending wholly on an autonomous system's ability to interpret context, and by limiting its full-fledged use to a human-generated spatiotemporal compartment, i.e. a kill-box, we would not only grant the AI a human-like, non-zero probability of making high-risk "AlphaZero" moves, but also allow more secure failures that cannot exacerbate the larger conflict, while retaining the benefits of deploying advanced autonomous technology. This is especially relevant for global-common environments like space, the ocean and, of course, the internet, where there is an even greater need to research and manage AI security risks, since most nations in such environments are in a virtually persistent struggle with their allies and adversaries simultaneously.
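
One way to picture the "fail securely" default is a mode-selection rule in which every anomalous branch collapses toward the most restrictive behavior rather than the most permissive one. The sketch below is purely illustrative; the mode names, confidence threshold and self-test flag are assumptions, not a description of any fielded doctrine.

```python
from enum import Enum


class Mode(Enum):
    AUTONOMOUS_ENGAGE = "autonomous_engage"  # only inside the kill-box
    HUMAN_CONFIRM = "human_confirm"          # refer each engagement back
    SAFE_HOLD = "safe_hold"                  # weapons tight, loiter, report


def select_mode(inside_killbox: bool,
                perception_confidence: float,
                self_test_ok: bool,
                confidence_floor: float = 0.9) -> Mode:
    # Fail-secure ordering: every uncertain branch falls toward SAFE_HOLD,
    # so a misbehaving subsystem cannot widen the conflict on its own.
    if not self_test_ok:
        return Mode.SAFE_HOLD
    if not inside_killbox:
        return Mode.HUMAN_CONFIRM
    if perception_confidence < confidence_floor:
        return Mode.HUMAN_CONFIRM
    return Mode.AUTONOMOUS_ENGAGE


print(select_mode(True, 0.97, True))    # Mode.AUTONOMOUS_ENGAGE
print(select_mode(True, 0.80, True))    # Mode.HUMAN_CONFIRM
print(select_mode(False, 0.99, False))  # Mode.SAFE_HOLD
```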

A typical conventional kill-box. (image source: WikiLeaks)

Ideally, a military should develop its own autonomous systems instead of relying on commercial off-the-shelf products or even allies' systems, since those may come with their own inductive biases and are sometimes less likely to fully support complex missions. There are obvious economic, organizational and foreign-policy incentives for doing so. Most importantly, it would let the machine's behavior policy and its failure modes be defined much more clearly and adversarially trained against, in a manner that eliminates insecure failures while also suiting the country's cultural sensibilities.
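
For the adversarial-training part, a minimal sketch might look like the following, using PyTorch and FGSM-style input perturbations as a stand-in for whatever failure classes a programme would actually specify; the toy model, random data and perturbation budget are all placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 4))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
epsilon = 0.05  # assumed perturbation budget


def fgsm(x, y):
    """Craft a worst-case perturbation of the input within an L-inf ball."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()


for step in range(100):
    x = torch.randn(16, 32)           # placeholder sensor features
    y = torch.randint(0, 4, (16,))    # placeholder behavior labels
    x_adv = fgsm(x, y)
    optimizer.zero_grad()
    # Train on clean and perturbed inputs so the learned policy degrades
    # gracefully instead of failing in an exploitable way.
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
```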

Speaking of the latter, complex societies require a fair amount of organized coercion, socioeconomic incentives, cultural deterrence and mutation over the course of centuries to become eligible for the "civilization badge". So it is natural to want your AIs to reflect that civilizational ethos. Culture is admittedly not an engineer's problem, but because technology (particularly ICT) affords vastly different modes of social interaction and restructures social and even political affairs, it sets the premise for engineering, which can then function as 'politics by other means'. Perhaps Kaczynski was right. And that is all the more reason to develop systems that embrace failures, and fail securely.
