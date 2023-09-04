Anyone who has kept tabs on the latest AI offerings will know these systems are prone to “hallucinating” (making things up), a flaw inherent in how they work.

Yet Hinton singles out the potential for manipulation as a particularly serious concern. This raises the question: can AI systems deceive humans?

We argue that a range of systems have already learned to do this, and the risks range from fraud and election tampering to losing control over AI.

Perhaps the most disturbing example of a deceptive AI is found in Meta's CICERO, an AI model designed to play the alliance-building world conquest game Diplomacy.

Meta claims it built CICERO to be “largely honest and helpful”, and that CICERO would “never intentionally backstab” and attack its allies.

To investigate these rosy claims, we looked carefully at Meta's own game data from the CICERO experiment. On close inspection, Meta's AI turned out to be a master of deception.

In one example, CICERO engaged in premeditated deception. Playing as France, the AI reached out to Germany (a human player) with a plan to trick England (another human player) into leaving itself open to invasion.

After conspiring with Germany to invade the North Sea, CICERO told England it would defend England if anyone invaded the North Sea. Once England was convinced that France (CICERO) was protecting the North Sea, CICERO reported to Germany that it was ready to attack.

Playing as France, CICERO plans with Germany to deceive England. (Park, Goldstein et al., 2023)