$25.6 Million in Fifteen Transfers: The Arup Deepfake Heist That Should Terrify Every CFO
In late January 2024, a finance employee at Arup’s Hong Kong office received an email from the company’s chief financial officer requesting a confidential transaction. The employee was suspicious. The request was unusual. The amount was large. So the employee did exactly what security training teaches: they sought verification through a different channel.
They joined a video call with the CFO and several other colleagues.
The call went smoothly. The CFO was there, on camera, sounding like himself. Multiple other senior staff were present on camera and participating. The employee’s suspicion dissolved. Over the following days, the employee executed fifteen wire transfers totaling HK$200 million, approximately US$25.6 million, into five Hong Kong bank accounts.
Every person on that video call, except the employee, was an AI-generated deepfake.
The fraud was discovered only when the employee later contacted Arup’s actual headquarters to follow up. By then, the money was gone. As of early 2025, no arrests have been made and the funds remain unrecovered.
This is the incident that should end every argument about whether deepfake fraud is a real enterprise risk. It is. And the most terrifying part is not the technology. It’s that the victim did everything right.
The anatomy of a full-stack deepfake attack
The Arup attack wasn’t a crude face-swap overlaid on a recorded video. It was a multi-participant, real-time deepfake deployed in a live video conference, combined with sophisticated social engineering that exploited the victim’s own diligence.
Hong Kong police described the incident in a press briefing in early February 2024. Senior Superintendent Baron Chan Shun-ching told reporters that the fraudsters had created deepfake representations of multiple Arup staff members using publicly available video and audio footage: material sourced from prior conference presentations, company videos, and public appearances.
The attack’s design was precise. The initial contact was a phishing email mentioning a confidential transaction, which created the urgency and secrecy that prevented the employee from consulting widely. When the employee sought verification, the attackers were ready with a pre-arranged video call populated entirely by synthetic participants. The technology delivered what the social engineering required: a multi-sensory confirmation of identity that matched what the victim expected to see and hear.
Rob Greig, Arup’s Chief Information Officer, described the incident to the World Economic Forum as “technology-enhanced social engineering.” That framing is precise and important. The deepfake wasn’t the attack. It was the enabler of an attack that followed the oldest playbook in fraud: impersonate authority, create urgency, isolate the target, and exploit trust.
But the enabler changed the game. Before the deepfake era, executing this kind of multi-person impersonation in real time was impossible. Now it required only access to publicly available video of the targets and off-the-shelf AI tools.
Why the victim’s diligence is the scariest part
Security professionals will be tempted to analyze this case through the lens of what the employee did wrong. That instinct is misguided. The employee did the right thing: they were suspicious of an unsolicited email, and they sought verification through what was, until very recently, considered a reliable second channel.
The problem isn’t the employee. The problem is that the verification channel itself is compromised, and most organizations haven’t acknowledged this.
Consider the standard verification hierarchy that financial controls training teaches. Email is considered untrustworthy (phishing risk). Phone calls are considered somewhat trustworthy (caller ID can be spoofed, but voices are harder to fake). Video calls are considered highly trustworthy (seeing someone’s face in real time is treated as proof of identity). In-person meetings are considered the gold standard.
The Arup case demolished the third tier of that hierarchy. A video call with multiple known colleagues, the scenario most finance professionals would consider bulletproof, was entirely synthetic. The victim applied the security training they had received. The training was out of date.
This is where the institutional failure lies. Not with the employee. With every organization that still includes “confirm via video call” as an acceptable step in their financial authorization process. And that, as of February 2024, was most of them.
The Hong Kong context reveals a broader pattern
The Arup incident was the most expensive deepfake fraud case publicly disclosed, but Hong Kong authorities indicated it was far from isolated.
In the same period, Hong Kong police reported making six arrests connected to deepfake-enabled fraud schemes. Eight stolen identity cards had been used to create synthetic identities for 90 loan applications and 54 bank account openings. On at least 20 separate occasions, AI-generated deepfakes had been used to defeat facial recognition systems during identity verification processes.
That last detail is worth pausing on. Facial recognition, the biometric authentication method that banks, immigration authorities, and identity verification platforms have spent billions deploying, was being defeated by deepfakes at the consumer level. Not by sophisticated nation-state actors with custom tools. By fraud rings using commercially available technology.
The broader financial fraud landscape was accelerating in the same direction. Deloitte’s Center for Financial Services projected that AI-enabled fraud losses would reach $40 billion annually by 2027. Sumsub’s fraud data showed a tenfold increase in deepfake fraud attempts in 2023 alone, with the trend continuing to accelerate through 2024.
The Arup case wasn’t an anomaly. It was a signal of what enterprise fraud looks like when the cost of impersonation drops to near zero.
The CFO’s nightmare scenario is now the baseline
Before the Arup incident, the deepfake risk discussion in most boardrooms was abstract. “Deepfakes exist. They could theoretically be used for fraud. We should be aware.” The Arup case made it concrete: a specific company, a specific amount, a specific failure mode that any organization could replicate.
The attack exploited several assumptions that are embedded in how most enterprises operate.
Assumption one: a live video call is proof of identity. It isn’t. A live video call is proof that pixels are rendering on your screen. Those pixels may or may not correspond to a real person.
Assumption two: multiple participants increase trustworthiness. The Arup attackers understood this intuitively. A one-on-one deepfake call might have maintained the employee’s suspicion. A multi-person call with familiar faces overwhelmed it. The cognitive load of questioning whether every person in a group meeting is real is something most humans simply won’t sustain.
Assumption three: the attacker needs inside information. They don’t. They need publicly available video and audio of their targets. Conference talks. Earnings calls. Company promotional videos. LinkedIn posts. Podcast interviews. All of this is material that the targets themselves have distributed publicly.
Assumption four: this is a technology problem that technology will solve. It isn’t. Deepfake detection tools exist, but they’re locked in an arms race with generation tools, and they’re not deployed on Zoom, Teams, or WebEx calls in any meaningful way. Even if they were, the detection rates aren’t high enough to replace process controls. Detection is a complement to good process, not a substitute.
The practitioner’s uncomfortable question
I build enterprise AI systems. I understand intimately how quickly these capabilities improve. And when I look at the Arup case through the lens of system design, the question that troubles me isn’t “how did this happen?” It’s “why don’t we assume this is happening constantly?”
In every other domain of security, we assume the channel is compromised. We encrypt data because we assume the network is hostile. We use MFA because we assume passwords are compromised. We implement zero trust because we assume the perimeter is breached.
But for some reason, we exempt the most bandwidth-rich verification channel, live video, from this assumption. We treat it as inherently trustworthy. The Arup case proved it isn’t, and the technology gap between offensive deepfakes and defensive detection is widening, not closing.
The organizations I advise through my work with the Coalition for Secure AI are starting to internalize this. The ones that are farthest ahead have stopped asking “how do we detect deepfakes?” and started asking “how do we design processes that don’t rely on visual or audio identity as a single point of trust?” That’s the right question. But it took a $25.6 million lesson for most of them to start asking it.
The process redesign that should happen immediately
The corrective actions from the Arup case are specific and implementable without purchasing any new technology.
Implement mandatory out-of-band verification for all financial transactions above a defined threshold. “Out-of-band” means a completely separate communication channel from the one where the request originated. If the request came via email and was “confirmed” on a video call, the authorization must be completed through a third channel: a phone call to a pre-registered mobile number, an in-person confirmation, or a message through an internal system with separate authentication. The key is that the verification channel must be pre-established and known only to the parties involved, not something the attacker can anticipate or create on the fly.
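For concreteness, here is a minimal sketch of that rule in Python. The channel names, the threshold, and the shape of the transfer request are all illustrative assumptions, not any real treasury system’s API; the point is only that the authorizing channel must be both pre-registered and distinct from every channel the requester has already touched.

```python
from dataclasses import dataclass
from enum import Enum, auto

# Illustrative sketch of an out-of-band authorization rule.
# Channel names and the threshold are assumptions, not a real system.

class Channel(Enum):
    EMAIL = auto()
    VIDEO_CALL = auto()
    PHONE_PREREGISTERED = auto()     # call placed to a pre-registered number
    IN_PERSON = auto()
    INTERNAL_AUTHENTICATED = auto()  # internal system with its own login

OOB_THRESHOLD_USD = 50_000  # hypothetical threshold that triggers the rule

# Channels that count as trusted, pre-established verification paths.
TRUSTED_OOB = {
    Channel.PHONE_PREREGISTERED,
    Channel.IN_PERSON,
    Channel.INTERNAL_AUTHENTICATED,
}

@dataclass
class TransferRequest:
    amount_usd: float
    request_channel: Channel        # where the request arrived (e.g. email)
    confirmation_channel: Channel   # where it was "confirmed" (e.g. video)
    authorization_channel: Channel  # the independent third channel

def out_of_band_ok(req: TransferRequest) -> bool:
    """Allow the transfer only if a trusted channel, distinct from both
    the request and confirmation channels, completed the authorization."""
    if req.amount_usd < OOB_THRESHOLD_USD:
        return True
    already_used = {req.request_channel, req.confirmation_channel}
    return (req.authorization_channel in TRUSTED_OOB
            and req.authorization_channel not in already_used)

# An Arup-style flow: request by email, "confirmation" by video call,
# and no independent third channel. The check fails, as it should.
arup_like = TransferRequest(25_600_000, Channel.EMAIL,
                            Channel.VIDEO_CALL, Channel.VIDEO_CALL)
assert not out_of_band_ok(arup_like)
```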
Create a challenge-response protocol using pre-shared secrets. Before any high-value transaction can be authorized, the requesting party must provide a code word or phrase that was established in a prior, verified interaction. This defeats deepfakes entirely: the synthetic CFO can replicate the real CFO’s face and voice, but cannot produce a secret that was never spoken or written in any accessible medium.
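A hedged sketch of the cryptographic variant of this idea, in Python: instead of a spoken phrase, the pre-shared secret is a key, and the “response” is an HMAC over a one-time challenge, so the secret itself is never spoken aloud where a future attacker could record it. Everything here is illustrative, not a production design.

```python
import hashlib
import hmac
import secrets

# Sketch of challenge-response over a pre-shared secret.
# The secret never crosses the wire; only a value derived from it does.

def issue_challenge() -> str:
    """The approver generates a fresh random nonce per transaction,
    so a recorded response cannot be replayed later."""
    return secrets.token_hex(16)

def compute_response(pre_shared_secret: bytes, challenge: str) -> str:
    """The requester derives the response from the secret and challenge.
    A deepfake can mimic a face and a voice, but without the secret it
    cannot produce this value."""
    return hmac.new(pre_shared_secret, challenge.encode(),
                    hashlib.sha256).hexdigest()

def verify_response(pre_shared_secret: bytes,
                    challenge: str, response: str) -> bool:
    """Constant-time comparison, so verification leaks nothing via timing."""
    expected = compute_response(pre_shared_secret, challenge)
    return hmac.compare_digest(expected, response)

# Usage: the secret was established in a prior, verified interaction,
# e.g. in person. The approver issues the challenge; the requester answers.
secret = b"established-in-person-last-quarter"
challenge = issue_challenge()                    # approver sends this
response = compute_response(secret, challenge)   # requester answers
assert verify_response(secret, challenge, response)
```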
Remove “video call authorization” from your formal financial controls. If your written procedures include any variant of “verify via video conference” as an acceptable control, rewrite them. This doesn’t mean you can’t use video calls; it means they cannot serve as a standalone verification step for any action with material financial consequence.
Conduct deepfake simulation exercises with your finance team. Not a presentation about deepfakes. A live exercise in which a realistic synthetic request is inserted into their workflow, and their response is evaluated against updated procedures. The Arup employee’s instinct to verify was correct. The problem was that the available verification mechanism was compromised. Your team needs to practice operating in a world where that’s true.
Brief your C-suite personally on the Arup case and its implications for your organization. Not in a written report, but in a conversation where you can observe whether they understand that the threat isn’t hypothetical. If your CFO’s response is “our people would catch that,” ask them what specific mechanism would catch it that the Arup employee lacked.
The question every board should be asking
The Arup deepfake heist is not a story about AI. It’s a story about trust assumptions that haven’t been updated for a world where identity can be synthesized at scale.
Every enterprise has processes that rely on “I verified it was them” as a control. Treasury operations. Executive authorizations. M&A negotiations. Legal communications. HR decisions. Wherever visual or verbal identity confirmation serves as a gatekeeper, the organization is vulnerable to the same attack that cost Arup $25.6 million.
The board question isn’t “could this happen to us?” The question is “which of our processes would fail the same way, and how quickly can we fix them?”
Because the attackers who targeted Arup didn’t need to breach a network, exploit a vulnerability, or compromise a credential. They needed video footage that the company’s own executives had posted publicly, and AI tools that anyone can access. The barrier to entry is gone. The only remaining defense is process design that assumes the channel is compromised.
That assumption should have been universal by the end of February 2024. For most organizations, it still isn’t.