D-CAPTCHA++: A Study of Resilience of Deepfake CAPTCHA under Transferable Imperceptible Adversarial Attack
Abstract
Advances in generative AI have improved audio synthesis models, including text-to-speech and voice conversion. This raises concerns about their potential misuse in social manipulation and political interference, as synthetic speech has become indistinguishable from natural human speech. Several speech-generation programs have already been exploited for malicious purposes, especially to impersonate individuals through phone calls. Detecting fake audio is therefore crucial to maintaining social security and safeguarding the integrity of information.
Recent research has proposed D-CAPTCHA, a system based on the challenge-response protocol that differentiates fake phone calls from real ones. In this work, we study the resilience of this system and introduce a more robust version, D-CAPTCHA++, to defend against fake calls. Specifically, we first expose the vulnerability of the D-CAPTCHA system to a transferable imperceptible adversarial attack. Second, we mitigate this vulnerability by hardening the system's deepfake detectors and task classifiers with adversarial training.
Problem & Motivation
The D-CAPTCHA system is a defense against deepfake calls through a challenge-response protocol. It integrates five modules to verify the caller's authenticity:
The D-CAPTCHA System
- Human-based: Assigns a random challenge to suspicious callers.
- Time: Constrains the response to within 1 second.
- Realism: Deepfake detectors verify whether the response is spoofed.
- Task: ML classifiers verify that the response contains the requested content.
- Identity: Evaluates speaker similarity between the initial and response audio.
Key Finding: Despite its sophisticated design, the D-CAPTCHA system is vulnerable to transferable imperceptible adversarial attacks. We expose this vulnerability and propose D-CAPTCHA++ with adversarial training to significantly reduce attack success rates.
We identify three main limitations of the original D-CAPTCHA system:
- The Realism module is vulnerable to adversarial examples and can be evaded by adding crafted perturbations to the response audio.
- The Task module cannot truly understand the semantic content of the response — it only classifies audio features.
- The Identity module only compares initial and response audio, leaving it vulnerable if the adversary uses voice conversion both before and during the challenge.
Threat Model
Our threat model integrates a human adversary, voice conversion models, and adversarial example generation to evade the D-CAPTCHA system under a black-box setting.
Integrity Violation
Evade detection by the D-CAPTCHA system without compromising normal system operation. The attacker aims to have fake audio classified as real.
Black-box Access
The adversary knows only the task performed by each module and the decision output. No access to training data, preprocessing, model architecture, parameters, or inference API.
Surrogate Model + Transferability
Train a surrogate model by querying collected data to the target model, then generate imperceptible adversarial samples that transfer to the target model.
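The surrogate step can be sketched with a toy stand-in: a hidden linear "target" detector is queried for its decision only (matching the black-box assumption above), and a surrogate logistic-regression model is fit on the resulting (input, decision) pairs. Everything here — the dimensions, the linear models, the random features — is illustrative, not the paper's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box target: we observe only its binary decision,
# never its weights, gradients, or confidence scores.
w_target = rng.normal(size=8)

def query_target(X):
    return (X @ w_target > 0).astype(float)  # 0 = real, 1 = fake

# Step 1: label collected data by querying the target's decision output.
X_train = rng.normal(size=(500, 8))
y_train = query_target(X_train)

# Step 2: fit a surrogate logistic-regression detector on those pairs.
w_sur = np.zeros(8)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X_train @ w_sur)))
    w_sur -= 0.1 * X_train.T @ (p - y_train) / len(X_train)

# Step 3: the surrogate now mimics the target on unseen inputs, so
# white-box attacks on the surrogate can transfer back to the target.
X_test = rng.normal(size=(200, 8))
agreement = np.mean((X_test @ w_sur > 0) == query_target(X_test))
```

Once the surrogate agrees with the target on most inputs, adversarial examples crafted with full gradient access to the surrogate are candidates for transfer to the target.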
Method
Generating Imperceptible Adversarial Examples
The attack optimizes a surrogate model (LCNN) to generate adversarial perturbations that are both effective and imperceptible. The optimization objective combines a network loss $\mathcal{L}_{net}$ (to mislead the detector) with a perceptual loss $\mathcal{L}_{\theta}$ (to ensure imperceptibility via frequency masking):
$$\min_{\delta} \; \mathcal{L}_{net}(\hat{\mathcal{F}}(\mathcal{V}(x) + \delta), y) + \alpha \cdot \mathcal{L}_{\theta}(\mathcal{V}(x), \delta) \quad \text{s.t.} \; ||\delta|| < \epsilon$$
where $\hat{\mathcal{F}}$ is the surrogate deepfake detector, $\mathcal{V}(x)$ is the voice-converted audio, $\delta$ is the perturbation, and $\alpha$ balances the two objectives.
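A minimal numpy sketch of this optimization, with heavy simplifications: the LCNN surrogate is replaced by a toy linear detector, the frequency-masking perceptual loss $\mathcal{L}_{\theta}$ by an L2 proxy on $\delta$, and the norm constraint is enforced by projection onto an $L_\infty$ ball. All values and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=64)        # toy stand-in for the voice-converted audio V(x)
w = rng.normal(size=64)        # toy surrogate detector weights (score > 0 => "fake")
eps, alpha, lr = 0.1, 0.5, 0.05

delta = np.zeros_like(x)
for _ in range(200):
    s = w @ (x + delta)
    grad_net = (1.0 / (1.0 + np.exp(-s))) * w   # grad of L_net = log(1 + e^s), target label "real"
    grad_perc = 2.0 * alpha * delta             # grad of the L2 proxy for L_theta
    delta -= lr * (grad_net + grad_perc)        # descend the joint objective
    delta = np.clip(delta, -eps, eps)           # project onto ||delta||_inf <= eps

score_before, score_after = w @ x, w @ (x + delta)
```

The perturbation drives the detector's "fake" score down while the perceptual term and the $\epsilon$-ball keep $\delta$ small; in the actual attack the proxy loss is replaced by the psychoacoustic masking loss computed in the frequency domain.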
Transferability
Adversarial examples crafted against the surrogate model can transfer to target models trained for the same task, due to overlap in error spaces. We craft adversarial examples that induce misclassification with maximum confidence in the surrogate model, as higher-confidence attacks transfer more successfully.
D-CAPTCHA++: Adversarial Training Defense
To mitigate the vulnerability, we apply Projected Gradient Descent (PGD) adversarial training to both the deepfake detectors and task classifiers. During training, adversarial examples are generated on-the-fly and included in the training set, improving the model's robustness against adversarial perturbations.
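A minimal sketch of PGD adversarial training on a toy logistic-regression "detector" (numpy; the data, model, and hyperparameters are illustrative, not the paper's): each outer step first crafts $L_\infty$-bounded perturbations by signed gradient ascent against the current model, then updates the model on the perturbed batch.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 400, 16
X = rng.normal(size=(n, d))                       # toy "audio feature" dataset
y = (X @ rng.normal(size=d) > 0).astype(float)    # binary real/fake labels
eps, pgd_steps, pgd_lr = 0.05, 20, 0.02

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(d)
for _ in range(300):
    # Inner maximization: craft PGD perturbations on-the-fly
    # against the current model.
    delta = np.zeros_like(X)
    for _ in range(pgd_steps):
        p = sigmoid((X + delta) @ w)
        grad = (p - y)[:, None] * w[None, :]      # dBCE/dinput per sample
        delta = np.clip(delta + pgd_lr * np.sign(grad), -eps, eps)
    # Outer minimization: one gradient step on the adversarial batch.
    p = sigmoid((X + delta) @ w)
    w -= 0.1 * (X + delta).T @ (p - y) / n
```

Training on worst-case inputs inside the $\epsilon$-ball is what makes the hardened model resistant to the same class of perturbations at test time; increasing the inner step count (20 vs. 40 in our tables) tightens the inner maximization.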
Main Results
Voice Conversion Evaluation
We evaluate three voice conversion models on intelligibility, measured by word and character error rates (WER/CER, lower is better). Although TriAAN-VC achieves the lowest error rates, kNN-VC offers the best balance of fast inference speed and intelligibility, satisfying D-CAPTCHA's 1-second response constraint.
| Model | WER (%) | CER (%) |
|---|---|---|
| kNN-VC | 25.78 | 15.67 |
| Urhythmic | 37.12 | 24.68 |
| TriAAN-VC | 19.87 | 11.25 |
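For reference, WER is the word-level Levenshtein edit distance between the transcript of the converted audio and the reference text, divided by the number of reference words (CER applies the same computation to character sequences). A compact implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance via a single-row dynamic program.
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(dp[j] + 1,          # deletion of a reference word
                      dp[j - 1] + 1,      # insertion of a hypothesis word
                      prev + (r != h))    # substitution (0 if the words match)
            prev, dp[j] = dp[j], cur
    return dp[-1] / len(ref)
```

Splitting into characters instead of words (e.g., `list(reference)`) yields CER with the same dynamic program.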
Transferability to Target Models
Adversarial samples generated by the LCNN surrogate model successfully transfer to target deepfake detectors; the table reports attack success rates (%). Models sharing the same feature-extraction frontend (LFCC) show higher transferability.
| Surrogate \ Target | LCNN | SpecRNet | RawNet2 | RawNet3 |
|---|---|---|---|---|
| LCNN | 99.76 | 41.87 | 35.91 | 36.83 |
D-CAPTCHA vs D-CAPTCHA++
PGD adversarial training dramatically reduces the attack success rate for both deepfake detectors and task classifiers.
Values are attack success rates (%); "Std" denotes the standard D-CAPTCHA, while "PGD-20" and "PGD-40" denote D-CAPTCHA++ adversarially trained with 20 and 40 PGD steps, respectively.

| Task | Detector (Std) | ResNet18 (Std) | RawNet3 (Std) | Detector (PGD-20) | ResNet18 (PGD-20) | RawNet3 (PGD-20) | Detector (PGD-40) | ResNet18 (PGD-40) | RawNet3 (PGD-40) |
|---|---|---|---|---|---|---|---|---|---|
| Sing | 37.16 | 32.57 | 34.28 | 8.03 | 4.77 | 5.13 | 3.06 | 0.67 | 0.91 |
| Hum Tone | 35.93 | 30.16 | 34.58 | 7.47 | 4.05 | 4.64 | 2.62 | 0.58 | 0.77 |
| Speak w/ Emotion | 38.58 | 36.41 | 37.68 | 8.64 | 5.08 | 5.34 | 3.45 | 0.81 | 1.05 |
| Laugh | 32.04 | 26.14 | 28.71 | 7.21 | 3.31 | 3.88 | 2.37 | 0.41 | 0.54 |
| Domestic Sound | 29.76 | 24.75 | 27.83 | 6.87 | 2.56 | 2.91 | 1.85 | 0.21 | 0.38 |
Key Takeaways
Adversarial Training is Essential
Adversarial samples should be generated and included in the training set to improve the generalization and robustness of both deepfake detectors and task classifiers.
Feature Extraction Limits Transferability
Deepfake detectors and task classifiers that employ feature-extraction frontends (e.g., MFCC, spectrogram) are less vulnerable to transferable adversarial samples than models operating on raw waveforms.
Evaluate with Multiple Metrics
When constructing detection-based defenses, report results with F1-score and ROC curve, not just accuracy — especially given dataset imbalance between real and fake audio samples.
Citation
If you find this work useful in your research, please consider citing:
@inproceedings{nguyenle2024dcaptcha,
title = {D-CAPTCHA++: A Study of Resilience of Deepfake
CAPTCHA under Transferable Imperceptible
Adversarial Attack},
author = {Nguyen-Le, Hong-Hanh and Tran, Van-Tuan
and Nguyen, Dinh-Thuc and Le-Khac, Nhien-An},
booktitle = {IEEE International Conference on Cyber
Security and Resilience (CSR)},
year = {2024}
}
Acknowledgments
This publication has emanated from research conducted with the financial support of Science Foundation Ireland under Grant number 18/CRT/6183.