D-CAPTCHA++: A Study of Resilience of Deepfake CAPTCHA under Transferable Imperceptible Adversarial Attack
Abstract
Advances in generative AI have improved audio synthesis models, including text-to-speech and voice conversion. This raises concerns about their potential misuse in social manipulation and political interference, as synthetic speech has become indistinguishable from natural human speech. Several speech-generation programs have already been exploited for malicious purposes, especially to impersonate individuals through phone calls. Detecting fake audio is therefore crucial to maintaining social security and safeguarding the integrity of information.
Recent research has proposed D-CAPTCHA, a system based on the challenge-response protocol that differentiates fake phone calls from real ones. In this work, we study the resilience of this system and introduce a more robust version, D-CAPTCHA++, to defend against fake calls. Specifically, we first expose the vulnerability of the D-CAPTCHA system to a transferable imperceptible adversarial attack. Second, we mitigate this vulnerability by hardening the system's deepfake detectors and task classifiers with adversarial training.
Problem & Motivation
The D-CAPTCHA system is a defense against deepfake calls through a challenge-response protocol. It integrates five modules to verify the caller's authenticity:
The D-CAPTCHA System
- Human-based: Assigns a random challenge to suspicious callers.
- Time: Constrains the response to within 1 second.
- Realism: Deepfake detectors verify whether the response is spoofed.
- Task: ML classifiers verify that the response contains the requested content.
- Identity: Evaluates speaker similarity between the initial and response audio.
Key Finding: Despite its sophisticated design, the D-CAPTCHA system is vulnerable to transferable imperceptible adversarial attacks. We expose this vulnerability and propose D-CAPTCHA++ with adversarial training to significantly reduce attack success rates.
We identify three main limitations of the original D-CAPTCHA system:
- The Realism module is vulnerable to adversarial examples and can be evaded by adding crafted perturbations to the response audio.
- The Task module cannot truly understand the semantic content of the response — it only classifies audio features.
- The Identity module only compares initial and response audio, leaving it vulnerable if the adversary uses voice conversion both before and during the challenge.
Threat Model
Our threat model integrates a human adversary, voice conversion models, and adversarial example generation to evade the D-CAPTCHA system under a black-box setting.
Integrity Violation
Evade detection by the D-CAPTCHA system without compromising normal system operation. The attacker aims to have fake audio classified as real.
Black-box Access
The adversary knows only the task performed by each module and the decision output. No access to training data, preprocessing, model architecture, parameters, or inference API.
Surrogate Model + Transferability
Train a surrogate model by querying collected data to the target model, then generate imperceptible adversarial samples that transfer to the target model.
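The surrogate step can be sketched with a toy stand-in: a hidden linear "target" detector is queried for its decision only (matching the black-box assumption above), and a surrogate logistic-regression model is fit on the resulting (input, decision) pairs. Everything here — the dimensions, the linear models, the random features — is illustrative, not the paper's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box target: we observe only its binary decision,
# never its weights, gradients, or confidence scores.
w_target = rng.normal(size=8)

def query_target(X):
    return (X @ w_target > 0).astype(float)  # 0 = real, 1 = fake

# Step 1: label collected data by querying the target's decision output.
X_train = rng.normal(size=(500, 8))
y_train = query_target(X_train)

# Step 2: fit a surrogate logistic-regression detector on those pairs.
w_sur = np.zeros(8)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X_train @ w_sur)))
    w_sur -= 0.1 * X_train.T @ (p - y_train) / len(X_train)

# Step 3: the surrogate now mimics the target on unseen inputs, so
# white-box attacks on the surrogate can transfer back to the target.
X_test = rng.normal(size=(200, 8))
agreement = np.mean((X_test @ w_sur > 0) == query_target(X_test))
```

Once the surrogate agrees with the target on most inputs, adversarial examples crafted with full gradient access to the surrogate are candidates for transfer to the target.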
Method
Generating Imperceptible Adversarial Examples
The attack optimizes a surrogate model (LCNN) to generate adversarial perturbations that are both effective and imperceptible. The optimization objective combines a network loss $\mathcal{L}_{net}$ (to mislead the detector) with a perceptual loss $\mathcal{L}_{\theta}$ (to ensure imperceptibility via frequency masking):
$$\min_{\delta} \; \mathcal{L}_{net}(\hat{\mathcal{F}}(\mathcal{V}(x) + \delta), y) + \alpha \cdot \mathcal{L}_{\theta}(\mathcal{V}(x), \delta) \quad \text{s.t.} \; ||\delta|| < \epsilon$$
where $\hat{\mathcal{F}}$ is the surrogate deepfake detector, $\mathcal{V}(x)$ is the voice-converted audio, $\delta$ is the perturbation, and $\alpha$ balances the two objectives.
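A minimal numpy sketch of this optimization, with heavy simplifications: the LCNN surrogate is replaced by a toy linear detector, the frequency-masking perceptual loss $\mathcal{L}_{\theta}$ by an L2 proxy on $\delta$, and the norm constraint is enforced by projection onto an $L_\infty$ ball. All values and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=64)        # toy stand-in for the voice-converted audio V(x)
w = rng.normal(size=64)        # toy surrogate detector weights (score > 0 => "fake")
eps, alpha, lr = 0.1, 0.5, 0.05

delta = np.zeros_like(x)
for _ in range(200):
    s = w @ (x + delta)
    grad_net = (1.0 / (1.0 + np.exp(-s))) * w   # grad of L_net = log(1 + e^s), target label "real"
    grad_perc = 2.0 * alpha * delta             # grad of the L2 proxy for L_theta
    delta -= lr * (grad_net + grad_perc)        # descend the joint objective
    delta = np.clip(delta, -eps, eps)           # project onto ||delta||_inf <= eps

score_before, score_after = w @ x, w @ (x + delta)
```

The perturbation drives the detector's "fake" score down while the perceptual term and the $\epsilon$-ball keep $\delta$ small; in the actual attack the proxy loss is replaced by the psychoacoustic masking loss computed in the frequency domain.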
Transferability
Adversarial examples crafted against the surrogate model can transfer to target models trained for the same task, due to overlap in error spaces. We craft adversarial examples that induce misclassification with maximum confidence in the surrogate model, as higher-confidence attacks transfer more successfully.
D-CAPTCHA++: Adversarial Training Defense
To mitigate the vulnerability, we apply Projected Gradient Descent (PGD) adversarial training to both the deepfake detectors and task classifiers. During training, adversarial examples are generated on-the-fly and included in the training set, improving the model's robustness against adversarial perturbations.
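A minimal sketch of PGD adversarial training on a toy logistic-regression "detector" (numpy; the data, model, and hyperparameters are illustrative, not the paper's): each outer step first crafts $L_\infty$-bounded perturbations by signed gradient ascent against the current model, then updates the model on the perturbed batch.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 400, 16
X = rng.normal(size=(n, d))                       # toy "audio feature" dataset
y = (X @ rng.normal(size=d) > 0).astype(float)    # binary real/fake labels
eps, pgd_steps, pgd_lr = 0.05, 20, 0.02

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(d)
for _ in range(300):
    # Inner maximization: craft PGD perturbations on-the-fly
    # against the current model.
    delta = np.zeros_like(X)
    for _ in range(pgd_steps):
        p = sigmoid((X + delta) @ w)
        grad = (p - y)[:, None] * w[None, :]      # dBCE/dinput per sample
        delta = np.clip(delta + pgd_lr * np.sign(grad), -eps, eps)
    # Outer minimization: one gradient step on the adversarial batch.
    p = sigmoid((X + delta) @ w)
    w -= 0.1 * (X + delta).T @ (p - y) / n
```

Training on worst-case inputs inside the $\epsilon$-ball is what makes the hardened model resistant to the same class of perturbations at test time; increasing the inner step count (20 vs. 40 in our tables) tightens the inner maximization.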
Main Results
Voice Conversion Evaluation
We evaluate three voice conversion models on intelligibility, measured by word and character error rates (WER/CER, lower is better). Although TriAAN-VC achieves the lowest error rates, kNN-VC offers the best balance of fast inference speed and intelligibility, satisfying D-CAPTCHA's 1-second response constraint.
| Model | WER (%) | CER (%) |
|---|---|---|
| kNN-VC | 25.78 | 15.67 |
| Urhythmic | 37.12 | 24.68 |
| TriAAN-VC | 19.87 | 11.25 |
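For reference, WER is the word-level Levenshtein edit distance between the transcript of the converted audio and the reference text, divided by the number of reference words (CER applies the same computation to character sequences). A compact implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance via a single-row dynamic program.
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(dp[j] + 1,          # deletion of a reference word
                      dp[j - 1] + 1,      # insertion of a hypothesis word
                      prev + (r != h))    # substitution (0 if the words match)
            prev, dp[j] = dp[j], cur
    return dp[-1] / len(ref)
```

Splitting into characters instead of words (e.g., `list(reference)`) yields CER with the same dynamic program.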
Transferability to Target Models
Adversarial samples generated by the LCNN surrogate model successfully transfer to target deepfake detectors; the table reports attack success rates (%). Models sharing the same feature-extraction frontend (LFCC) show higher transferability.
| Surrogate \ Target | LCNN | SpecRNet | RawNet2 | RawNet3 |
|---|---|---|---|---|
| LCNN | 99.76 | 41.87 | 35.91 | 36.83 |
D-CAPTCHA vs D-CAPTCHA++
PGD adversarial training dramatically reduces the attack success rate for both deepfake detectors and task classifiers.
Values are attack success rates (%); "Std" denotes the standard D-CAPTCHA, while "PGD-20" and "PGD-40" denote D-CAPTCHA++ adversarially trained with 20 and 40 PGD steps, respectively.

| Task | Detector (Std) | ResNet18 (Std) | RawNet3 (Std) | Detector (PGD-20) | ResNet18 (PGD-20) | RawNet3 (PGD-20) | Detector (PGD-40) | ResNet18 (PGD-40) | RawNet3 (PGD-40) |
|---|---|---|---|---|---|---|---|---|---|
| Sing | 37.16 | 32.57 | 34.28 | 8.03 | 4.77 | 5.13 | 3.06 | 0.67 | 0.91 |
| Hum Tone | 35.93 | 30.16 | 34.58 | 7.47 | 4.05 | 4.64 | 2.62 | 0.58 | 0.77 |
| Speak w/ Emotion | 38.58 | 36.41 | 37.68 | 8.64 | 5.08 | 5.34 | 3.45 | 0.81 | 1.05 |
| Laugh | 32.04 | 26.14 | 28.71 | 7.21 | 3.31 | 3.88 | 2.37 | 0.41 | 0.54 |
| Domestic Sound | 29.76 | 24.75 | 27.83 | 6.87 | 2.56 | 2.91 | 1.85 | 0.21 | 0.38 |
Key Takeaways
Adversarial Training is Essential
Adversarial samples should be generated and included in the training set to improve the generalization and robustness of both deepfake detectors and task classifiers.
Feature Extraction Limits Transferability
Deepfake detectors and task classifiers that employ feature-extraction frontends (e.g., MFCC, spectrogram) are less vulnerable to transferable adversarial samples than models operating on raw waveforms.
Evaluate with Multiple Metrics
When constructing detection-based defenses, report results with F1-score and ROC curve, not just accuracy — especially given dataset imbalance between real and fake audio samples.
Citation
If you find this work useful in your research, please consider citing:
@inproceedings{nguyenle2024dcaptcha,
title = {D-CAPTCHA++: A Study of Resilience of Deepfake
CAPTCHA under Transferable Imperceptible
Adversarial Attack},
author = {Nguyen-Le, Hong-Hanh and Tran, Van-Tuan
and Nguyen, Dinh-Thuc and Le-Khac, Nhien-An},
booktitle = {IEEE International Conference on Cyber
Security and Resilience (CSR)},
year = {2024}
}
Acknowledgments
This publication has emanated from research conducted with the financial support of Science Foundation Ireland under Grant number 18/CRT/6183.