Deepfake Detection Platforms Evaluation

First cross-paradigm evaluation of six publicly accessible deepfake detection tools, spanning forensic analysis tools and AI-based classifiers

This work presents a comparative evaluation framework for publicly accessible deepfake detection tools. It assesses both forensic analysis tools (InVID & WeVerify, FotoForensics, Forensically) and AI-based classifiers (DecopyAI, FaceOnLive, Bitmind). Professional investigators with law enforcement experience conducted the evaluation using blinded protocols on the DF40, CelebDF, and CASIA-v2 datasets.

Key findings:

Forensic tools exhibit high recall but poor specificity
AI classifiers demonstrate the inverse pattern (high specificity, lower recall)
Human evaluators substantially outperform all automated tools
Human-AI disagreement is asymmetric, with human judgment prevailing in most discordant cases
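
The findings above are stated in terms of recall, specificity, and discordant human-tool cases. As a rough illustration of how such quantities can be computed, here is a minimal Python sketch. It assumes binary labels where True means "manipulated". The function names (recall_specificity, discordant_wins) and all of the toy data are hypothetical and made up for illustration; they are not the study's code or results.

```python
def recall_specificity(labels, preds):
    """Return (recall, specificity) for boolean sequences; True = manipulated."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y and p)          # fakes flagged as fake
    fn = sum(1 for y, p in pairs if y and not p)      # fakes missed
    tn = sum(1 for y, p in pairs if not y and not p)  # authentic passed as authentic
    fp = sum(1 for y, p in pairs if not y and p)      # authentic flagged as fake
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return recall, specificity


def discordant_wins(labels, human, tool):
    """Among items where human and tool disagree, count who matched the label.

    Returns (human_correct, tool_correct). With binary labels, exactly one
    side is correct in every discordant case.
    """
    triples = list(zip(labels, human, tool))
    human_correct = sum(1 for y, h, t in triples if h != t and h == y)
    tool_correct = sum(1 for y, h, t in triples if h != t and t == y)
    return human_correct, tool_correct


# Toy data only (hypothetical, not from the evaluation).
labels          = [True, True, True, True, False, False, False, False]
forensic_like   = [True, True, True, True, True,  True,  True,  False]  # flags almost everything
classifier_like = [True, False, False, True, False, False, False, False]  # conservative
human           = [True, True, True, True, False, False, False, True]

print("forensic-like:  ", recall_specificity(labels, forensic_like))
print("classifier-like:", recall_specificity(labels, classifier_like))
print("human vs classifier (human right, tool right):",
      discordant_wins(labels, human, classifier_like))
```

In this toy setup, a tool that flags nearly every item catches every fake but misclassifies most authentic items, while a conservative tool does the opposite. That is the shape of the trade-off the findings describe, though the numbers here carry no evidential weight.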