Deepfake Detection Platforms Evaluation
First cross-paradigm evaluation of six publicly accessible deepfake detection tools, including forensic analysis tools and AI-based classifiers
Python · active
A comparative evaluation framework for publicly accessible deepfake detection tools, covering both forensic analysis tools (InVID & WeVerify, FotoForensics, Forensically) and AI-based classifiers (DecopyAI, FaceOnLive, Bitmind). Professional investigators with law enforcement experience conducted the evaluation under blinded protocols on the DF40, CelebDF, and CASIA-v2 datasets.
Key findings:
- Forensic tools exhibit high recall but poor specificity
- AI classifiers demonstrate the inverse pattern (high specificity, lower recall)
- Human evaluators substantially outperform all automated tools
- Human-AI disagreement is asymmetric, with human judgment prevailing in most discordant cases
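
The metrics behind these findings can be illustrated with a minimal Python sketch. It is not the project's actual code: the function names (`recall_specificity`, `discordant_counts`) and the toy label vectors are hypothetical, chosen only to show how recall, specificity, and discordant-case counts are typically computed for binary real/fake verdicts (1 = manipulated, 0 = authentic).

```python
from typing import Sequence, Tuple


def recall_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (recall, specificity) for binary labels where 1 = manipulated."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Recall: share of manipulated items that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of authentic items that were cleared.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return recall, specificity


def discordant_counts(
    y_true: Sequence[int], human: Sequence[int], tool: Sequence[int]
) -> Tuple[int, int]:
    """Count disagreements won by the human and by the tool, respectively."""
    human_right = sum(1 for t, h, a in zip(y_true, human, tool) if h != a and h == t)
    tool_right = sum(1 for t, h, a in zip(y_true, human, tool) if h != a and a == t)
    return human_right, tool_right


# Toy, made-up verdicts (NOT results from the evaluation).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
forensic_like = [1, 1, 1, 1, 1, 1, 1, 0]    # flags almost everything
classifier_like = [1, 0, 0, 0, 0, 0, 0, 0]  # flags almost nothing
human = [1, 1, 1, 0, 0, 0, 0, 0]

print(recall_specificity(y_true, forensic_like))    # (1.0, 0.25)
print(recall_specificity(y_true, classifier_like))  # (0.25, 1.0)
print(discordant_counts(y_true, human, classifier_like))  # (2, 0)
```

In this toy data the forensic-style verdicts show high recall with low specificity, the classifier-style verdicts show the inverse, and both human-tool disagreements resolve in the human's favor. The numbers are illustrative only and carry no information about the measured results.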