The rise of deepfakes and AI-generated media has ignited a search for reliable methods to detect synthetic, manipulated, and wholly generated content with the intent to mitigate its harmful effects on society. In recent years, two primary approaches have emerged: provenance-based detection, which examines metadata for signs of manipulation, and inference-based detection, which analyzes the media itself for artifacts and inconsistencies.
The two methods differ markedly in how they work, how widely they have been adopted, and how effective they are. To understand their strengths and limitations, it is worth examining each approach in depth.
Provenance: Watermarking and Beyond
Content provenance is one of the most widely publicized methods for identifying and labeling deepfakes and other AI-generated media. Endorsed by major tech companies, provenance-based deepfake detection examines the metadata of the content in question, looking for information such as timestamps, editing history, and GPS coordinates. Inconsistencies within this metadata can point to possible AI manipulation. Going a step further, companies and institutions are proposing to watermark digital content, meaning that any content created or altered with the help of AI would carry a metadata stamp to alert consumers.
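To make the mechanics concrete, the sketch below (Python, using the Pillow imaging library) shows the kind of metadata inspection a provenance check might begin with. The tag names and heuristics are illustrative assumptions rather than any specific vendor's implementation, and "sample.jpg" is a hypothetical input file.

```python
# Illustrative metadata inspection for a provenance-style check (assumes Pillow).
from PIL import Image
from PIL.ExifTags import TAGS

def inspect_metadata(path: str) -> dict:
    """Collect EXIF signals a provenance check might weigh; heuristics are illustrative."""
    exif = Image.open(path).getexif()
    tags = {TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}
    return {
        # Missing capture timestamps or camera fields can hint at generated or
        # re-encoded media, though legitimate files often lack them too.
        "has_timestamp": "DateTime" in tags,
        "has_camera_info": "Make" in tags or "Model" in tags,
        # The Software tag sometimes names the editing or generation tool used.
        "software": tags.get("Software"),
    }

if __name__ == "__main__":
    print(inspect_metadata("sample.jpg"))  # hypothetical input file
```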
Watermarking featured prominently in the Biden Administration’s executive order on the safety of AI, signed in October 2023. The Coalition for Content Provenance and Authenticity (C2PA), a partnership between tech and media companies such as Google, Adobe, and the BBC, was created to establish a standard for certifying the authenticity and provenance of digital content via watermarking. The recent EU Artificial Intelligence Act also mandates the labeling of artificial or altered content.
While these initiatives are well-intentioned and have their place in the process of identifying deepfakes, provenance-based detection and watermarking are not sufficient to keep up with the proliferation of AI-manipulated media. First, the method requires standardized implementation across industries, with voluntary participation from digital content platforms and AI-creation companies. Given the sheer number and competing interests of such companies, getting all of them to opt in and agree on a reliable, cross-platform watermarking standard is practically impossible.
Perhaps a bigger problem is the vulnerability of the watermarking method itself. Experts have concluded that watermarks can be easily removed or misused by malicious actors. Engineers of disinformation can hijack watermarking metadata to label an authentic piece of content as a deepfake, further eroding our trust in what is real. If a watermark can be easily counterfeited, added, or removed (services some companies already offer for a fee), it cannot be relied upon as a primary safeguard to shield users from deepfakes.
Additionally, the provenance method requires access to ground truth, i.e., the verifiably unaltered version of the media in question, which serves as the baseline against which the manipulated content is compared. Obtaining reliable ground truth for deepfake detection requires access to a range of authentic media samples, which can be difficult to gather. Ground truth data must also be meticulously labeled or annotated to indicate which portions of the media have been manipulated and to what extent. Such ideal conditions are unlikely once a piece of content has traveled far across the digital landscape, where its metadata can be altered or lost through methods as simple as taking a screenshot.
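As a small illustration of how fragile this metadata is, the snippet below (again assuming Pillow and the hypothetical "sample.jpg") re-saves an image without carrying over its EXIF data, discarding exactly the information a provenance check relies on, much as a screenshot would.

```python
# Re-encoding an image without explicitly copying EXIF drops the metadata a
# provenance check depends on (assumes Pillow; "sample.jpg" is hypothetical).
from PIL import Image

original = Image.open("sample.jpg")
print("original EXIF tags:", len(original.getexif()))

original.save("resaved.jpg")  # Pillow does not copy EXIF unless exif= is passed
print("re-saved EXIF tags:", len(Image.open("resaved.jpg").getexif()))  # typically 0
```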
Inference: Platform-Agnostic Deepfake Detection
Reality Defender’s award-winning approach to deepfake detection is inference-based. Inference methods focus on detecting subtle artifacts or inconsistencies within content that are indicative of manipulation or synthetic generation. This includes (but is not limited to) visual artifacts, unnatural voice patterns in audio recordings, unusual body movements, and distortions in facial expressions. Given how quickly ground truth and original metadata of synthesized content can be scrubbed or lost, inference does not rely on the chance of obtaining the original, untouched media for comparison. Instead, the focus of detection tools like Reality Defender is solely on the suspicious media file in question and all the ways in which detection models can capture irregularities indicative of AI-fueled deception.
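As one toy example of the kind of signal an inference model might learn to pick up, the sketch below computes a simple frequency-domain statistic; some research associates unusual spectral energy patterns with GAN-generated imagery. This is an illustrative heuristic assuming NumPy and Pillow, not Reality Defender's actual model.

```python
# Toy spectral statistic sometimes associated with generated imagery
# (illustrative only; assumes NumPy and Pillow).
import numpy as np
from PIL import Image

def high_frequency_energy_ratio(path: str) -> float:
    """Fraction of spectral energy far from the low-frequency center of the image."""
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(gray)))
    h, w = spectrum.shape
    yy, xx = np.mgrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    high_band = spectrum[radius > min(h, w) / 4].sum()
    return float(high_band / spectrum.sum())
```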
Given the absence of ground truth, we account for the variability of AI-generated media and of the detection process by assigning each analyzed file a score from 1 to 99 that expresses our confidence that the content is a deepfake. Our method requires no opt-ins from AI-generation companies, nor can it be sidestepped with faked metadata.
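As a hedged sketch of how such a score might be produced, the function below maps a hypothetical model's probability estimate onto a 1-to-99 scale; the mapping is purely illustrative and not Reality Defender's actual scoring method.

```python
def confidence_score(prob_fake: float) -> int:
    """Map a model's estimated probability that media is fake to a 1-99 score.

    Illustrative only: clamping keeps the score from ever claiming absolute
    certainty in either direction, reflecting the absence of ground truth.
    """
    return max(1, min(99, round(prob_fake * 100)))

# e.g. confidence_score(0.997) -> 99, confidence_score(0.003) -> 1
```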
A Combined Approach
Provenance methods can complement inference methods to provide a fuller analysis of potentially fake content. For example, a platform can use provenance checks to read metadata and identify watermarks, quickly flagging common AI-generated media whose labels remain intact. Inference methods can then corroborate or contradict these findings with a comprehensive scan of the media itself, searching for traces of manipulation that cannot be hidden, thus removing our reliance on easily altered digital labels and on commitments from companies and individuals with conflicting interests. A minimal sketch of such a combined pipeline appears below.
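The following Python sketch shows one way the two stages could be combined; the thresholds, function names, and labels are illustrative assumptions rather than a production design.

```python
# Hypothetical pipeline combining a fast provenance check with an inference model.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    provenance_flag: bool   # metadata/watermark suggested AI involvement
    inference_score: int    # 1-99 confidence from an inference model
    label: str

def analyze(path: str,
            provenance_check: Callable[[str], bool],
            inference_model: Callable[[str], int]) -> Verdict:
    flagged = provenance_check(path)   # cheap metadata / watermark scan first
    score = inference_model(path)      # deeper artifact analysis of the file itself
    if score >= 80 or (flagged and score >= 50):
        label = "likely manipulated"
    elif score <= 20 and not flagged:
        label = "likely authentic"
    else:
        label = "needs review"
    return Verdict(flagged, score, label)
```

The ordering reflects the argument above: the provenance check runs first because it is cheap, while the inference model serves as the backstop precisely because it cannot be evaded by stripping or forging metadata.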
While both provenance-based and inference-based deepfake detection methods have their merits, and could work together in a “Swiss cheese” approach to security and detection, inference-based approaches like those employed by Reality Defender offer a more robust and adaptable solution in the face of rapidly evolving AI technologies. As AI-generated content becomes increasingly sophisticated, inference-based methods will play a crucial role in maintaining the integrity of our digital information ecosystem, ensuring that we can navigate the complexities of the AI era without waiting for platforms and technologies to separate real from fake on their own.