Why Can't Facebook Do Better At Fact Checking Photos And Videos?


Last week Facebook announced that it was expanding its fact checking efforts to photographs and videos, forwarding visual memes for evaluation by its fact checking partners. What made the announcement so notable was not the details of the initiative, which will still rely on forwarding each image and video for review by its external human fact checkers, or the technologies used, which are minimal, but rather the timing: we are nearly at the end of 2018, in a digital world in which visual media have become a dominant form of online communication, and Facebook is only just now beginning to tentatively expand its efforts to include them. We have the technologies today to perform completely automated detection of a large fraction of the day-to-day false information spread through visual memes, so why doesn’t Facebook embrace them?

In its announcement, Facebook identifies three core areas where it is focusing its photo and video fact checking efforts: manipulated images, images republished out of context and superimposed textual and audio claims.

When it comes to identifying manipulated imagery, there are many well-established algorithms that can identify common alterations, such as detecting zones of differing compression artifacts to locate areas of an image that were changed, or identifying traces left behind in the image’s metadata. A good deal of common image editing can be detected through such tools, though nation-state misinformation efforts using professional artists or emerging “deep fake” technologies obviously pose greater challenges.
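
To make the compression-artifact idea concrete, one common first-pass check is error level analysis: resave the image at a known JPEG quality and look at where the per-pixel differences cluster. The sketch below is purely illustrative of that general technique, not any platform’s production pipeline, and the file name is a hypothetical placeholder.

```python
# A minimal sketch of error level analysis (ELA), one common way to surface
# regions whose JPEG compression history differs from the rest of the image.
# Illustrative only; file names are hypothetical.
from PIL import Image, ImageChops

def error_level_analysis(path, resave_quality=90):
    """Resave the image at a known JPEG quality and return the per-pixel
    difference. Regions edited after the original compression tend to show
    noticeably different error levels than untouched regions."""
    original = Image.open(path).convert("RGB")
    resaved_path = path + ".ela.jpg"
    original.save(resaved_path, "JPEG", quality=resave_quality)
    resaved = Image.open(resaved_path)
    return ImageChops.difference(original, resaved)

if __name__ == "__main__":
    ela = error_level_analysis("suspect_photo.jpg")  # hypothetical input file
    # Bright regions in the saved difference image are candidates for review.
    ela.save("suspect_photo_ela.png")
```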

Regardless of the approach used to create a manipulated image, one approach that can work well to identify potential alteration is to perform a reverse image search for highly similar images found anywhere else across the entire open web. Similarly, the image can be processed using either traditional statistical or deep learning algorithms to subdivide the image into key content blocks and separately search for each. This can readily surface that a given image is a copy of a previous wire image except for the addition of two new people in the foreground. If the details of the background match down to the last blade of grass, one can be fairly confident that the image has been digitally manipulated, even if the alteration has been performed by a professional digital forensics expert or has been algorithmically altered as a “deep fake.”
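
As a rough sketch of what that reverse image search step might look like with off-the-shelf tools, the snippet below uses Google’s Cloud Vision API Web Detection feature; the file name and the simple flagging rule at the end are illustrative assumptions rather than a description of any deployed system.

```python
# A minimal sketch of reverse image search via Google's Cloud Vision API.
# The input file and the flagging heuristic are illustrative assumptions.
from google.cloud import vision

def find_prior_appearances(path):
    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    web = client.web_detection(image=image).web_detection
    # Exact copies of the image found elsewhere on the open web.
    full_matches = [img.url for img in web.full_matching_images]
    # Crops or edited variants that share substantial content with this image.
    partial_matches = [img.url for img in web.partial_matching_images]
    return full_matches, partial_matches

full, partial = find_prior_appearances("wire_photo.jpg")  # hypothetical file
if partial and not full:
    print("No exact copies, but cropped or edited variants exist - flag for review.")
```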

Alternatively, if the image itself is not found elsewhere on the web, subcomponents of the image may match. Take a photograph of a group of people that has been so heavily modified that reverse image searches cannot locate the original. Dividing the image into individual content blocks and performing reverse image searches on each would allow the system to identify that each person in the image was actually cropped from a different previous image, down to the reflection in their glasses, again automatically flagging the image as doctored.

Even many “deep fake” images and videos can be detected using such approaches, keying on unique characteristics of the background “base” content into which the synthetic changes were stitched. For completely artificial content created from scratch, in which no real elements, including backgrounds, were used, reverse image searches can still help identify distinguishing deviations, such as the wrong wallpaper pattern or color in an image purporting to depict an event in a given White House room.

While such techniques are not foolproof, they can readily catch many of the run-of-the-mill Photoshop attempts that account for a great deal of the falsified imagery in circulation.

Moreover, the technology to robustly perform such reverse image searches across the open web is widely available. Google’s Cloud Vision API allows one to submit an image and have it perform an extremely advanced reverse image search across all the images Google has seen across the open web. Applying it to nearly half a billion news images from around the world through my open data GDELT Project, I have found its algorithms remarkably adept at tracking down the original versions of even heavily cropped and modified images. The API can also annotate the image with bounding boxes around individual objects, which, when coupled with traditional “area of interest” or entropy bounding boxes, can easily decompose an image into subunits to be individually checked.
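
A rough sketch of that block-by-block check, assuming Cloud Vision’s object localization for the bounding boxes and a separate Web Detection call per crop, might look like the following; the file name and the interpretation of the results are illustrative assumptions.

```python
# A sketch of decomposing an image into object-level regions and
# reverse-searching each crop separately. Illustrative only.
import io
from google.cloud import vision
from PIL import Image as PILImage

client = vision.ImageAnnotatorClient()

def reverse_search_regions(path):
    with open(path, "rb") as f:
        content = f.read()
    image = vision.Image(content=content)
    objects = client.object_localization(image=image).localized_object_annotations
    pil_img = PILImage.open(io.BytesIO(content))
    width, height = pil_img.size
    results = []
    for obj in objects:
        # Bounding polygons are returned as normalized [0, 1] coordinates.
        xs = [int(v.x * width) for v in obj.bounding_poly.normalized_vertices]
        ys = [int(v.y * height) for v in obj.bounding_poly.normalized_vertices]
        crop = pil_img.crop((min(xs), min(ys), max(xs), max(ys)))
        buf = io.BytesIO()
        crop.save(buf, format="JPEG")
        web = client.web_detection(
            image=vision.Image(content=buf.getvalue())
        ).web_detection
        results.append((obj.name, [img.url for img in web.full_matching_images]))
    return results

# Each (object, prior URLs) pair can then be inspected: a "person" crop that
# matches a years-old photo from an unrelated event suggests compositing.
print(reverse_search_regions("group_photo.jpg"))  # hypothetical file
```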

What about images taken out of context? These are perhaps the most common form of false information on social media, often shared not in a deliberate attempt to misinform, but rather by misguided or misinformed users who genuinely believe the image depicts something else.

The GDELT Project has approached supporting verification of these images through three parallel tracks.

The first is to scan the image for any embedded metadata that records the date the image was taken, its location, a descriptive caption, a record locator ID in a major photo database or other descriptive attributes. Such metadata can be easily forged or removed, but if an image’s metadata states it was taken on March 1, 1992, and the image’s caption in a given meme states it depicts an earthquake that occurred two hours ago, it would certainly be worth noting the discrepancy to the user.
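
A minimal sketch of that metadata check, assuming the image still carries EXIF data, could look like the following; the field choices, file name and claimed date are illustrative assumptions.

```python
# A minimal sketch of an EXIF metadata check. Illustrative only; not every
# image carries these fields, and they are trivial to strip or forge.
from PIL import Image, ExifTags

def extract_basic_metadata(path):
    exif = Image.open(path).getexif()
    named = {ExifTags.TAGS.get(tag, tag): value for tag, value in exif.items()}
    return {
        "taken": named.get("DateTime"),
        "description": named.get("ImageDescription"),
        "software": named.get("Software"),  # editing tools often leave a trace here
    }

meta = extract_basic_metadata("meme.jpg")  # hypothetical file
claimed_date = "2018:09:20"                # date asserted in the post (EXIF format)
if meta["taken"] and not meta["taken"].startswith(claimed_date):
    print(f"Metadata says the photo was taken {meta['taken']} - worth flagging.")
```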

The second approach is to use Google Cloud Vision API’s reverse image search to report how often the image has been seen elsewhere on the web in the past, offering links to hundreds of other places where the exact image has been found as well as to hundreds of extremely similar images. Filtering this list to major news outlets, image databases, government websites and other authoritative sites and presenting those links to the user can help lend context to an image. If an image’s caption claims it is the first image to emerge from a chemical weapons attack two minutes ago in Syria, yet Cloud Vision API locates thousands of copies of the image on the web dating back five years, it is likely that the image is not what it claims to be.
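
A sketch of that filtering step, using Cloud Vision’s list of pages that contain the image and a hypothetical allowlist of authoritative domains, might look like this; the domain list and file name are assumptions for illustration.

```python
# A sketch of surfacing prior appearances of an image and filtering them to a
# hypothetical allowlist of authoritative domains. Illustrative only.
from urllib.parse import urlparse
from google.cloud import vision

AUTHORITATIVE = {"reuters.com", "apnews.com", "bbc.co.uk", "gettyimages.com"}

def prior_appearances(path):
    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        web = client.web_detection(
            image=vision.Image(content=f.read())
        ).web_detection
    pages = [(p.url, p.page_title) for p in web.pages_with_matching_images]
    trusted = [
        (url, title) for url, title in pages
        if urlparse(url).netloc.replace("www.", "") in AUTHORITATIVE
    ]
    return len(pages), trusted

count, trusted = prior_appearances("breaking_news.jpg")  # hypothetical file
print(f"Image found on {count} pages; {len(trusted)} from authoritative outlets.")
```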

Presenting the user with other instances of the image’s use across the web allows them to conduct their own verification, comparing how the image is described in the current instance with how it has been captioned previously. Even in cases where an image is used correctly, understanding the previous contexts in which it has appeared can help with interpreting its significance and whether there are disputes over what it depicts and its deeper meaning.

Instead of merely presenting a long list of URLs to the user of all the other places the image has appeared on the web and forcing them to compare the current description of the image to previous captions used for it, what if the machine could do all of that for us?

Imagine a system that took an image, searched the open web for every other appearance of that image anywhere on the web, extracted the caption used to describe the image in each of those appearances in hundreds of different languages and processed all of them to compute a histogram of the top people, locations, activities, events and other topics most commonly used to describe the image. In essence, you could “crowdsource” how the image has been described across the web and see how closely that description matches the claims made about the image in its current context. Couple this with OCR of any text seen in the background of the image, from street signs to store fronts, in dozens of languages and you’ve got a powerful tool to verify whether the current use of an image matches how it has been described in the past.

It turns out that Google’s Cloud Vision API does precisely this through its Web Detection and OCR capabilities, and GDELT has been exploring how this information can be used as a third, fully automated approach to image verification.

Take an image published a few minutes ago that claims to depict a street protest in Syria. Cloud Vision’s Web Detection capability performs a reverse image search, extracts all of the captions from all of those appearances of the image, computes the key entities mentioned in each caption and returns the top entities associated with the image, all in just 1-2 seconds. In this case the API might report that the image has frequently been captioned across the web as Ukraine, Protest, Euromaidan and 2014 Ukrainian Revolution. Comparing these topics against the image’s caption in its current appearance would fairly readily suggest that it is being misrepresented.
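
A rough sketch of that “crowdsourced description” check, using the web entities Cloud Vision associates with an image across its appearances on the open web, might look like the following; the claimed topics, file name and overlap test are illustrative assumptions.

```python
# A sketch of tallying the web entities associated with an image and comparing
# them with the topics claimed in the current post. Illustrative only.
from collections import Counter
from google.cloud import vision

def top_web_entities(path, n=10):
    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        web = client.web_detection(
            image=vision.Image(content=f.read())
        ).web_detection
    scores = Counter()
    for entity in web.web_entities:
        if entity.description:
            scores[entity.description] += entity.score
    return [desc for desc, _ in scores.most_common(n)]

claimed_topics = {"Syria", "protest", "chemical attack"}  # from the post's caption
web_topics = top_web_entities("protest.jpg")              # hypothetical file
# e.g. ["Ukraine", "Euromaidan", "2014 Ukrainian Revolution", "Protest", ...]
if not claimed_topics & set(web_topics):
    print("The web's consensus description does not match the post's claim.")
```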

Separately, Cloud Vision API’s photographic OCR might identify that a pair of street signs in the background are written in Ukrainian and a separate map lookup might show that that particular intersection exists only in Kiev, placing the final nail in the coffin of the claim that the image was taken in Syria.
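
That OCR step is similarly straightforward to sketch: Cloud Vision’s text detection returns both the recognized text and a detected locale, which can hint at where a scene was photographed. The file name and the follow-on check are assumptions; the map lookup mentioned above would be a separate service.

```python
# A sketch of OCR-based scene-text and language detection with Cloud Vision.
# Illustrative only; the file name and follow-on check are assumptions.
from google.cloud import vision

def detect_scene_text(path):
    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        response = client.text_detection(image=vision.Image(content=f.read()))
    annotations = response.text_annotations
    if not annotations:
        return None, None
    # The first annotation is the full detected text block; its locale is the
    # language the OCR engine believes the text is written in (e.g. "uk").
    return annotations[0].description, annotations[0].locale

text, locale = detect_scene_text("street_scene.jpg")  # hypothetical file
if locale == "uk":
    print("Street signage appears to be Ukrainian, not Arabic - check the location claim.")
```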

Finally, Facebook raises the issue of visual memes in which the text is written directly on top of the image, or videos in which claims are spoken in the audio track. Again, extremely robust OCR, audio recognition and video recognition across a large and ever-growing number of languages is available through tools like Google’s Cloud Vision API, Cloud Speech-to-Text API and Cloud Video Intelligence API, coupled with the Cloud Translation API to connect appearances of a meme across languages.
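
For the spoken-claim case, a minimal sketch could transcribe the audio track with Cloud Speech-to-Text and translate the transcript so it can be matched against existing fact checks; the audio file, its format and language code, and the downstream matching step are all illustrative assumptions.

```python
# A sketch of transcribing and translating a spoken claim from a video's audio
# track. Assumes 16 kHz LINEAR16 audio and Russian speech; illustrative only.
from google.cloud import speech
from google.cloud import translate_v2 as translate

def transcribe_and_translate(audio_path, language_code="ru-RU"):
    speech_client = speech.SpeechClient()
    with open(audio_path, "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code=language_code,
    )
    transcript = " ".join(
        result.alternatives[0].transcript
        for result in speech_client.recognize(config=config, audio=audio).results
    )
    # Translate to English so one set of fact checks can cover the same meme
    # as it circulates across languages.
    english = translate.Client().translate(transcript, target_language="en")
    return transcript, english["translatedText"]
```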

In short, using Google’s existing off-the-shelf APIs, one could build in a single afternoon a completely autonomous, state-of-the-art image and video fact checking system that addresses all three of Facebook’s target areas: manipulated images, mislabeled or out-of-context images, and textual and audio claims. Indeed, the GDELT Project has been applying all of these visual analytic approaches to worldwide news imagery for more than two and a half years, performing reverse image search, OCR, image caption analysis, machine translation, metadata assessment and other analyses on nearly half a billion images. The result is more than 321 billion datapoints that can be used to authenticate and verify an image’s origins, context and description, and these attributes have proven extremely useful for such verification.

This raises the question of why Facebook doesn’t use a similar workflow for image verification. Such automatic screening could help it identify fact checking candidates that have not been reported by any user and that are not spreading in ways its behavioral models might flag.

It can take fact checkers considerable time, from hours to days, to fully vet a given piece of content, meaning an image or video can spread virally and cause considerable damage before it is debunked. Using the kinds of approaches outlined above, Facebook could easily flag images and videos that have a high likelihood of being modified, being miscaptioned or containing a false textual or audio narrative. The content would still be automatically forwarded to Facebook’s human fact checking partners, but in the interim, for the hours or days until the content is reviewed, Facebook could append a notice to the post stating that its automated algorithms have raised questions about it and offering links to other appearances of the image in major news outlets or other authoritative sources so users can see alternative context and descriptions. If a fact checker eventually decides to review the post, they would not be starting from scratch; all of the information above would be at their fingertips to help them make a decision, with the final verdict appended to the image.

Being able to flag an image as questionable the instant it is uploaded, append links to other authoritative appearances of the image or its subcomponents online, and tie its textual and audio narratives into related fact checks and content would go a long way towards preventing false information from spreading virally in those critical first hours before a fact checker can review it. It would also offer context for content that fact checkers never get around to reviewing, given the nearly limitless amount of false content that circulates online.

Having processed nearly half a billion global news images through Google’s tools and tested many of these approaches at scale, it is certainly clear that precisely such a fully autonomous and extremely robust image and video fact checking system could be deployed today.

This raises the critical question of why Facebook isn’t making use of these kinds of technologies, given that it has considerable AI expertise with which to build such tools. When asked for comment, a Facebook spokesperson stated that the company had been focusing on photos and videos over the past few months, building machine learning models for them, learning how images and video are used to spread false information on its platform and expanding from a single pilot image and video fact checking partner to nine partners and, eventually, broader availability. When asked why it is only now focusing on this content, rather than years ago as images and videos came to dominate its platform and especially as Facebook itself prioritized such content, the company did not comment. Similarly, when asked why it is relying primarily on human evaluation of images, why it is not providing links to other appearances of questionable images and why it is not making broader use of these kinds of technologies, the company again declined to comment, other than to confirm that it is focusing internally on tools to detect manipulated images while leaving the evaluation of those and all remaining cases to human fact checkers.

Putting this all together, we have the tools today to readily combat a large fraction of the casual misinformation spread on social media, using off-the-shelf components that could be plugged together in a single afternoon. Having processed nearly half a billion images through Google’s tools over the past two and a half years through these very pipelines, I can personally attest that the technologies are sufficiently accurate, robust and scalable to launch such a mass verification system today. Facebook won’t comment on why it is only just now taking the first steps toward addressing visual misinformation or why it isn’t adopting these kinds of automated tools, and its nascent pilot work simply offloads all of the work to third party human fact checkers who do not have the time or resources to review even a fraction of the misinformation that spreads each day. In the end, it seems we are unlikely to see major investments from the social media companies in countering misinformation until governments step in, as they did with hate speech, and compel them to adopt these kinds of approaches. Until then, advances in the fight against misinformation will have to come from outside the platforms themselves.