Moving Beyond Monolithic Automated Fact Checking And Context Algorithms

As livestreams of the Notre Dame fire proliferated yesterday, a new wave of automated fact checking filters designed to combat misinformation was confronted with one of its first live tests. In this case, at least one major system incorrectly attached links about the 9/11 terrorist attacks in the US to livestreams of the French cathedral fire. As automated fact checking and contextualization systems proliferate, how can companies ensure they don’t do more harm than good?

One of the greatest challenges in contextualizing any breaking story is triangulating the chaotic and conflicting informational environment to settle on a basic set of uncontested details around which to tell the story. This can be exceptionally hard for humans, let alone for naïve and primitive computer algorithms that rely on simple statistical analysis, surfacing details largely through small training datasets, hardcoded rules and trending topics rather than through any deeper understanding of the details themselves.

In the case of the Notre Dame fire, one could easily imagine automated “fake news” algorithms initially flagging the first livestreams as false computer-generated imagery, since it would seem so unlikely for such an iconic landmark to be engulfed in flames.

Paradoxically, these are precisely the moments when unexpected content is so valuable, capturing the earliest glimmers of stories that arrive out of the blue.

How then is a machine supposed to distinguish between the first videos emerging from a major disaster versus a coordinated misinformation campaign posting false content designed to sow panic and distress?

After all, even media outlets can be hacked, as the Associated Press reminded us in 2013 when its Twitter account was compromised to tweet that the White House had been attacked and the president injured, which caused automated Wall Street trading systems to react before the false information could be retracted.

Combatting misinformation through automated systems requires a multilayered approach that combines analyzing context, providing context and looking across the entire informational ecosystem.

Imagine a misinformation filtering system confronted with a video posted to Twitter from an anonymous account purporting to show an internationally famous landmark on fire, with no other corroborating information to verify it. The most reasonable path for that algorithm to take is to place a warning notice at the bottom indicating that the video appears to depict events for which it is currently the only source and thus cannot yet be verified.

If official sources likely to have knowledge of the events, such as police, firefighters and other first responders, retweet the video and confirm it as real, the system might note this by adding a notice that official sources are increasingly confirming the events.

As more and more videos emerge and news coverage provides independent reporting, the video might carry additional notices that the events it depicts have now been verified by multiple official sources and journalists on the scene, and captured by multiple independent video streams. All of the videos depicting the same events could be easily grouped together, offering viewers greater context about the events leading up to the situation, as well as other perspectives and angles that might tell a slightly different story.
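
To make that graduated approach concrete, here is a minimal sketch in Python of how such notices might escalate as corroboration accumulates. The signal names and the threshold of five independent videos are illustrative assumptions, not anything a platform has disclosed.

```python
from dataclasses import dataclass


@dataclass
class Corroboration:
    """Hypothetical corroboration signals gathered for a single video."""
    independent_videos: int = 0       # other streams apparently depicting the same events
    official_confirmations: int = 0   # police, fire or other first-responder accounts
    journalist_reports: int = 0       # independent reporting from journalists on the scene


def context_notice(signals: Corroboration) -> str:
    """Map corroboration signals to a graduated notice rather than a binary real/fake verdict."""
    if signals.official_confirmations and (signals.independent_videos >= 5 or signals.journalist_reports):
        return ("The events depicted have been verified by multiple official sources, "
                "journalists on the scene and independent video streams.")
    if signals.official_confirmations:
        return "Official sources are increasingly confirming the events depicted."
    if signals.independent_videos:
        return ("Multiple independent videos appear to depict the same events, "
                "which have not yet been officially confirmed.")
    return ("This video is currently the only source for the events it depicts "
            "and cannot yet be verified.")
```

The particular thresholds matter less than the principle: the notice changes as the informational environment around the video fills in, rather than issuing a one-time verdict.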

Of course, such a system could still be gamed by a coordinated misinformation campaign.

In today’s world, the cost and complexity of producing synthetic video content narrows its use to a relatively small portion of the population. However, as the tools become easier to use, it is quite conceivable that a malicious actor could generate thousands of fake videos depicting a false event from a vast array of perspectives and then flood them out to social media in rapid succession, as if thousands of people standing on all sides of the event were filming it from different locations. The fake videos could even be designed to capture the other “witnesses” filming on their phones from the correct locations, further adding to the realism of the false scene.

Even as official accounts deny the events, the ever-growing avalanche of fake livestreams from the scene would sow mass chaos and panic as citizens assume the government is attempting to hide the events or downplay them. The public’s familiarity with the long history of governments initially denying major stories to avoid panic will only reinforce this conflict. Moreover, a few hacked official Twitter accounts confirming the story would be all that’s needed to send a society into mass panic.

Such false stories could even create a self-fulfilling prophecy in which a storyline causes citizens to panic and rush to banks to withdraw funds, to grocery stores to stock up on staples or to gas stations to fill their tanks, creating very real mass shortages that are then amplified back to the public, causing the shortages to worsen and in turn creating a new and very real story of mass chaos.

Training systems of the future to combat such coordinated campaigns requires far more than simply handing a deep learning algorithm a library of videos downloaded from the web and hoping for the best. It requires building new forms of algorithms that can reason across vast landscapes of chaotic and conflicting signals, much like real journalists do.

Such coordinated and sophisticated misinformation campaigns would be difficult to execute successfully, but a single motivated and skilled individual could likely pull off such an event with proper planning even today.

Setting aside such coordinated campaigns, the reality is that most of the mundane false information such algorithms must confront consists of misleading narratives stemming from selective filming, well-meaning misattributions of past events or the addition of false details and incorrect interpretations to events.

In such cases, grouping together all of the videos, imagery and narratives about an event into a single place can help lend the additional perspective necessary to wade through the thicket of bad information.

Instead of a single video defining an event for all, dozens, hundreds or even thousands of distinct videos and images can be woven together by algorithmic historians to create a more holistic and contextualized timeline of the event, helping to debunk false and misleading narratives.
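
A sketch of that grouping step, assuming each video record has already been matched to an event and carries a timestamp (the field names here are hypothetical):

```python
from collections import defaultdict


def build_event_timeline(videos):
    """Group video records by event and order each group chronologically.

    Each record is assumed to be a dict with illustrative 'event_id',
    'timestamp' (epoch seconds) and 'url' fields.
    """
    timeline = defaultdict(list)
    for video in videos:
        timeline[video["event_id"]].append(video)
    for clips in timeline.values():
        clips.sort(key=lambda clip: clip["timestamp"])
    return dict(timeline)
```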

In the case of the Notre Dame livestreams, it is understandable why a computer algorithm might have connected them to 9/11. A massive iconic structure collapsing and on fire would find few previous examples in the algorithm’s limited training data other than major terror attacks.

Instead, what if those algorithms had focused on the location of the videos, whether through user-provided GPS, cellular triangulation or visual geocoding of the video itself? Instead of providing 9/11 as a contextual hint, the algorithms could have provided links to information about the Notre Dame cathedral and links to breaking news about the landmark, which would have rapidly surfaced breaking reports of the fire. Pooling all of the emerging video would also have allowed the algorithm to see that there was a rapid surge in video from the scene, all depicting largely the same events from the myriad angles that would be expected of a real event, coupled with announcements from local authorities confirming the situation.
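
As a rough sketch of that location-first approach, the snippet below matches a video’s coordinates against a hypothetical landmark gazetteer and returns context links about the landmark itself rather than about superficially similar historical events. The field names and the 500-meter radius are assumptions for illustration, not drawn from any production system.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))


def location_context(video_lat, video_lon, landmarks):
    """Return landmark-based context for a video, or None if no landmark is nearby.

    `landmarks` is a hypothetical gazetteer: dicts with 'name', 'lat', 'lon'
    and 'news_feed_url'. The video coordinates might come from device GPS,
    cellular triangulation or visual geocoding of the frames themselves.
    """
    nearby = [l for l in landmarks
              if haversine_km(video_lat, video_lon, l["lat"], l["lon"]) < 0.5]
    if not nearby:
        return None
    closest = min(nearby, key=lambda l: haversine_km(video_lat, video_lon, l["lat"], l["lon"]))
    return {
        "landmark": closest["name"],
        # Point viewers at breaking news about the landmark itself rather than
        # at past events that merely look similar.
        "context_links": [closest["news_feed_url"]],
    }
```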

Putting this all together, automated fact checking and contextualization algorithms have a lot of potential, but instead of relying on simplistic machine learning algorithms, their creators need to focus more on the kinds of journalistic signals the media itself relies on, triangulating across the informational environment and triaging all of the conflicting indicators to produce a holistic understanding of the event.

In the end, perhaps the biggest lesson is once again that industry needs to look beyond monolithic algorithms and think more creatively about all of the ways their algorithms can go wrong. After all, as history teaches us, what can go wrong will go wrong, and when it comes to the platforms’ increasing dominance over our informational lives, in the future those errors may have devastating consequences.