
Could Automated Claim Extraction And Tracking Combat Digital Falsehoods?


One of the greatest challenges facing fact checkers today is that humans are simply no match for the real-time tsunami of misinformation, disinformation, digital falsehoods and foreign influence that washes across the social landscape each day. No number of human fact checkers could possibly keep pace with all of the false and misleading information published every minute. Could machines help?

The process of fact checking involves identifying core claims, researching each of those claims and determining a final verdict. In many cases there is no single definitive inarguable answer: the veracity of the claim lies in the eye of the beholder.

Was Obama really the greatest president in history? Did he do “enough” to push back on Russian electoral interference? Has Merkel done “all she can” for refugees? None of these have indisputable answers based on fact: they are opinions. Yet they are the kinds of claims that fact checkers frequently examine based on viral claims made about them.

Similarly, fact checkers rarely have access to the authoritative evidence they need to prove or disprove a claim. Instead, they typically rely on the very same public records and media coverage the public does or consult “experts” that are not always right.

Even when a claim is relatively simple to verify and the evidence is available (did Biden actually say the statement attributed to him in a televised campaign event yesterday?), it still takes considerable time for a fact checker to identify the event from which the statement appears to have been sourced, review the full transcript, confirm the precise wording and write the result up as a fact check for publication.

Moreover, by the time human fact checkers have reviewed a claim and published their response, the public has typically moved on to a new claim and the damage is done.

Could machines offer a solution?

Imagine machines that monitored social media in real time, examining every tweet, every Facebook post and every Instagram caption and extracting the core arguments and claims made within. Each claim would be distilled into a set of keyword searches of the open Web, news coverage, reference works like Wikipedia, government and NGO websites and other sources.
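As a minimal sketch of that claim-to-query step, assuming a simple stopword-filtering approach (a production system would use claim detection and named-entity recognition rather than this toy tokenizer):

```python
import re

# A tiny stopword list for illustration; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "was", "were", "in", "of", "to",
             "that", "and", "on", "really", "did", "do", "he", "she"}

def extract_query_terms(post: str, max_terms: int = 6) -> list[str]:
    """Reduce a social post to keyword query terms for searching external
    sources, keeping distinctive tokens in order of first appearance."""
    tokens = re.findall(r"[a-z']+", post.lower())
    seen: dict[str, None] = {}  # insertion-ordered set of kept terms
    for t in tokens:
        if t not in STOPWORDS and len(t) > 2:
            seen.setdefault(t, None)
    return list(seen)[:max_terms]

print(extract_query_terms("Was unemployment really that low last quarter?"))
# → ['unemployment', 'low', 'last', 'quarter']
```

The resulting terms would then be issued as searches against news archives and reference sources.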

Each social media claim would be compared against these external sources to see how it lines up. In cases of factual claims (was Lincoln really the source of that quote or was unemployment really that low last quarter), the machine could identify clear discrepancies. In cases where claims are fairly well aligned, the algorithm could flag that as well.

This could result in an annotation beside each social post: a green check mark indicating that the claims appear to match external sources, or a yellow caution sign indicating that the claim deviates from authoritative sources. Automated summarization systems could condense the differences into a paragraph of machine-generated prose.
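A crude version of this comparison-and-annotation step might score token overlap between a claim and each external source; the threshold and labels below are illustrative assumptions, standing in for the far richer semantic matching a real system would need:

```python
import re

def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z']+", text.lower()))

def overlap(claim: str, source: str) -> float:
    """Fraction of the claim's tokens found in the source text."""
    c = _tokens(claim)
    return len(c & _tokens(source)) / len(c) if c else 0.0

def annotate(claim: str, sources: list[str], threshold: float = 0.6) -> str:
    """Green check when the claim lines up with at least one external
    source, yellow caution sign otherwise."""
    best = max((overlap(claim, s) for s in sources), default=0.0)
    return "green_check" if best >= threshold else "yellow_caution"
```

For example, a quote that closely matches an archived transcript would clear the threshold, while a fabricated attribution sharing only incidental words would be flagged.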

A deviation would not itself indicate that the claim is false, since it could represent new first-person information that contradicts previous reporting. For example, the person a quote is attributed to could confirm it, a witness could post a video verifying an event or new official details could replace old tentative information in a breaking news event.

Rather, flagging a deviation can at least make consumers of a post aware of a potential discrepancy.

Discrepancies would also be forwarded to human fact checkers, prioritized by the kind of claim, the magnitude of the deviation and its impact on current events. A misattributed inspirational quote from a historical figure from 500 years ago might be given lower priority than a false quote attributed to a current head of state involving an impending election that takes place in days.
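The triage described above could be sketched as a weighted score; the weights, categories and field names here are all illustrative assumptions, not a published ranking scheme:

```python
from dataclasses import dataclass

@dataclass
class Discrepancy:
    claim_type: str        # e.g. "quote", "statistic", "event"
    deviation: float       # 0.0 (minor wording) .. 1.0 (flat contradiction)
    topical: bool          # tied to current events, e.g. an impending election
    subject_current: bool  # about a current public figure vs. a historical one

def priority(d: Discrepancy) -> float:
    """Higher scores are routed to human fact checkers first."""
    type_weight = {"quote": 1.0, "statistic": 1.2}.get(d.claim_type, 1.0)
    score = d.deviation * type_weight
    score *= 2.0 if d.topical else 1.0
    score *= 1.5 if d.subject_current else 0.5
    return score

old_quote = Discrepancy("quote", 0.9, topical=False, subject_current=False)
election_quote = Discrepancy("quote", 0.9, topical=True, subject_current=True)
```

Under this toy scoring, the misattributed historical quote scores far below the false quote tied to an impending election, matching the prioritization described above.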

For the myriad claims for which “truth” lies in the eye of the beholder, the system would arrange a continuum of perspectives beneath the post. A claim that “Obama is the greatest president in history” would result in a bar graph beneath showing the full range of perspectives on that claim, with the strongest disagreements on the left ranging through neutral arguments in the center and the strongest agreements on the right.

In fact, the open data GDELT Project’s “Tone Chart” does precisely this for keyword searches of global online news coverage, showing the full range of global perspectives on an issue.
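A tone chart of this kind could be rendered by binning per-source tone scores into a left-to-right disagreement-to-agreement histogram; the score range and bin count below are assumptions for illustration:

```python
def tone_histogram(scores: list[float], bins: int = 7,
                   lo: float = -10.0, hi: float = 10.0) -> list[int]:
    """Bucket tone scores (negative = disagreement, positive = agreement)
    into bar-chart counts: strongest disagreement first, strongest
    agreement last, neutral in the middle."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for s in scores:
        i = int((s - lo) / width)
        counts[min(bins - 1, max(0, i))] += 1  # clamp out-of-range scores
    return counts

# Four sources: one strongly negative, two neutral, one strongly positive.
print(tone_histogram([-9.0, 0.0, 0.0, 9.0]))  # → [1, 0, 0, 2, 0, 0, 1]
```

Each bar beneath the post would then show how many sources fall at that point on the continuum.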

A user who has already made up their mind on a topic is unlikely to be swayed by such a graph, though it may broaden their perspective by exposing weak points in their arguments. For the vast majority of people who have not yet hardened their views, it could help them see the issue more holistically and perhaps shift their thinking or offer additional evidence and perspectives to consider.

Since such a system would be entirely algorithmic, it would have no trouble keeping pace with the fire hose of social media content, ensuring viral falsehoods are flagged from the moment the first post appears, potentially mitigating their impact.

Of course, creating such a system would be far from trivial.

Human language is infinitely creative, with an almost limitless number of ways of saying the same thing. Current natural language systems are extremely primitive and brittle, with even the most advanced state-of-the-art systems failing at the slightest typo, grammatical error or unexpected prose.

Machine translation is still quite error-prone, making precision comparison of claims across languages more problematic.

Despite these challenges, the building blocks for such a system are all there. Early experiments by the GDELT Project across global news content suggest such a system is entirely feasible and with refinement could prove a powerful asset in combatting the spread of digital falsehoods.

Putting this all together, the building blocks exist to augment human fact checkers with algorithmic counterparts. These automated workflows could compare the fire hose of social posts against authoritative sources in real time, flagging deviations and exposing users to diverse perspectives and evidence for claims without definitive answers.

The tools are here and with a bit of refinement such a system could be launched today.

In the end, the only question is whether it would actually make a difference. Do we care about truth anymore in our increasingly divided world? Or is “truth” today defined by what we agree with?

Only time will tell.