Could Compliance Scoring Replace Content Moderation?


Modern social media content moderation is based on a binary system: a post is either approved as allowable under a site’s acceptable speech guidelines or rejected in its entirety as a violation. Posts that just barely cross the line are deleted just as permanently as the most egregious offenders. What if every social media post instead carried a publicly visible “compliance score” showing how well it comported with the platform’s rules? That score could feed a set of public dashboards, badges and algorithmic and interface components that reward positive conversational behavior and penalize harmful behavior, gamifying the moderation process. Could such an approach actually replace content moderation entirely and lead to far more thoughtful and engaging online discourse?

Content moderation is conducted today in secrecy, enforcing secret rules interpreted by secret moderators with outcomes visible only when they result in a piece of content being removed. Decisions are by definition binary: either the content is left as-is or it is removed. Users receive little guidance from companies if a post is right on the edge of acceptability, depriving them of critical feedback they could use to improve their online behavior.

Most importantly, it creates a situation in which toxic users are free to wreak havoc until the moment they are silenced.

What if, instead, gamification were used to create a continuum of penalties and rewards that nudged users toward more productive and positive online behaviors, while decreasing their visibility and ability to do harm in response to negative and destructive ones?

Imagine a system that replaced binary leave/remove moderation decisions with a public compliance score showing how close the post came to being removed. The lower the compliance score, the closer a post was to violating the rules. Today such distinctions are meaningless, but what if such scores were used to control the virality of a post?

The further a post strays from acceptable speech guidelines, the more its virality could be penalized.

What does it mean to “stray” from acceptable speech? One option would be to compute a set of emotional, linguistic and narrative scores for each post and compare them both to a platform-wide baseline and to the likely readership of the post. Messages whose scores are considerably more negative than the baseline would be penalized with lower virality and visibility, with the magnitude of the decrease tied to how far they deviate from the conversational baseline. The penalty would be clearly listed on the post so that both sender and recipients are aware it has been penalized and why.
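To make that concrete, here is a rough sketch in Python of how such a scoring pipeline might work. The metric names, baseline numbers and penalty curve are purely illustrative assumptions, not a description of any platform’s actual system.

```python
# A rough sketch, assuming hypothetical per-post metrics and an invented
# platform-wide baseline; the names, weights and penalty curve below are
# illustrative assumptions only.

from dataclasses import dataclass

@dataclass
class PostMetrics:
    toxicity: float    # 0 (benign) to 1 (highly toxic)
    negativity: float  # 0 to 1, share of negative sentiment
    hostility: float   # 0 to 1, likelihood of a personal attack

# Assumed platform-wide conversational baseline; a real system would
# recompute this continuously from the recent stream of posts.
BASELINE = PostMetrics(toxicity=0.15, negativity=0.30, hostility=0.10)

def compliance_score(post: PostMetrics, baseline: PostMetrics = BASELINE) -> float:
    """Return a 0-100 score; lower means the post is closer to a violation."""
    # Only deviations *worse* than the baseline count against the post.
    excess = sum(
        max(getattr(post, field) - getattr(baseline, field), 0.0)
        for field in ("toxicity", "negativity", "hostility")
    )
    # Each metric can exceed its baseline by at most ~1, so normalize by 3.
    return round(100.0 * max(0.0, 1.0 - excess / 3.0), 1)

def virality_multiplier(score: float) -> float:
    """Scale a post's reach: full reach at 100, falling steeply as the score drops."""
    return (score / 100.0) ** 2  # quadratic curve; the exact shape is arbitrary

# Example: an angry post that sits well above the baseline on every metric.
angry_post = PostMetrics(toxicity=0.60, negativity=0.80, hostility=0.50)
print(compliance_score(angry_post))                       # 55.0
print(virality_multiplier(compliance_score(angry_post)))  # ~0.30 of normal reach
```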

Using baselines would permit different standards across the world and across communities, respecting cultural sensitivities in a way today’s neocolonialist global speech standards do not.
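One way to picture that is to key the baseline in the sketch above to a community, or to a post’s likely readership, rather than to the platform as a whole. The community names and numbers below are invented for illustration and reuse the PostMetrics and compliance_score definitions from the earlier sketch.

```python
# Invented community baselines; a real system would derive these continuously
# from each community's own recent conversation.
COMMUNITY_BASELINES = {
    "global":        PostMetrics(toxicity=0.15, negativity=0.30, hostility=0.10),
    "gaming_forum":  PostMetrics(toxicity=0.35, negativity=0.45, hostility=0.20),
    "support_group": PostMetrics(toxicity=0.05, negativity=0.20, hostility=0.02),
}

def community_compliance_score(post: PostMetrics, community: str) -> float:
    """Score a post against its own community's norms, falling back to the global baseline."""
    baseline = COMMUNITY_BASELINES.get(community, COMMUNITY_BASELINES["global"])
    return compliance_score(post, baseline)

# The same blunt post that scores 55.0 against the global baseline scores
# higher in a rough-and-tumble community and lower in a gentler one.
blunt_post = PostMetrics(toxicity=0.60, negativity=0.80, hostility=0.50)
print(community_compliance_score(blunt_post, "gaming_forum"))   # 70.0
print(community_compliance_score(blunt_post, "support_group"))  # 45.7
```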

Publicizing these scores and the rationales behind penalties would address one of the most persistent concerns about today’s moderation process: its lack of transparency.

In contrast, posts that exceed the baseline conversational metrics would receive a slight boost in viewership. While this would not by itself be sufficient to make a post go viral, such a boost would ensure that well-written, polite, clinical posts are in greater circulation than the profanity-laden emotional diatribes that practically define virality today.
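Extending the earlier sketch, posts that sit at or below the baseline on every metric could earn a small, capped boost rather than an open-ended one. The 10 percent cap and the margin calculation below are arbitrary choices for illustration, and the snippet reuses the PostMetrics, BASELINE, compliance_score and virality_multiplier definitions from above.

```python
def adjusted_multiplier(post: PostMetrics, baseline: PostMetrics = BASELINE) -> float:
    """Penalize below-baseline posts, mildly boost above-baseline ones."""
    score = compliance_score(post, baseline)
    if score >= 100.0:
        # The post is at or better than the baseline on every metric;
        # measure how far better, and cap the resulting boost at 10%.
        margin = sum(
            max(getattr(baseline, field) - getattr(post, field), 0.0)
            for field in ("toxicity", "negativity", "hostility")
        )
        return min(1.0 + margin / 3.0, 1.10)
    return virality_multiplier(score)

# Example: a measured, courteous post circulates a little more widely,
# but the boost alone is never enough to make it go viral.
polite_post = PostMetrics(toxicity=0.02, negativity=0.10, hostility=0.00)
print(adjusted_multiplier(polite_post))  # 1.10
```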

Such a system would create a real-time continuum of rewards and penalties that automatically adapts to changing societal norms and standards by virtue of its population-scale baseline comparisons.

As profanity becomes more accepted by society as a whole, the presence of such words would no longer result in a penalty. Similarly, if a harmful term is co-opted by its traditional target demographic and reshaped into a term of empowerment, its use by members of that community would carry no penalty, while its use in a harmful fashion by others would still be penalized.

This self-balancing system would dynamically reduce the visibility of harmful speech that does not quite rise to the level of a violation, while prioritizing speech that encourages productive and informed debate, regardless of substance.

In many ways this digital proctor would act like a schoolteacher, nudging users towards behaviors most productive to society at large and gently steering them away from harmful behaviors, while reducing the impact of those users who simply refuse to conform to societal norms.

Putting this all together, replacing the binary keep/remove decisions of today’s content moderation with a dynamic public compliance scoring system that applies the same gamification and behavioral engineering that has become so popular across Silicon Valley would offer a powerful and novel approach to tackling digital toxicity.

Perhaps the answer to ridding the Web of toxic speech is not simply to follow after toxic users, cleaning up their messes after the fact, but rather to take a chapter from the real world’s educational system, gently nudging users from the very beginning toward the most productive and constructive behaviors.

In the end, perhaps the answer to the digital world’s troubles once again lies in turning to the solutions of the physical world.