
Unlocking Safe Digital Spaces with Computer Vision + Audio and Language Processing 

Learn what computer vision and natural language processing are, and how they can be used together to help platforms better moderate content online. Discover why Unitary's AI-powered technology is the ultimate solution for online safety.



The 'eye' of computer vision. Image via Adobe Stock.

As regulations tighten and expectations shift, platforms are under increasing pressure to better moderate content online. Fortunately, technological advances are helping to close the gap, and computer vision coupled with natural language processing currently represents the cutting edge of moderation capabilities.

What is computer vision (and why does it matter)?

The human eye (and the brain that powers it) is a biological marvel. It’s no surprise that scientists have been trying to reproduce its capabilities for decades.

Computer vision technology is the sub-discipline of artificial intelligence concerned with allowing computers to ‘see’ and process the environment (or a specific situation) in the same way that a human does. A camera mimics the human eye, while software algorithms help the system ‘understand’ what it is seeing on a pixel-by-pixel level.

In this way, computer vision can be adapted for many tasks – allowing self-driving cars to recognise and respond to traffic conditions, or enabling facial recognition, for instance. Importantly, computer vision can also be applied to content moderation.

Content moderation and computer vision

Effective automated content moderation relies on the ability to process multiple factors simultaneously. Unimodal algorithms are quite effective at identifying questionable content (nudity, violence, gore, etc.) in pictures and text, but they tend to fail when processing more complex media, like video.

Computer vision technology brings some additional capabilities to content moderation, such as object recognition and classification – and it can do so for video content too. However, when considered in isolation, these capabilities are still not accurate enough.
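To picture how visual classification might extend to video, here is a minimal Python sketch of aggregating per-frame scores into a video-level decision. Everything here is hypothetical – the function name, thresholds, and scores are illustrative, and in a real system the per-frame probabilities would come from a trained vision classifier:

```python
def aggregate_frame_scores(frame_scores, threshold=0.8, min_hits=3):
    """Flag a video when several frames independently look unsafe.

    frame_scores: per-frame probabilities (0-1) from a hypothetical
    vision classifier, e.g. one score per sampled frame.
    """
    hits = sum(1 for s in frame_scores if s >= threshold)
    peak = max(frame_scores, default=0.0)
    # Require either a sustained signal (several high-scoring frames)
    # or a single extremely confident detection.
    return hits >= min_hits or peak >= 0.99

# A single frame briefly tripping the classifier is not enough...
print(aggregate_frame_scores([0.2, 0.85, 0.1, 0.3]))    # False
# ...but a sustained run of high-scoring frames flags the video.
print(aggregate_frame_scores([0.9, 0.92, 0.88, 0.85]))  # True
```

This kind of aggregation is one reason video is harder than still images: the decision depends on signals spread across time, not on any single frame in isolation.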

That is why natural language processing is another essential aspect of successfully protecting online communities.

Adding natural language processing into the mix

Where computer vision tries to make sense of what the computer ‘sees’, natural language processing (NLP) is focused on words. NLP allows the computer to make sense of spoken or written words – and to assess that content too.

Combining NLP and computer vision technologies significantly strengthens content moderation capabilities. The moderating algorithm can simultaneously analyse a video, processing visual content (computer vision) and any written or spoken words (NLP). This goes beyond simply flagging or blocking rule-breaking content, however – it allows the algorithm to make smarter choices based on context.

Adding context into content moderation is essential for avoiding false positives. It is very easy to censor a video because a single algorithmic rule has been triggered; considering flagged elements in the context of the video as a whole allows for a far more accurate judgement.
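One way to picture context-aware moderation is a late-fusion step that weighs the visual risk score against what the accompanying text suggests. The sketch below is purely illustrative – the weights, labels, and function name are made up, and real multimodal models learn these interactions jointly rather than via hand-written rules:

```python
def fuse_signals(visual_score, text_score, context_labels):
    """Combine a visual risk score (0-1) with a text risk score (0-1),
    then adjust for context labels that a hypothetical NLP pipeline
    might attach to captions or speech."""
    combined = 0.6 * visual_score + 0.4 * text_score
    # Mitigating context (e.g. news or educational commentary)
    # lowers the combined risk; aggravating context raises it.
    if "educational" in context_labels or "news" in context_labels:
        combined *= 0.5
    if "threatening" in context_labels:
        combined = min(1.0, combined * 1.5)
    return round(combined, 2)

# The same graphic imagery scores differently depending on framing:
print(fuse_signals(0.9, 0.2, {"news"}))         # 0.31
print(fuse_signals(0.9, 0.2, {"threatening"}))  # 0.93
```

The point of the sketch is the contrast in the last two lines: identical visual input, very different risk once context is taken into account – which is exactly what a single-rule censor cannot do.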

More than just ‘written’ text

It’s important to realise that NLP involves more than just textual analysis. NLP can be applied to spoken word too, allowing moderation of the audio soundtrack accompanying a video. Advanced algorithms can also accurately determine unspoken factors like intent and sentiment by carefully analysing the tone of words and the way in which they are used.

Unspoken cues cannot be addressed using visual processing techniques – but NLP can analyse content to make an informed categorisation. When combined with computer vision, content moderation becomes more holistic – and more able to closely match the capabilities of a human moderator.
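To make the idea of intent analysis concrete, here is a toy sketch of how a transcript-level signal might go beyond raw keyword matching and consider how flagged words are framed. This is deliberately simplistic – the word lists and weights are invented for illustration, and production systems use trained language models rather than hand-written heuristics:

```python
def transcript_risk(transcript):
    """Toy intent heuristic: the same flagged word reads differently
    depending on the framing around it."""
    text = transcript.lower()
    flagged = ["attack", "destroy"]
    mitigating = ["movie", "game", "review", "history"]
    score = sum(text.count(word) for word in flagged) * 0.5
    if any(word in text for word in mitigating):
        score *= 0.3  # benign framing softens the signal
    return min(score, 1.0)

# A bare threat scores higher than the same verb in a gaming context.
print(transcript_risk("I will destroy you"))                        # 0.5
print(transcript_risk("In this game review, we destroy the boss"))  # 0.15
```

Even this crude heuristic shows why tone matters: the word "destroy" alone says little, while its surrounding context changes the moderation outcome substantially.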

Here at Unitary, our content moderation platform uses multimodal algorithms built around NLP and computer vision – which is how we can process more than three billion images every day with human-level accuracy. To learn more about Unitary and how this combination of technologies is essential to protecting your platforms and brands online, please contact us to arrange a demo.