Consumption of digital media has grown rapidly and is expected to keep growing. With OTT streaming, content providers can reach a wide range of audiences across the world. This reach makes it imperative for content providers to comply with local rules and regulations on violence, nudity, sexual content, strong language, and more.
Given the sheer volume of content that media companies need to create and distribute, categorising and classifying it can be challenging. Operators need an efficient way to comply with the requirements of different regions and audiences. Manual classification of media content is too slow and too expensive at this scale.
This article explores the next generation of QC checks for video streams. These checks build an in-depth semantic understanding of content by identifying and extracting objects, events, actions, scenes, and spoken or visual words from streamed content. The extracted information can then be used for specific purposes, such as content classification, indexing and retrieval, and automatic generation of content descriptions and captions. In particular, the article focuses on why Artificial Intelligence (AI) and Machine Learning (ML) are key drivers in meeting this content challenge for media service providers.
Improving the classification process
Using ML and AI technology as part of a software-based QC solution, operators can quickly analyse content for large volumes of files. Being able to look at the flagged content along with the relevant metadata can dramatically expedite the process of content readiness. The metadata aspect is important because it provides operators with relevant information, such as the description, exact start and end location in the file, as well as the duration of the flagged content.
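As a rough illustration of the metadata described above, the following Python sketch models a single flagged item with its description, start and end location, and derived duration. The class name `FlaggedSegment`, its field names, and the helper functions are hypothetical and chosen only for this example; they do not represent the output format of any particular QC product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlaggedSegment:
    """One flagged span of content (hypothetical structure for illustration)."""
    category: str      # e.g. "smoking"; category labels are assumptions
    description: str   # human-readable note on what was detected
    start_s: float     # start location in the file, in seconds
    end_s: float       # end location in the file, in seconds

    @property
    def duration_s(self) -> float:
        """Duration of the flagged content, derived from start and end."""
        return self.end_s - self.start_s


def to_timecode(seconds: float) -> str:
    """Format an offset in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def summarise(segment: FlaggedSegment) -> str:
    """Build the one-line summary an operator might see in a review list."""
    return (
        f"{segment.category}: {segment.description} "
        f"[{to_timecode(segment.start_s)} - {to_timecode(segment.end_s)}, "
        f"{segment.duration_s:.1f}s]"
    )
```

With a record like this, an operator can jump straight to the flagged location in the file instead of scrubbing through the whole asset.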
Challenges an object detection system faces during classification
Ideally, the QC system should provide specific scene and object detection, along with the ability to extract keywords from spoken audio and detect strong language. It should also allow operators to create user-defined rule sets so that identified content is checked automatically against the regulations of different countries, regions, and organisations. Taking the user-defined rules into account, audio-visual content is classified into categories such as nudity, violence, alcohol, smoking, and strong language. The classification results are then published to broadcasters as reports, which include pertinent information such as the content description, scene-wise and frame-wise classification results, and compliance reports for different content rating systems.
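One way to picture a user-defined rule set is as a list of per-category thresholds that is evaluated against the detections for a file. The sketch below is a minimal illustration under stated assumptions: the `Detection` and `Rule` types, the confidence threshold, and the "maximum total seconds" policy are all hypothetical, and real regulations are considerably more nuanced than a duration limit.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Detection:
    """A classified span of content (hypothetical structure)."""
    category: str
    start_s: float
    end_s: float
    confidence: float  # classifier score in [0, 1]


@dataclass(frozen=True)
class Rule:
    """One user-defined rule: how much of a category a region tolerates."""
    category: str
    min_confidence: float     # ignore detections scored below this
    max_total_seconds: float  # 0.0 means any occurrence is a violation


def check_compliance(
    detections: Iterable[Detection], rules: Iterable[Rule]
) -> Dict[str, float]:
    """Return {category: flagged_seconds} for every rule that is exceeded."""
    detections = list(detections)
    violations: Dict[str, float] = {}
    for rule in rules:
        total = sum(
            d.end_s - d.start_s
            for d in detections
            if d.category == rule.category
            and d.confidence >= rule.min_confidence
        )
        if total > rule.max_total_seconds:
            violations[rule.category] = total
    return violations
```

The same list of detections can then be checked against several rule sets, one per target region or rating system, without re-analysing the file.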
How ML and AI are transforming content classification
Recently, QC systems have emerged that use AI and ML technologies, the TensorFlow framework, and state-of-the-art computer vision algorithms to automatically label and categorise both linear and on-demand content.
Why are these innovative technologies needed? Content classification is an intricate process. In a traditional content classification system that relied on classical machine learning, the features extracted from images for content categorisation were designed by hand. Recent advancements in ML and DL (Deep Learning) have automated this feature extraction, bringing a level of accuracy and speed to the classification of large volumes of video files that was not possible before.
Now, with ML and AI at hand, operators can go far beyond just detecting objects in video content. They can recognise activity in a video frame, on-screen visual text, and audio events, and check whether captions are aligned correctly. Identifying explicit content is one area, in particular, where ML and AI can be advantageous in the media environment. Through a combination of object detection, activity recognition, and audio and visual cues, operators can determine whether nudity or minimal covering, mild sexual situations, or explicit sexual situations are occurring in the video scene.
Moreover, operators can identify violence through activity recognition and object detection. ML models can detect the presence of guns, killing, and car crashes. In areas of the world where visuals of alcohol and smoking in video content are prohibited, operators can use object detection to identify alcohol as well as cigarettes, cigars, and vaping devices, and activity recognition to identify the physical act of smoking.
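Object detectors typically report hits frame by frame, while operators review time ranges. The sketch below shows one simple way to bridge the two: it groups the frame indices where a label was detected into contiguous time spans. The function name `merge_frame_hits` and the `max_gap_frames` tolerance are assumptions made for this illustration, not part of any specific QC system.

```python
from typing import Iterable, List, Tuple


def merge_frame_hits(
    hit_frames: Iterable[int], fps: float, max_gap_frames: int = 1
) -> List[Tuple[float, float]]:
    """Group frame indices with a detection into (start_s, end_s) spans.

    Frames whose indices differ by at most `max_gap_frames` are treated as
    part of the same span, so a value above 1 bridges short detector dropouts.
    """
    spans: List[List[int]] = []
    for frame in sorted(set(hit_frames)):
        if spans and frame - spans[-1][1] <= max_gap_frames:
            spans[-1][1] = frame           # extend the current span
        else:
            spans.append([frame, frame])   # start a new span
    # A hit on frame f covers the interval [f / fps, (f + 1) / fps).
    return [(first / fps, (last + 1) / fps) for first, last in spans]
```

Raising `max_gap_frames` trades precision for fewer, longer spans, which can make review lists easier to work through when a detector misses a frame or two in the middle of a scene.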
The media industry is undergoing some serious transformations. Content volume is increasing, and consumers are watching video on a wide range of screens in various regions across the world. In order to grow and monetise their content, operators need smart tools for content classification. Leveraging recent advancements in ML and AI, operators can easily classify content in a media file based on elements such as sexually explicit scenes, violence, nudity, alcohol, smoking, and guns in order to comply with regional rules, regulations, and cultural sensitivities of the countries where content is served.